prestodb / presto


RaptorX: s3 is not supported as the external filesystem #15951

Open · asoriano-lightbox opened this issue 3 years ago

asoriano-lightbox commented 3 years ago

We are trying to deploy the latest PrestoDB (0.250) on EKS with the RaptorX feature enabled. We built a Docker image using the Dockerfile below:

#https://prestodb.io/docs/current/installation/deployment.html#an-example-deployment-with-docker
FROM openjdk:8-jre

# Presto version will be passed in at build time
ARG PRESTO_VERSION

# Set the URL to download
ARG PRESTO_BIN=https://repo1.maven.org/maven2/com/facebook/presto/presto-server/${PRESTO_VERSION}/presto-server-${PRESTO_VERSION}.tar.gz

# Update the package index and install wget, python, and less
RUN apt-get update && apt-get install -y wget python less

# Download Presto and unpack it to /opt/presto
RUN wget --quiet ${PRESTO_BIN}
RUN mkdir -p /opt
RUN tar -xf presto-server-${PRESTO_VERSION}.tar.gz -C /opt
RUN rm presto-server-${PRESTO_VERSION}.tar.gz
RUN ln -s /opt/presto-server-${PRESTO_VERSION} /opt/presto

# Download the Presto CLI and put it in the image
RUN wget --quiet https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/${PRESTO_VERSION}/presto-cli-${PRESTO_VERSION}-executable.jar
RUN mv presto-cli-${PRESTO_VERSION}-executable.jar /usr/local/bin/presto
RUN chmod +x /usr/local/bin/presto

# Specify the entrypoint to start
ENTRYPOINT /opt/presto/bin/launcher run

This works great, and we are able to create a cluster with one coordinator and multiple workers. We mount the config/properties files and catalogs at run time, and we are able to access and query our Hive metastore from within our EKS cluster, or from an EC2 instance (as a Docker container) that has access to our S3/Hive endpoints. However, when we try to enable the RaptorX feature, the cluster starts as normal without any errors, but when we query the S3 Parquet tables that worked before without RaptorX, we now get "s3 is not supported as the external filesystem":

2021-04-16T00:28:10.608Z        DEBUG   query-execution-2       com.facebook.presto.execution.QueryStateMachine Query 20210416_002809_00002_3pvve failed
com.facebook.presto.spi.PrestoException: s3 is not supported as the external filesystem.
        at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:128)
        at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)
        at com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)
        at com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)
        at com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: s3 is not supported as the external filesystem.
        at alluxio.hadoop.LocalCacheFileSystem.initialize(LocalCacheFileSystem.java:86)
        at com.facebook.presto.cache.alluxio.AlluxioCachingFileSystem.initialize(AlluxioCachingFileSystem.java:66)
        at com.facebook.presto.cache.CacheFactory.createCachingFileSystem(CacheFactory.java:54)
        at com.facebook.presto.hive.cache.HiveCachingHdfsConfiguration.lambda$getConfiguration$0(HiveCachingHdfsConfiguration.java:70)
        at com.facebook.presto.hive.cache.HiveCachingHdfsConfiguration$CachingJobConf.createFileSystem(HiveCachingHdfsConfiguration.java:104)
        at org.apache.hadoop.fs.PrestoFileSystemCache.get(PrestoFileSystemCache.java:59)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at com.facebook.presto.hive.HdfsEnvironment.lambda$getFileSystem$0(HdfsEnvironment.java:71)
        at com.facebook.presto.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
        at com.facebook.presto.hive.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:70)
        at com.facebook.presto.hive.HdfsEnvironment.getFileSystem(HdfsEnvironment.java:64)
        at com.facebook.presto.hive.StoragePartitionLoader.loadPartition(StoragePartitionLoader.java:163)
        at com.facebook.presto.hive.DelegatingPartitionLoader.loadPartition(DelegatingPartitionLoader.java:78)
        at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:192)
        at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:40)
        at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:121)
        ... 7 more
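
For context: the exception originates in Alluxio's LocalCacheFileSystem.initialize (see the "Caused by" frames above), which rejects external filesystem URIs whose scheme is not on its supported list, and at this version the list evidently did not include s3. Below is a minimal sketch of that kind of scheme check; the class name, method name, and allow-list contents are illustrative assumptions, not Alluxio's actual code.

import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; the real check lives in
// alluxio.hadoop.LocalCacheFileSystem.initialize (see the trace above).
public class SchemeCheckSketch
{
    // Hypothetical allow-list: at the time of this report it evidently
    // did not contain "s3", which is why s3:// tables fail under RaptorX.
    private static final Set<String> SUPPORTED_SCHEMES =
            new HashSet<>(Arrays.asList("hdfs", "file", "ws"));

    static void validateExternalFilesystem(URI uri)
    {
        String scheme = uri.getScheme();
        if (!SUPPORTED_SCHEMES.contains(scheme)) {
            // Same message as in the query failure above
            throw new UnsupportedOperationException(
                    scheme + " is not supported as the external filesystem.");
        }
    }

    public static void main(String[] args)
    {
        validateExternalFilesystem(URI.create("s3://bucket/warehouse/table"));
    }
}

If that is the mechanism, no hive.s3.* setting on the Presto side can avoid the error: initialization fails on the URI scheme before any S3 credentials are ever used.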

I'm following this article on how to enable RaptorX: https://github.com/prestodb/presto/issues/13205

I verified that we have all the necessary jars (I think):

[screenshot: listing of the installed jars]

I also verified that wherever we deploy the cluster, it has access to our data, such as the S3 buckets and the Hive metastore endpoint. I tried many combinations of the Hive S3 configuration parameters, but nothing seems to help with this error. The last thing I tried was setting, in the Hive properties, the AWS access key and secret key that we know have access to our S3. hive.properties:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://***
hive.s3-file-system-type=PRESTO
hive.node-selection-strategy=SOFT_AFFINITY
hive.partition-versioning-enabled=false
hive.metastore-cache-scope=PARTITION
hive.metastore-cache-ttl=2d
hive.metastore-refresh-interval=3d
hive.metastore-cache-maximum-size=10000000
hive.file-status-cache-expire-time=24h
hive.file-status-cache-size=100000000
hive.file-status-cache-tables=*
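# RaptorX local data cache, backed by Alluxio on local flash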
cache.enabled=true
cache.base-directory=file:///mnt/flash/data
cache.type=ALLUXIO
cache.alluxio.max-cache-size=1600GB
hive.partition-statistics-based-optimization-enabled=true
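# RaptorX ORC and Parquet metadata caches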
hive.orc.file-tail-cache-enabled=true
hive.orc.file-tail-cache-size=100MB
hive.orc.file-tail-cache-ttl-since-last-access=6h
hive.orc.stripe-metadata-cache-enabled=true
hive.orc.stripe-footer-cache-size=100MB
hive.orc.stripe-footer-cache-ttl-since-last-access=6h
hive.orc.stripe-stream-cache-size=300MB
hive.orc.stripe-stream-cache-ttl-since-last-access=6h
hive.parquet.metadata-cache-enabled=true
hive.parquet.metadata-cache-size=100MB
hive.parquet.metadata-cache-ttl-since-last-access=6h
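# S3 client tuning and credentials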
hive.s3.connect-timeout=2m
hive.s3.max-backoff-time=10m
hive.s3.max-error-retries=50
hive.s3.max-connections=500
hive.s3.max-client-retries=50
hive.s3.socket-timeout=2m
hive.s3.staging-directory=/mnt/tmp/
hive.s3.use-instance-credentials=false
hive.s3.aws-access-key=***
hive.s3.aws-secret-key=***
hive.non-managed-table-writes-enabled=true
hive.allow-drop-table=true

config.properties

coordinator=true
node-scheduler.include-coordinator=true
discovery.uri=http://0.0.0.0:8080
http-server.http.port=8080
http-server.log.path=/var/log/presto/http-request.log
http-server.threads.max=500
discovery-server.enabled=true
sink.max-buffer-size=2GB
query.max-memory=825754MB
query.max-memory-per-node=214061170033B
query.max-total-memory-per-node=256873404039B
query.max-history=40
query.min-expire-age=30m
query.client.timeout=30m
query.stage-count-warning-threshold=100
query.max-stage-count=150
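# RaptorX fragment result cache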
fragment-result-cache.enabled=true
fragment-result-cache.max-cached-entries=1000000
fragment-result-cache.base-directory=file:///mnt/flash/fragment
fragment-result-cache.cache-ttl=24h

jvm.config

-verbose:class
-server
-Xmx428122340065
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:ReservedCodeCacheSize=150M
-Xbootclasspath/p:
-Djava.library.path=/usr/lib/hadoop/lib/native/:/usr/lib/hadoop-lzo/lib/native/:/usr/lib/
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintJNIGCStalls
-XX:+PrintReferenceGC
-XX:+PrintGCCause
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/var/log/presto/garbage-collection.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=4
-XX:GCLogFileSize=4M

node.properties

node.environment=production

I also tried the same thing using ahanaio's image and got the same error: https://hub.docker.com/r/ahanaio/prestodb-sandbox

I'm not sure whether we are misconfigured or this is a bug, but I don't know what to try next.

Any help would be appreciated, thank you.

highker commented 3 years ago

@apc999, is this something Alluxio could help fix?

1ambda commented 2 years ago

Hi, any update?

Jay-ju commented 2 years ago

Any update?

Jay-ju commented 2 years ago

@apc999, how can we use this feature?

yingsu00 commented 2 years ago

cc @imjalpreet

imjalpreet commented 2 years ago

Sorry, I missed this thread. I will look into it and submit a PR to fix the issue.