prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Failed to list directory: s3 (presto 0.194) #10535

Open chafidz0000000 opened 6 years ago

chafidz0000000 commented 6 years ago

Sorry for not jumping to the current version. Presto 0.194, Hive HCatalog 2.3.2.

hive-site.xml:

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>secret</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>secret</value>
</property>

I created an external table with Hive pointing at an S3 directory. Selecting data from that table works in the Hive console, but accessing the same table from Presto fails with this error: com.facebook.presto.spi.PrestoException: Failed to list directory: s3://my-bucket/dt=20180501

That is the only partition that exists on the table. I then tried applying hive.s3.aws-access-key and hive.s3.aws-secret-key in the Hive catalog and setting hive.s3.use-instance-credentials to false, which returned this error:

3 errors
com.google.inject.CreationException: Unable to create injector, see the following errors:

1) Configuration property 'hive.s3.aws-access-key' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:234)

2) Configuration property 'hive.s3.aws-secret-key' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:234)

3) Configuration property 'hive.s3.use-instance-credentials' was not used
  at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:234)

I wonder if this is a bug or something wrong with my config again. Sorry if this issue is cluttering your tracker.

thx.
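For reference, the external-table setup described above would look roughly like this in Hive (the table and column names here are hypothetical; the bucket and partition values are taken from the error message):

```sql
-- External table over the S3 prefix, partitioned by dt
CREATE EXTERNAL TABLE my_table (
  id BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
STORED AS TEXTFILE
LOCATION 's3://my-bucket/';

-- Register the single existing partition
ALTER TABLE my_table ADD PARTITION (dt='20180501')
  LOCATION 's3://my-bucket/dt=20180501';
```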

kokosing commented 6 years ago

Can you post the content of your config.properties and hive.properties files?

chafidz0000000 commented 6 years ago

config.properties :

coordinator=true
node-scheduler.include-coordinator=false
discovery.uri=http://master-node:8889
http-server.threads.max=500
discovery-server.enabled=true
sink.max-buffer-size=1GB
query.max-memory=30GB
query.max-memory-per-node=3350074491B
query.max-history=40
query.min-expire-age=30m
http-server.http.port=8889
http-server.log.path=/var/log/presto/http-request.log
http-server.log.max-size=67108864B
http-server.log.max-history=5
log.max-size=268435456B
log.max-history=5

hive.properties

hive.metastore-refresh-interval=1m
connector.name=hive-hadoop2
hive.metastore.uri=thrift://master-node:9083
hive.metastore-cache-ttl=20m
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
hive.non-managed-table-writes-enabled = true
hive.s3-file-system-type = EMRFS
hive.hdfs.impersonation.enabled = true
#hive.s3.use-instance-credentials = false
#hive.s3.aws-access-key = secret
#hive.s3.aws-secret-key = secret

Is it correct that the Presto machines don't need an instance IAM role, as long as I provide the right access key?

findepi commented 6 years ago

@chafidz0000000 when using

hive.s3-file-system-type = EMRFS

certain configuration values are not taken into account. Did you try hive.s3-file-system-type = PRESTO ?
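With the PRESTO file-system type, the catalog-level S3 settings do take effect. A minimal hive.properties sketch (the metastore URI is taken from the config above; the key values are placeholders):

```
connector.name=hive-hadoop2
hive.metastore.uri=thrift://master-node:9083
# With PRESTO (instead of EMRFS), the hive.s3.* properties below are honored
hive.s3-file-system-type=PRESTO
hive.s3.use-instance-credentials=false
hive.s3.aws-access-key=YOUR_ACCESS_KEY
hive.s3.aws-secret-key=YOUR_SECRET_KEY
```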

chafidz0000000 commented 6 years ago

I've changed hive.s3-file-system-type from EMRFS to PRESTO. Startup was OK after uncommenting the hive.properties configurations above. For a few seconds running a simple query seemed fine, until it aborted with this error:

com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: ---; S3 Extended Request ID: ---)

nezihyigitbasi commented 6 years ago

Are you using Presto against AWS S3 or some other S3 compatible service?

According to the error message this may be related to the signer type (related config is: hive.s3.signer-type). Presto passes that signer type to ClientConfiguration::withSignerOverride(), so please check the AWS SDK documentation to see which value to use.
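For example, an override for an S3-compatible service that only accepts the legacy version 2 signatures might look like this in hive.properties (a sketch; "S3SignerType" is the v2 S3 signer name registered by the v1 AWS SDK's SignerFactory, so check the signer names your SDK version actually registers):

```
# Passed straight through to ClientConfiguration.withSignerOverride()
hive.s3.signer-type=S3SignerType
```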

ankon commented 6 years ago

Startup was OK after uncommenting the hive.properties configurations above. For a few seconds running a simple query seemed fine, until it aborted with this error:

As I just spent quite some time on this: when using S3 (or a compatible service) with version 4 signatures, make sure to use AWSS3V4SignerType and not AWS4SignerType as hive.s3.signer-type.
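In other words, for version 4 signatures the working setting is:

```
# S3-specific SigV4 signer; the generic AWS4SignerType lacks the S3-specific
# handling and produces SignatureDoesNotMatch errors like the one above
hive.s3.signer-type=AWSS3V4SignerType
```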

shuDaoNan9 commented 4 years ago

I have the same error as you @chafidz0000000, but I cannot select data from tables on HDFS either. I'm sure my Hive works well with both S3 and HDFS.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.