prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Presto returning 0 results on Hive table pointing to AWS S3 #14377

Open datapa123 opened 4 years ago

datapa123 commented 4 years ago

Environment: AWS S3, AWS EMR 5.24.1, Presto 0.219, with AWS Glue as the Hive metastore, running Hive and Presto.

1) I created a non-partitioned external Hive table whose location points to an S3 directory containing Parquet data. A PySpark utility running on the EMR cluster loads the data into the S3 bucket. When I run select * from the table in Presto, it returns zero rows. Any suggestions?
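One possible cause of an empty result (not confirmed anywhere in this thread) is that the Parquet files are not directly inside the table's LOCATION: Spark jobs often write into nested sub-folders, and by default Presto's Hive connector ignores sub-directories of a table location unless hive.recursive-directories=true is set in the Hive catalog properties (check that your Presto version supports it). Files whose names start with "_" or "." (such as _SUCCESS or _temporary) are treated as hidden under the Hadoop convention. The sketch below is a hypothetical helper, not part of Presto, that takes a plain list of S3 object keys and shows which ones a recursive or non-recursive scan would pick up, assuming those rules.

```python
"""Hypothetical helper: which S3 keys would a Hive-style table scan read?

Assumes `location` is a key prefix inside the bucket (no s3://bucket/ part)
and that names starting with "_" or "." are hidden, per Hadoop convention.
"""
from typing import Iterable, List


def _is_hidden(name: str) -> bool:
    # Hadoop-style hidden entries such as _SUCCESS, _temporary or .crc files.
    return name.startswith("_") or name.startswith(".")


def visible_files(location: str, keys: Iterable[str], recursive: bool) -> List[str]:
    """Return the keys under `location` that a scan would read.

    recursive=False: only files directly inside `location` count.
    recursive=True: files in nested sub-folders count too, unless any
    path component on the way is hidden.
    """
    prefix = location.rstrip("/") + "/"
    result = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # object belongs to some other location
        parts = key[len(prefix):].split("/")
        if parts[-1] == "":
            continue  # "folder/" marker object, not a data file
        if any(_is_hidden(p) for p in parts):
            continue
        if not recursive and len(parts) > 1:
            continue  # file lives in a sub-folder
        result.append(key)
    return result


# Usage with made-up keys: all data sits in sub-folders of the location.
keys = [
    "warehouse/events/_SUCCESS",
    "warehouse/events/2020/04/12/part-00000.snappy.parquet",
    "warehouse/events/2020/04/13/part-00000.snappy.parquet",
]
print(visible_files("warehouse/events", keys, recursive=False))
print(visible_files("warehouse/events", keys, recursive=True))
```

With this layout the non-recursive call returns an empty list, which would match a query that returns zero rows; the recursive call returns both Parquet files.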

2) If the S3 bucket has a year/month/date folder structure, can a Hive table be created on top of it, and can Presto read the sub-folder hierarchy? (I may be wrong, but I read in another article that Presto reads only the files directly in the folder given as the Hive table's location path and cannot scan child folders.)
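One way to expose a year/month/date layout is a partitioned table. Because folder names like 2020/04/12 are not in Hive's key=value form (year=2020/...), MSCK REPAIR TABLE will not discover them, but Hive's ALTER TABLE ... ADD PARTITION ... LOCATION can register each folder explicitly. The sketch below only generates those statements; the table name events, the bucket my-bucket, and the partition columns year, month and day (assumed to be declared as strings in PARTITIONED BY) are hypothetical. The statements would be run in Hive or Spark SQL against the shared Glue catalog.

```python
"""Hypothetical helper: build Hive DDL registering YYYY/MM/DD folders as partitions.

Assumes a table declared with PARTITIONED BY (year string, month string,
day string) and data laid out as <location>/YYYY/MM/DD/.
"""
from typing import Iterable, List


def add_partition_statements(table: str, location: str, folders: Iterable[str]) -> List[str]:
    """Return one ALTER TABLE ... ADD PARTITION statement per distinct folder."""
    base = location.rstrip("/")
    statements = []
    # Normalise slashes and de-duplicate so each partition is added once.
    for folder in sorted({f.strip("/") for f in folders}):
        parts = folder.split("/")
        if len(parts) != 3:
            raise ValueError(f"expected YYYY/MM/DD, got {folder!r}")
        year, month, day = parts
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (year='{year}', month='{month}', day='{day}') "
            f"LOCATION '{base}/{folder}/';"
        )
    return statements


# Usage with made-up names:
for stmt in add_partition_statements("events", "s3://my-bucket/events", ["2020/04/12", "2020/04/13/"]):
    print(stmt)
```

Partitioning lets Presto skip folders that a WHERE clause on year, month or day rules out; a non-partitioned table that relies on recursive directory listing reads every sub-folder instead.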

Please help.

Thank you very much

Ravion commented 4 years ago

Enable Hive bucketing (set it to true) and retry.

Best, Ravi

In reply to datapa123's post of Sun, Apr 12, 2020, 12:31 AM.

mbasmanova commented 4 years ago

@datapa123 Did @Ravion's advice solve the problem? If so, let's close this issue.

khozzy commented 3 years ago

I'm facing the same issue as @datapa123. @Ravion, could you please elaborate on how to enable bucketing?