Open mchutani-CS opened 2 years ago
Hi @mchutani-CS As per the configuration, you are trying to CREATE TABLE via Delta Connector, which is not supported. You can run CREATE TABLE command via the hive connector, please use the hive connector instead here. https://prestodb.io/docs/current/connector/hive.html#configuration
Hi @agrawalreetika . Thanks for your response. My usecase is to test the read performance of PrestoDB using the DeltaConnector since my DeltaTable resides in a cloud storage (s3, gs). As per my understanding, if I use the Delta connector, the hive metastore is used to store metadata about the Delta Table. So with this CREATE TABLE command, I was trying to update hive metastore with the metadata about my Delta Table. Do you have a suggestion on how to go about it? Should I use a combination of Delta and hive connector?
The Trino fork of Presto has a Delta Lake connector that supports writes. You might want to give that a try.
Hi @mchutani-CS, You can use CREATE TABLE command via Hive Connector to register your delta tables in HMS and then make use of Presto delta connector to access the data. Ref - https://prestodb.io/docs/current/connector/deltalake.html Let me know if you have any questions.
Hi @agrawalreetika,
Thank you for your suggestion. I'm going to try it out.
I have a question. Are there any read performance benefits if I register by delta tables in HMS vs directly accessing it through $path
like this -> SELECT * FROM sales."$path$"."s3://db-sa-datasets/presto/sales_data" LIMIT 200;
?
Hi @mchutani-CS
Basically when you register table in HMS, presto-delta connector will get the location of delta table from HMS and then the flow for metadata and data access would be similar to when you access table using $path
So I believe until your HMS call for getting location is not a bottleneck there shouldn't be any performance difference.
Hi @agrawalreetika , I was able to query the delta table present in s3, but it is taking very long time. As I saw the logs of presto, some errors were raised before the query execution completes.
Caused by: com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: N2A11KNDX3BQEEMM; S3 Extended Request ID: l8bMzArIpHdUDBxerMYF1xANqEc7ikQruC57so9TbCD5VrHlfOacjjtzIMD4FhslN8dkjzxtvNE=), S3 Extended Request ID: l8bMzArIpHdUDBxerMYF1xANqEc7ikQruC57so9TbCD5VrHlfOacjjtzIMD4FhslN8dkjzxtvNE= (Path: s3a://
Hi @lohithalla, This issue is getting fixed in this PR - #18307, which is part of last release 0.277. Could you please try it out there.
@agrawalreetika I am using ahanio/presto latest image and still facing the issue, is that fix not avaiable in 0.277?
Yes, it should be in 0.277
No i guess....just checked now in 0.277 release notes as well...not mentioned about this change
I missed this. It would come in presto-0.278. I just rechecked the commits. Sorry for the confusion. Here -
@agrawalreetika I was trying to setup Presto with Hive Connector to query data from S3 bucket. I'm following below document https://prestodb.io/docs/current/connector/hive.html to develop the POC. I was able to create hive.web.page_views table and list the partitions of the page_views table, but while creating an external Hive table its failed: ERROR: External location must be a directory. would you please offer some suggestions? Thanks in advance
Query: CREATE TABLE hive.web.request_logs ( request_time timestamp, url varchar, ip varchar, user_agent varchar ) WITH ( format = 'TEXTFILE', external_location = 's3://my-bucket/data/logs/' )
ERROR: com.facebook.presto.spi.PrestoException: External location must be a directory at com.facebook.presto.hive.HiveMetadata.getExternalPath(HiveMetadata.java:1308) at com.facebook.presto.hive.HiveMetadata.prepareTable(HiveMetadata.java:1011) at com.facebook.presto.hive.HiveMetadata.createTable(HiveMetadata.java:958) at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.createTable(ClassLoaderSafeConnectorMetadata.java:356) at com.facebook.presto.metadata.MetadataManager.createTable(MetadataManager.java:656) at com.facebook.presto.execution.CreateTableTask.internalExecute(CreateTableTask.java:201) at com.facebook.presto.execution.CreateTableTask.execute(CreateTableTask.java:87) at com.facebook.presto.execution.CreateTableTask.execute(CreateTableTask.java:69) at com.facebook.presto.execution.DDLDefinitionExecution.executeTask(DDLDefinitionExecution.java:58) at com.facebook.presto.execution.DataDefinitionExecution.start(DataDefinitionExecution.java:209) at com.facebook.presto.$gen.Presto_0_277_119fd9a____20221020_192344_1.run(Unknown Source) at com.facebook.presto.execution.SqlQueryManager.createQuery(SqlQueryManager.java:286) at com.facebook.presto.dispatcher.LocalDispatchQuery.lambda$startExecution$8(LocalDispatchQuery.java:197) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829)
Hi @medams , Could you please let me know which version of presto are you using?
Versions I'm using Presto 0.277 Hadoop: 3.3.1 Hive 2.3.9 I'm able to create table when we have one file under one directory. if we have multiple files under one directory we are getting com.facebook.presto.spi.PrestoException: External location must be a directory.
@medams, I tried reproducing the issue on my end but I am not able to. I used Presto 0.277, I am able to create an external table fine on s3 data. I am assuming you already have the s3 location existing before creating an external table via Presto?
Hi devs, I'm pretty new to the ecosystem and was trying to setup Presto with hive to query a deltatable located in Amazon S3. I'm trying to run this setup on my MAC (intel). Any guidance would be highly appreciated.
PRESTO VERSION:
0.274
HIVE VERSION:3.1.2
QUERY:RESULT:
Here's my config:
HIVE:
conf/hive-site.xml
Presto:
etc/node.properties
etc/jvm.config
etc/config.properties
etc/catalog/delta.properties