prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Iceberg support? #15620

Open fpj opened 3 years ago

fpj commented 3 years ago

I'm wondering if there is any plan to support Iceberg tables. I see that there is a Presto connector available, but it points to the Trino documentation.

https://iceberg.apache.org/presto/

dborkar commented 3 years ago

@fpj yes, we have plans to add iceberg. can you share your use case a bit more?

fpj commented 3 years ago

Hi @dborkar, I'm looking forward to seeing some activity around it. I don't have a specific use case. I work on storage systems and have been working on connectors for Presto. We will most likely start a conversation with the community soon about our work.

dborkar commented 3 years ago

That's great to hear. Once we have an early design we'll post it for feedback as well, @fpj. Interested in hearing about the connectors you are building.

Tagging a few more people here who have done a lot of work on connectors / plugins: @zhenxiao @beinan @highker @ashishtadose

dborkar commented 3 years ago

BTW, @fpj jump on Presto slack http://slack.prestodb.io/ - this is different from TrinoDB.

fpj commented 3 years ago

Ok, will do. I'm wondering about one thing: should I close this issue? My question has been answered, but it might be a good idea perhaps to repurpose the issue or create another issue to track the Iceberg work? How do you do these things in this community?

ashishtadose commented 3 years ago

@fpj it's fine to keep it open so that the community can share further updates here.

bitsondatadev commented 3 years ago

@fpj we're getting this switched to be called Trino to avoid any confusion about Presto having this connector yet. It's been merged but the site hasn't been rebuilt yet.

For those looking for Iceberg support now, it exists in what used to be called PrestoSQL, which has recently been renamed to Trino: https://trino.io/blog/2020/12/27/announcing-trino.html.

Join our slack if you have any further questions: https://trino.io/slack.html

fpj commented 3 years ago

@bitsondatadev Thanks for the input, that was my understanding from the beginning, but from the post, the community has split and diverged. I'm currently working off this repository here, that's why I'm interested in Iceberg support for PrestoDB, not Trino.

bitsondatadev commented 3 years ago

Understood @fpj we just want that to be clear if anyone is looking for iceberg support.

beinan commented 3 years ago

Tagging @ChunxuTang. He is working on the Iceberg connector, I think.

ChunxuTang commented 3 years ago

Yeah, I'm working with @zhenxiao on the iceberg connector. Will send a PR for a review.

fpj commented 3 years ago

Looking forward to the PR, @ChunxuTang.

dborkar commented 3 years ago

@fpj Thanks for raising the issue. Turned out multiple efforts were underway to integrate Iceberg 😄
Saved duplicate efforts!

fpj commented 3 years ago

Feels like a decent first community contribution...

dixingxing0 commented 3 years ago

> Yeah, I'm working with @zhenxiao on the iceberg connector. Will send a PR for a review.

Hi Chunxu, how is this work going? We are looking forward to this fantastic feature!

ChunxuTang commented 3 years ago

@dixingxing0 The implementation goes well. Plan to send the PR very soon~

dixingxing0 commented 3 years ago

> @dixingxing0 The implementation goes well. Plan to send the PR very soon~

Glad to see this PR!

huahua9427 commented 3 years ago

@ChunxuTang, we need the Presto Iceberg connector very much now. I'm very glad to see that this work is in progress. Do you know when it will be released and where I can track the progress?

ChunxuTang commented 3 years ago

Hi folks, the PR has been merged to the Presto codebase. Feel free to close the issue~

timrobbins1 commented 3 years ago

Hi @ChunxuTang, I appreciate your work in this space, but it does not seem to be quite working yet. By default, presto-iceberg is not included in the downloadable presto-server tgz file. When it is built from source and dropped into the plugins directory of Presto 0.256, the following exception occurs when running a trivial INSERT statement (inserting 1 integer into an unpartitioned Iceberg table with 1 integer column):

java.lang.NoSuchMethodError: org.apache.parquet.schema.PrimitiveType.getLogicalTypeAnnotation()Lorg/apache/parquet/schema/LogicalTypeAnnotation;
    at org.apache.iceberg.parquet.MessageTypeToType.primitive(MessageTypeToType.java:137)

The problem seems to be that hive-apache-3.0.0-3.jar is conflicting with parquet-column-1.11.0.jar. This getLogicalTypeAnnotation() method was introduced in parquet-mr 1.11.0 but until issue #14960 is merged, Presto is on parquet-mr 1.10.1.
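For anyone trying to confirm this kind of conflict, here is a minimal sketch (not part of Presto or Iceberg) that reports where a class was loaded from and whether a public method with the expected name is visible on it. The class name `ClassOrigin` and its helpers are hypothetical, and it assumes you run it with the same jars on the classpath, in the same order, as the plugin sees them.

```java
import java.security.CodeSource;
import java.util.Arrays;

public class ClassOrigin {
    /** Returns the jar or directory a class was loaded from, or a placeholder if unknown. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // JDK classes loaded by the bootstrap loader have no code source.
        return (source == null || source.getLocation() == null)
                ? "(bootstrap/unknown)"
                : source.getLocation().toString();
    }

    /** True if the class exposes (declares or inherits) a public method with this name. */
    static boolean hasPublicMethod(Class<?> type, String methodName) {
        return Arrays.stream(type.getMethods())
                .anyMatch(method -> method.getName().equals(methodName));
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults mirror the stack trace above; pass other names as arguments.
        String className = args.length > 0 ? args[0] : "org.apache.parquet.schema.PrimitiveType";
        String methodName = args.length > 1 ? args[1] : "getLogicalTypeAnnotation";

        Class<?> type = Class.forName(className);
        System.out.println(className + " loaded from " + locationOf(type));
        System.out.println(methodName + " present: " + hasPublicMethod(type, methodName));
    }
}
```

If the reported location is the older jar and the method is reported as absent, that would be consistent with the NoSuchMethodError above.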

I got it to work by renaming hive-apache-3.0.0-3.jar to zzhive-apache-3.0.0-3.jar. This lets parquet-mr 1.11.0 override 1.10.1 (unsafely).
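The rename presumably works because the jars end up on the plugin classpath in file-name order, so "zzhive-..." sorts after "parquet-column-..." and the newer class is found first. That ordering is an assumption here, not something verified against the Presto plugin loader. A small sketch for finding which jars in a directory bundle the same class, listed in sorted file-name order (the class `DuplicateClassFinder` and the default directory path are hypothetical):

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

public class DuplicateClassFinder {
    /** Lists the jars in dir, sorted by file name, that contain the given class. */
    static List<String> jarsContaining(File dir, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        File[] jars = dir.listFiles((parent, name) -> name.endsWith(".jar"));
        List<String> matches = new ArrayList<>();
        if (jars == null) {
            return matches; // not a directory, or it could not be read
        }
        Arrays.sort(jars); // assumed to approximate the order the loader sees the jars in
        for (File jar : jars) {
            try (JarFile jarFile = new JarFile(jar)) {
                if (jarFile.getEntry(entryName) != null) {
                    matches.add(jar.getName());
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; point this at your own plugin directory.
        File dir = new File(args.length > 0 ? args[0] : "plugin/iceberg");
        String className = args.length > 1 ? args[1] : "org.apache.parquet.schema.PrimitiveType";
        // More than one match means the class is bundled twice.
        jarsContaining(dir, className).forEach(System.out::println);
    }
}
```

If two jars are listed, the first one is likely the copy that wins under the ordering assumption, which is why the rename changes behavior. Aligning the dependency versions is the safer fix.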

timrobbins1 commented 3 years ago

I may have spoken too soon on this. With the above workarounds I'm running into "Not an Iceberg table" errors quite frequently when trying to SELECT the data back in. Overall it feels like there's a bit of a gap between what's possible and what's reasonably achieved out of the box by following the documentation.

Edit: "Not an Iceberg table" errors after INSERTs are gone in 0.257 snapshot, fixed by commit 14ad556876ead826069781bd6471855320b05815