trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Iceberg Connector #1324

Closed lxynov closed 8 months ago

lxynov commented 5 years ago

TODOs for the Iceberg Connector

manishmalhotrawork commented 5 years ago

@linxingyuan1102 should this also be a TODO: "Iceberg tables should also allow specifying a table location"? It's possible that, from the same Presto cluster, I want to create tables pointing to different S3 accounts/clusters.

manishmalhotrawork commented 4 years ago

@lxynov I added issue https://github.com/prestosql/presto/issues/2660 for the TODO "Needs correctness tests for partition pruning. (also validate the pushdown is happening by checking the query plans?)". Can you please link the issue to the TODO?

lxynov commented 4 years ago

@manishmalhotrawork sure, done

Just a note: partition pruning in Iceberg is tricky because of partition spec evolution. This needs more thought and discussion.

AbdullaevAPo commented 3 years ago

@lxynov Is there a plan to support HDFS-only Iceberg tables (as in Spark, https://iceberg.apache.org/spark/, with spark.sql.catalog.hadoop_prod.type = hadoop)?

pPanda-beta commented 3 years ago

@lxynov Any update on what @AbdullaevAPo asked? https://github.com/trinodb/trino/issues/1324#issuecomment-707585664

This feature is a blocker for perfect read-write isolation; having a Hive metastore as a common point of contact between Spark and Presto is not a scalable solution.

pan3793 commented 3 years ago

Is there any plan for supporting HadoopCatalog?

caneGuy commented 3 years ago

@pan3793 I think this is related work: https://github.com/trinodb/trino/pull/6977/files

KarlManong commented 3 years ago

Will you support table configuration properties?

RomantsovArtur commented 2 years ago

Hey.

I don't see support for an UPDATE or CHANGE clause in ALTER TABLE on the list. It would be very handy, since data evolves a lot.

I can see that the functionality exists in the Iceberg API: https://iceberg.apache.org/javadoc/master/org/apache/iceberg/UpdateSchema.html

I might be missing something.

bitsondatadev commented 2 years ago

@RomantsovArtur, for schema evolution in Trino you can use ALTER TABLE <table-name> ADD|DROP COLUMN ...

See my section on Schema Evolution in this blog: https://blog.starburst.io/trino-on-ice-ii-in-place-table-evolution-and-cloud-compatibility-with-iceberg

If you're looking for updates on partition evolution, we are already tracking that in #7580.

Feel free to reach out to me on Trino Slack if you're looking for something specific.

RomantsovArtur commented 2 years ago

@bitsondatadev Thank you for your reply!

We are looking for some logic like: ALTER TABLE table_name CHANGE [COLUMN] col_name column_new_type

As you can see from the link I provided above, the Iceberg API for this is available but, unfortunately, Trino does not support it.

I read the doc you attached. Thank you for the beautiful blog post. The use case we are trying to achieve is the case when you have a table that is constantly written to and read by different clients, and we want to have an atomic type update rather than

Please note that I'm talking about the case where we need to evolve many tables on a regular basis. Some are huge, with 100+ billion records.

bitsondatadev commented 2 years ago

@RomantsovArtur wrote: "We are looking for some logic like: ALTER TABLE table_name CHANGE [COLUMN] col_name column_new_type ..."

I made a new issue for this. The first step is to add the syntax; after that, it should be easy to hook up to Iceberg.

RomantsovArtur commented 2 years ago

Thank you for the quick reply! Looks great 🚀

rimolive commented 2 years ago

Posting here since this seems to be the central place for tracking full Iceberg support in the Trino connector: is there already support for the rewrite_data_files procedure?

nicor88 commented 2 years ago

Row-level deletes were added to Iceberg, which means that DELETE/UPSERT/MERGE INTO are unlocked. I'm wondering when this feature will be included in the Trino connector (will it be covered by https://github.com/trinodb/trino/issues/10758)?

findepi commented 8 months ago

We don't use this issue for tracking Iceberg work anymore, so let me close it. There will always be some work items within such a broad area as Iceberg. Existing tickets can be found with

bitsondatadev commented 8 months ago

That being said, I really appreciate all the effort that was put into maintaining this initial roadmap.

Still, we should align on how we view larger efforts!

Thanks all!