trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.24k stars 2.95k forks source link

Support copy-on-write mode for Iceberg write #17272

Open jackye1995 opened 1 year ago

jackye1995 commented 1 year ago

Today Iceberg writes only support merge-on-read mode. Copy-on-write mode is a frequent ask for users that want better file layout without the need to run compactions frequently.

Technically this could be achieved pretty easily. The CoW implementation is already available for Delta:

https://github.com/trinodb/trino/blob/master/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java

findepi commented 1 year ago

Yes, I think we should support copy-on-write mode for UPDATE and DELETE for case when we update/delete most of the file. Creating big delta files isn't helpful for anyone. If we support this strategy, we should automatically choose between copy-on-write and merge-on-read within each IcebergMergeSink independently.

rdblue commented 1 year ago

@findepi, there's a table setting for choosing between copy-on-write and merge-on-read: write.(operation).mode where operation can be merge, update, or delete and mode can be copy-on-write or merge-on-read.

C-h-e-r-r-y commented 9 months ago

@rdblue

For now these settings can not be set during table creation :(

MotiPoyastro commented 3 months ago

This feature is highly requested in my org, Trino 443 + Iceberg on S3 parquets. frequent merge into queries becomes a problem on a large scale. currently dealing with post optimize/compaction for many small files created on a daily basis. but this is not a desired flow for us.