trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.5k stars 3.02k forks source link

Support copy-on-write mode for Iceberg write #17272

Open jackye1995 opened 1 year ago

jackye1995 commented 1 year ago

Today Iceberg writes only support merge-on-read mode. Copy-on-write mode is a frequent ask for users that want better file layout without the need to run compactions frequently.

Technically this could be achieved pretty easily. The CoW implementation is already available for Delta:

https://github.com/trinodb/trino/blob/master/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java

findepi commented 1 year ago

Yes, I think we should support copy-on-write mode for UPDATE and DELETE for case when we update/delete most of the file. Creating big delta files isn't helpful for anyone. If we support this strategy, we should automatically choose between copy-on-write and merge-on-read within each IcebergMergeSink independently.

rdblue commented 1 year ago

@findepi, there's a table setting for choosing between copy-on-write and merge-on-read: write.(operation).mode where operation can be merge, update, or delete and mode can be copy-on-write or merge-on-read.

C-h-e-r-r-y commented 11 months ago

@rdblue

For now these settings can not be set during table creation :(

MotiPoyastro commented 5 months ago

This feature is highly requested in my org, Trino 443 + Iceberg on S3 parquets. frequent merge into queries becomes a problem on a large scale. currently dealing with post optimize/compaction for many small files created on a daily basis. but this is not a desired flow for us.

xkrogen commented 3 weeks ago

One concern called out by @dain (on Slack) that I will persist here:

CoW has it’s own problems because it breaks the change feed, so you cannot do downstream delta-style transforms (e.g., incremental materialized views)

I don't think this should be a blocker to adding support -- but something to keep in mind e.g. when considering doing a MoR vs CoW decision automatically vs based on explicit user input, since there is some difference in functionality supported.

hashhar commented 6 days ago

~With https://github.com/trinodb/trino/pull/24031 it should now be possible to specify the default operation mode via the table property as @rdblue mentioned above.~

~Is it still desirable to provide an "integrated" solution for this in Trino? for most use-cases I think the current default of MoR on Trino is good.~

I had misunderstood, the table property only affects other engines. Trino still always writes using MoR.