Open jackye1995 opened 1 year ago
Yes, I think we should support copy-on-write mode for UPDATE and DELETE for case when we update/delete most of the file. Creating big delta files isn't helpful for anyone.
If we support this strategy, we should automatically choose between copy-on-write and merge-on-read within each IcebergMergeSink
independently.
@findepi, there's a table setting for choosing between copy-on-write and merge-on-read: write.(operation).mode
where operation
can be merge
, update
, or delete
and mode can be copy-on-write
or merge-on-read
.
@rdblue
For now these settings can not be set during table creation :(
This feature is highly requested in my org, Trino 443 + Iceberg on S3 parquets. frequent merge into queries becomes a problem on a large scale. currently dealing with post optimize/compaction for many small files created on a daily basis. but this is not a desired flow for us.
One concern called out by @dain (on Slack) that I will persist here:
CoW has it’s own problems because it breaks the change feed, so you cannot do downstream delta-style transforms (e.g., incremental materialized views)
I don't think this should be a blocker to adding support -- but something to keep in mind e.g. when considering doing a MoR vs CoW decision automatically vs based on explicit user input, since there is some difference in functionality supported.
~With https://github.com/trinodb/trino/pull/24031 it should now be possible to specify the default operation mode via the table property as @rdblue mentioned above.~
~Is it still desirable to provide an "integrated" solution for this in Trino? for most use-cases I think the current default of MoR on Trino is good.~
I had misunderstood, the table property only affects other engines. Trino still always writes using MoR.
Today Iceberg writes only support merge-on-read mode. Copy-on-write mode is a frequent ask for users that want better file layout without the need to run compactions frequently.
Technically this could be achieved pretty easily. The CoW implementation is already available for Delta:
https://github.com/trinodb/trino/blob/master/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java