Open skandasa23 opened 3 years ago
That's easy to imagine when files are only being added to a table: you would just do a set operation on file names between snapshots. How does Spark handle the situation when files are replaced (updated rows, small files combined into bigger ones, etc.)? Or when there are v2 delete files?
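To make the question concrete, here is a minimal sketch of that set operation. It assumes each snapshot can be reduced to a plain set of data file names; the helper names and file names are hypothetical and this is not how any engine is confirmed to implement it.

```python
def added_files(files_s1, files_s2):
    """Files present in snapshot S2 but not in S1."""
    return set(files_s2) - set(files_s1)


def removed_files(files_s1, files_s2):
    """Files present in snapshot S1 but no longer in S2."""
    return set(files_s1) - set(files_s2)


# Append-only case: the difference is exactly the newly written file.
s1 = {"a.parquet", "b.parquet"}
s2 = {"a.parquet", "b.parquet", "c.parquet"}
print(added_files(s1, s2))

# Compaction case (hypothetical): a.parquet and b.parquet are rewritten
# into ab.parquet. The difference reports ab.parquet as "added" even
# though it holds no new rows, which is the problem raised above.
s3 = {"ab.parquet"}
print(added_files(s1, s3), removed_files(s1, s3))
```

In the compaction case a plain file-name difference would count every row of the rewritten file as inserted, so something more than file names (for example the snapshot's operation type) seems to be needed.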
Trino currently supports reading data belonging to a particular Iceberg snapshot. Incremental read support would make it possible to read only the data that changed between two snapshots. I'm not sure of the Trino convention, but something like this: select count(*) from iceberg.testdb."table@{S1,S2}" would output only the rows inserted between S1 (exclusive) and S2 (inclusive).
Spark support:
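For comparison, Iceberg's Spark reader documents incremental reads through the `start-snapshot-id` (exclusive) and `end-snapshot-id` (inclusive) read options. The sketch below wraps them in a small helper; the helper name `read_incremental`, the table name, and the snapshot IDs are assumptions for illustration, and `spark` is assumed to be an existing SparkSession with the Iceberg runtime configured.

```python
def read_incremental(spark, table, start_snapshot_id, end_snapshot_id):
    """Return a DataFrame of rows appended after start_snapshot_id
    (exclusive) up to end_snapshot_id (inclusive).

    `spark` is assumed to be a SparkSession with Iceberg available;
    `table` is a table identifier or path (hypothetical here).
    """
    return (
        spark.read.format("iceberg")
        .option("start-snapshot-id", str(start_snapshot_id))
        .option("end-snapshot-id", str(end_snapshot_id))
        .load(table)
    )


# Usage sketch (hypothetical table name and snapshot IDs):
# df = read_incremental(spark, "testdb.table", s1_id, s2_id)
# df.count()
```

As far as I can tell from the Iceberg documentation, this read returns only data from append snapshots; replace, overwrite, and delete operations are not covered.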