holdenk / spark-upgrade

Magic to help Spark pipelines upgrade
Apache License 2.0

Limit check to changed partitions #147

Open holdenk opened 4 months ago

holdenk commented 4 months ago

We can do this either by logging the partitions that we're updating and limiting the comparison to those, or by checking the Iceberg metadata.

This is important for supporting updates to large existing tables, where comparing the whole table on every run gets expensive.
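
A rough sketch of the "log the changed partitions" option, assuming the updated partitions have already been collected as a list of column-to-value mappings. The helper names `partition_predicate` and `_sql_literal` are hypothetical and not part of spark-upgrade; the function only builds a SQL filter string that both sides of the comparison could share.

```python
from typing import Dict, Iterable, List


def _sql_literal(value: object) -> str:
    """Render a Python value as a SQL literal; strings are quoted and escaped."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def partition_predicate(changed: Iterable[Dict[str, object]]) -> str:
    """Build a WHERE-clause string matching only the given partitions.

    `changed` is a hypothetical log of updated partitions, one dict per
    partition, mapping partition column name to value.
    """
    clauses: List[str] = []
    for spec in changed:
        if not spec:
            raise ValueError("each changed partition needs at least one column")
        parts = []
        # Sort columns so the same partition always renders identically.
        for col, val in sorted(spec.items()):
            if val is None:
                parts.append(f"`{col}` IS NULL")
            else:
                parts.append(f"`{col}` = {_sql_literal(val)}")
        clause = "(" + " AND ".join(parts) + ")"
        if clause not in clauses:  # drop duplicate partitions
            clauses.append(clause)
    # With nothing changed, match no rows so the comparison is a no-op.
    return " OR ".join(clauses) if clauses else "FALSE"
```

The resulting string could then be passed to `DataFrame.where(...)` on both the old and new outputs before diffing them, so only the touched partitions are scanned.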

holdenk commented 4 months ago

Or we can use incremental reads if we don't care about overwritten rows -- https://iceberg.apache.org/docs/latest/spark-queries/#incremental-read
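
A minimal sketch of what the incremental-read route might look like, assuming we record the snapshot ID before the job writes. `start-snapshot-id` and `end-snapshot-id` are the read options described on the linked Iceberg page; the helper name `incremental_read_options` is hypothetical.

```python
from typing import Dict, Optional


def incremental_read_options(
    start_snapshot_id: int,
    end_snapshot_id: Optional[int] = None,
) -> Dict[str, str]:
    """Build Iceberg read options for an incremental scan.

    Per the Iceberg docs, the start snapshot is exclusive and the end
    snapshot is inclusive; when the end is omitted, the read runs up to
    the table's current snapshot. Snapshot IDs are not ordered, so no
    start < end check is attempted here.
    """
    options = {"start-snapshot-id": str(start_snapshot_id)}
    if end_snapshot_id is not None:
        options["end-snapshot-id"] = str(end_snapshot_id)
    return options


# Intended use (not executed here; assumes a SparkSession named `spark`
# and an Iceberg table identifier):
#   opts = incremental_read_options(snapshot_before_write)
#   changed = spark.read.format("iceberg").options(**opts).load("db.table")
```

Since the Iceberg docs note that incremental reads only cover data appended between the two snapshots, this would miss rows rewritten by overwrite or replace operations, which is the caveat above.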