tabular-io / iceberg-kafka-connect

Apache License 2.0
192 stars 41 forks source link

Option to add any missing record fields to the table schema #95

Closed bryanck closed 11 months ago

bryanck commented 11 months ago

This PR introduces a new connector option, iceberg.tables.evolveSchemaEnabled. When this is set to true, sink tasks will detect any record fields that are missing from the table during message conversion. Missing columns are then added to the table before the row is written, the row is re-converted using the new schema, and finally the row is written.

All tasks apply schema changes independently, thus each task might attempt the same schema changes. Because of this, multiple attempts are made to update the schema, and the latest schema is checked to see if the changes were applied by another process. This ensures all tasks end up using the same field IDs when writing the data, and the catalog is the source of truth.

Automatic table creation will be added in a follow-up PR.