Open vietcheems opened 8 months ago
@vietcheems Thank you for reporting this.
When you created the destination table, could it be that it was created without an "identifier field"? The application falls back to append mode when the target table doesn't have an "identifier field" defined.
If that's not the case, could it be that the Debezium application still thinks the table is not partitioned? To test this, could you try adding an identifier field after creating the table:
ALTER TABLE iceberg_test SET IDENTIFIER FIELDS id
https://iceberg.apache.org/docs/latest/spark-ddl/#alter-table--set-identifier-fields
I'm using Trino, and as far as I know it has no option to set an identifier field. However, when I create the table in Trino without the partitioning spec (removing partitioning = array['name']), upsert still works as usual. So I'm not sure the "identifier field" is the issue.
> Could you try with adding identifier field after creating the table
> ALTER TABLE iceberg_test SET IDENTIFIER FIELDS id
> https://iceberg.apache.org/docs/latest/spark-ddl/#alter-table--set-identifier-fields
I switched to Spark to try this, and the problem still persists.
@vietcheems Thank you for checking it. The problem is that the previous row (name=1) and the new row (name=deleted) fall into two different partitions. This means the partitioning field should be immutable: it should not change between the old row and the new row of an upsert.
Doc: https://iceberg.apache.org/spec/#scan-planning
An equality delete file must be applied to a data file when all of the following are true:
The data file’s partition (both spec and partition values) is equal to the delete file’s partition or the delete file’s partition spec is unpartitioned
Currently, an upsert that does not change the name field (the partition field) should work for partitioned tables.
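The rule quoted above can be illustrated with a minimal, self-contained Python sketch. This is not Iceberg's or the connector's actual code: `DataFile`, `EqualityDelete`, `delete_applies`, `scan`, and the `note` column are hypothetical names, and the partition spec and values are collapsed into a single optional string. It assumes, as the discussion here suggests, that an upsert is written as an equality delete on `id` plus a new data file, both in the partition of the new row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataFile:
    partition: Optional[str]  # value of the partition column `name`; None = unpartitioned
    seq: int                  # data sequence number
    rows: tuple               # rows as (id, name, note) tuples


@dataclass(frozen=True)
class EqualityDelete:
    partition: Optional[str]  # partition the delete file was written to
    seq: int
    ids: frozenset            # ids matched by the equality delete


def delete_applies(d: EqualityDelete, f: DataFile) -> bool:
    # Simplified form of the spec rule: the data file must be older, and the
    # delete must be in the same partition or be unpartitioned.
    return f.seq < d.seq and (d.partition is None or d.partition == f.partition)


def scan(data_files, deletes):
    # Return the rows a reader would see after applying equality deletes.
    visible = []
    for f in data_files:
        for row in f.rows:
            if not any(delete_applies(d, f) and row[0] in d.ids for d in deletes):
                visible.append(row)
    return sorted(visible)


initial = DataFile(partition="1", seq=1, rows=((1, "1", "a"),))

# Upsert that keeps name=1: the delete lands in the same partition as the old row.
same_delete = EqualityDelete(partition="1", seq=2, ids=frozenset({1}))
same_insert = DataFile(partition="1", seq=2, rows=((1, "1", "b"),))
result_same = scan([initial, same_insert], [same_delete])

# Upsert that changes name to "deleted": the delete lands in another partition.
moved_delete = EqualityDelete(partition="deleted", seq=2, ids=frozenset({1}))
moved_insert = DataFile(partition="deleted", seq=2, rows=((1, "deleted", "a"),))
result_moved = scan([initial, moved_insert], [moved_delete])
```

In this model `result_same` holds a single row, while `result_moved` still contains the old row from partition name=1 next to the new one, which matches the duplicate record reported in this issue.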
We could try changing the deletion to a global delete, i.e. write the equality delete with an unpartitioned spec:
In general, deletes are applied only to data files that are older and in the same partition, except for two special cases: Equality delete files stored with an unpartitioned spec are applied as global deletes. Otherwise, delete files do not apply to files in other partitions.
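As a rough sketch of what the global-delete idea would change, the following Python snippet compares a partition-scoped equality delete with one stored under an unpartitioned spec. It is a simplified model of the quoted spec text, not the connector's implementation; `delete_applies` and its parameters are hypothetical, and `None` stands in for "unpartitioned spec".

```python
from typing import Optional


def delete_applies(delete_partition: Optional[str], delete_seq: int,
                   file_partition: Optional[str], file_seq: int) -> bool:
    """Simplified equality-delete applicability check."""
    older = file_seq < delete_seq
    same_or_global = delete_partition is None or delete_partition == file_partition
    return older and same_or_global


# The old row lives in partition name=1 and was written at sequence number 1.
old_file_partition, old_file_seq = "1", 1

# Delete written to the new row's partition (name=deleted): does not reach the old row.
partition_scoped = delete_applies("deleted", 2, old_file_partition, old_file_seq)

# Delete written with an unpartitioned spec: applies to older files in every partition.
global_delete = delete_applies(None, 2, old_file_partition, old_file_seq)
```

Here `partition_scoped` is `False` and `global_delete` is `True`. One likely trade-off is that a global delete has to be considered for older data files in all partitions, so reads may become more expensive than with partition-scoped deletes.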
I'm trying to replicate CDC data from Postgres in upsert mode. I noticed that when I partition the Iceberg table by a column present in the source table, new records are appended instead of upserted.
Here is the source table's data initially:![image](https://github.com/memiiso/debezium-server-iceberg/assets/73997794/06f230c3-7864-47c7-b675-9e11d7193c12)
The data is replicated normally to Iceberg:![image](https://github.com/memiiso/debezium-server-iceberg/assets/73997794/2531b8c2-c064-4066-a8be-f27602b71c5d)
However, when I update the source table to:![image](https://github.com/memiiso/debezium-server-iceberg/assets/73997794/d1eddcc1-2db2-4461-9b44-f0ce9b8ae5a8)
Instead of updating the existing record, a new record is added to the Iceberg table:![image](https://github.com/memiiso/debezium-server-iceberg/assets/73997794/46eeb9e8-0c84-4fe9-8571-3c4486c7e135)
The table definitions are as follows:
This is my application.properties file:
The expected behaviour is there should only be 1 record in the destination table containing the updated data. When the destination table is not partitioned by the "name" column, upsert works fine.