@hoaiff Debezium might be generating the unique key combination. Could you post this to the Debezium Zulip channel?
Additionally, if you enable DEBUG logging (example below), you can see the event key and payload schema received by the consumer. These are printed during the initial load, when the table is first created.
config.put("quarkus.log.category.\"io.debezium.server.iceberg.IcebergChangeConsumer\".level", "DEBUG");
Example output: the third row prints the key field schema (the PK) as sent by Debezium. The key schema name ends with ".Key", for example "testc.inventory.orders.Key".
2024-03-18 11:07:13,867 WARN [io.deb.ser.ice.IcebergUtil] (pool-10-thread-1) Table not found: debeziumevents.debeziumcdc_testc_inventory_orders
2024-03-18 11:07:13,871 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-10-thread-1) Converting iceberg schema to debezium:{"type":"struct","fields":[{"type":"int32","optional":false,"default":0,"field":"id"},{"type":"int32","optional":false,"name":"io.debezium.time.Date","version":1,"field":"order_date"},{"type":"int32","optional":false,"field":"purchaser"},{"type":"int32","optional":false,"field":"quantity"},{"type":"int32","optional":false,"field":"product_id"},{"type":"string","optional":true,"field":"__deleted"},{"type":"string","optional":true,"field":"__op"},{"type":"string","optional":true,"field":"__table"},{"type":"int64","optional":true,"field":"__source_ts_ms"},{"type":"string","optional":true,"field":"__db"},{"type":"int64","optional":true,"field":"__ts_ms"}],"optional":false,"name":"testc.inventory.orders.Value"}
2024-03-18 11:07:13,871 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-10-thread-1) Converting iceberg schema to debezium:{"type":"struct","fields":[{"type":"int32","optional":false,"default":0,"field":"id"}],"optional":false,"name":"testc.inventory.orders.Key"}
2024-03-18 11:07:13,872 WARN [io.deb.ser.ice.IcebergUtil] (pool-10-thread-1) Creating table:'debeziumevents.debeziumcdc_testc_inventory_orders'
@hoaiff try the following logging settings for testing. They print the event data, where you can see which key Debezium sends. (Note: in the config file the backslashes are not needed.)
config.put("quarkus.log.category.\"io.debezium.server.iceberg\".level", "TRACE");
config.put("quarkus.log.category.\"io.debezium.server.iceberg\".min-level", "TRACE");
config.put("quarkus.log.level", "WARN");
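For reference, a minimal sketch of how these settings look as properties-file lines. It assumes the config file is a Quarkus-style application.properties; LoggingConfigSketch is a made-up name for illustration. In Java the \" is only string escaping, so the real key contains plain double quotes and no backslashes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of debezium-server-iceberg.
final class LoggingConfigSketch {

    // The same three settings as above, kept in insertion order.
    static Map<String, String> traceLoggingConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("quarkus.log.category.\"io.debezium.server.iceberg\".level", "TRACE");
        config.put("quarkus.log.category.\"io.debezium.server.iceberg\".min-level", "TRACE");
        config.put("quarkus.log.level", "WARN");
        return config;
    }

    // Renders the map as key=value lines, the form a properties file expects.
    static String toPropertiesLines(Map<String, String> config) {
        return config.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(toPropertiesLines(traceLoggingConfig()));
    }
}
```

The printed lines can be pasted into the properties file as they are.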
logs
2024-03-18 11:27:25,446 TRACE [io.deb.ser.ice.IcebergChangeConsumer] (pool-10-thread-1) Processed event 'EmbeddedEngineChangeEvent [key={"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"default":0,"field":"id"}],"optional":false,"name":"testc.inventory.customers.Key"},"payload":{"id":1001}}, value={"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"default":0,"field":"id"},{"type":"string","optional":false,"field":"first_name"},{"type":"string","optional":false,"field":"last_name"},{"type":"string","optional":false,"field":"email"},{"type":"string","optional":true,"field":"__deleted"},{"type":"string","optional":true,"field":"__op"},{"type":"string","optional":true,"field":"__table"},{"type":"int64","optional":true,"field":"__source_ts_ms"},{"type":"string","optional":true,"field":"__db"},{"type":"int64","optional":true,"field":"__ts_ms"}],"optional":false,"name":"testc.inventory.customers.Value"},"payload":{"id":1001,"first_name":"Sally","last_name":"Thomas","email":"sally.thomas@acme.com","__deleted":"false","__op":"r","__table":"customers","__source_ts_ms":1710757638454,"__db":"postgres","__ts_ms":1710757638442}}, sourceRecord=SourceRecord{sourcePartition={server=testc}, sourceOffset={last_snapshot_record=false, lsn=34412792, txId=753, ts_usec=1710757638454155, snapshot=true}} ConnectRecord{topic='testc.inventory.customers', kafkaPartition=null, key=Struct{id=1001}, keySchema=Schema{testc.inventory.customers.Key:STRUCT}, value=Struct{id=1001,first_name=Sally,last_name=Thomas,email=sally.thomas@acme.com,__deleted=false,__op=r,__table=customers,__source_ts_ms=1710757638454,__db=postgres,__ts_ms=1710757638442}, valueSchema=Schema{testc.inventory.customers.Value:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}]'
2024-03-18 11:27:25,447 TRACE [io.deb.ser.ice.IcebergChangeConsumer] (pool-10-thread-1) Processed event 'EmbeddedEngineChangeEvent [key={"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"default":0,"field":"id"}],"optional":false,"name":"testc.inventory.customers.Key"},"payload":{"id":1002}}, value={"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"default":0,"field":"id"},{"type":"string","optional":false,"field":"first_name"},{"type":"string","optional":false,"field":"last_name"},{"type":"string","optional":false,"field":"email"},{"type":"string","optional":true,"field":"__deleted"},{"type":"string","optional":true,"field":"__op"},{"type":"string","optional":true,"field":"__table"},{"type":"int64","optional":true,"field":"__source_ts_ms"},{"type":"string","optional":true,"field":"__db"},{"type":"int64","optional":true,"field":"__ts_ms"}],"optional":false,"name":"testc.inventory.customers.Value"},"payload":{"id":1002,"first_name":"George","last_name":"Bailey","email":"gbailey@foobar.com","__deleted":"false","__op":"r","__table":"customers","__source_ts_ms":1710757638454,"__db":"postgres","__ts_ms":1710757638445}}, sourceRecord=SourceRecord{sourcePartition={server=testc}, sourceOffset={last_snapshot_record=false, lsn=34412792, txId=753, ts_usec=1710757638454155, snapshot=true}} ConnectRecord{topic='testc.inventory.customers', kafkaPartition=null, key=Struct{id=1002}, keySchema=Schema{testc.inventory.customers.Key:STRUCT}, value=Struct{id=1002,first_name=George,last_name=Bailey,email=gbailey@foobar.com,__deleted=false,__op=r,__table=customers,__source_ts_ms=1710757638454,__db=postgres,__ts_ms=1710757638445}, valueSchema=Schema{testc.inventory.customers.Value:STRUCT}, timestamp=null, headers=ConnectHeaders(headers=)}]'
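One option is to fix it manually: stop the consumer, alter the Iceberg table to drop the identifier fields, set those fields to optional, then start the consumer.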
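A minimal sketch of what such a manual fix (dropping the identifier fields and making the key columns optional) could look like, assuming the table is altered through Spark SQL with the Iceberg SQL extensions enabled. That is an assumption; the thread does not say which engine is used. ManualFixSqlSketch is a hypothetical helper that only builds the statement text. The table and column names in main are taken from the prod logs later in this thread. Identifier fields are dropped first, since identifier columns have to stay required.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that only builds SQL text; it does not connect to anything.
final class ManualFixSqlSketch {

    static List<String> statements(String table, List<String> keyColumns) {
        List<String> sql = new ArrayList<>();
        // 1. Drop the identifier (row key) fields from the table schema.
        sql.add("ALTER TABLE " + table + " DROP IDENTIFIER FIELDS " + String.join(", ", keyColumns));
        // 2. Make each former key column optional (nullable).
        for (String column : keyColumns) {
            sql.add("ALTER TABLE " + table + " ALTER COLUMN " + column + " DROP NOT NULL");
        }
        return sql;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("REPORT_DAY", "VIOLATE_CODE", "HOTLINE", "BRANCH");
        statements("oracle_test2.ekyc_IDG_VOICE_COUNT_VIOLATE_CALL_CENTER_DAY", keys)
                .forEach(System.out::println);
    }
}
```

Run the statements only while the consumer is stopped, and restart it afterwards.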
Thanks @ismailsimsek, I will try fixing it manually later. I also set the log level as above. This is the detailed log:
oracle_prod1 | 2024-03-19 01:48:39,576 WARN [io.deb.ser.ice.IcebergUtil] (pool-7-thread-1) Created namespace:'oracle_test2'
oracle_prod1 | 2024-03-19 01:48:40,424 WARN [org.apa.had.met.imp.MetricsConfig] (pool-7-thread-1) Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
oracle_prod1 | 2024-03-19 01:48:42,946 INFO [io.deb.ser.ice.his.IcebergSchemaHistory] (pool-7-thread-1) Starting IcebergSchemaHistory storage table:iceberg.debezium_database_history_storage_test
oracle_prod1 | 2024-03-19 01:48:56,539 WARN [io.deb.ser.ice.IcebergUtil] (pool-7-thread-1) Table not found: oracle_test2.ekyc_IDG_VOICE_COUNT_VIOLATE_CALL_CENTER_DAY
oracle_prod1 | 2024-03-19 01:48:56,553 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Converting iceberg schema to debezium:{"type":"struct","fields":[{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"REPORT_DAY"},{"type":"string","optional":true,"field":"VIOLATE_CODE"},{"type":"string","optional":true,"field":"COUNT_DATA"},{"type":"string","optional":true,"field":"HOTLINE"},{"type":"string","optional":true,"field":"BRANCH"},{"type":"string","optional":true,"field":"__deleted"},{"type":"string","optional":true,"field":"__op"},{"type":"int64","optional":true,"field":"__source_ts_ms"}],"optional":false,"name":"ekyc.IDG_VOICE.COUNT_VIOLATE_CALL_CENTER_DAY.Value"}
oracle_prod1 | 2024-03-19 01:48:56,554 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Converting iceberg schema to debezium:{"type":"struct","fields":[{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"REPORT_DAY"},{"type":"string","optional":true,"field":"VIOLATE_CODE"},{"type":"string","optional":true,"field":"HOTLINE"},{"type":"string","optional":true,"field":"BRANCH"}],"optional":false,"name":"ekyc.IDG_VOICE.COUNT_VIOLATE_CALL_CENTER_DAY.Key"}
oracle_prod1 | 2024-03-19 01:48:56,556 WARN [io.deb.ser.ice.IcebergUtil] (pool-7-thread-1) Creating table:'oracle_test2.ekyc_IDG_VOICE_COUNT_VIOLATE_CALL_CENTER_DAY'
oracle_prod1 | schema:table {
oracle_prod1 | 1: REPORT_DAY: required long (id)
oracle_prod1 | 2: VIOLATE_CODE: required string (id)
oracle_prod1 | 3: COUNT_DATA: optional string
oracle_prod1 | 4: HOTLINE: required string (id)
oracle_prod1 | 5: BRANCH: required string (id)
oracle_prod1 | 6: __deleted: optional string
oracle_prod1 | 7: __op: optional string
oracle_prod1 | 8: __source_ts_ms: optional timestamptz
oracle_prod1 | }
oracle_prod1 | rowIdentifier:[BRANCH, HOTLINE, VIOLATE_CODE, REPORT_DAY]
oracle_prod1 | 2024-03-19 01:48:56,832 DEBUG [io.deb.ser.ice.tab.IcebergTableOperator] (pool-7-thread-1) Batch got 5027 records with 1 different schema!!
oracle_prod1 | 2024-03-19 01:48:56,833 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Converting iceberg schema to debezium:{"type":"struct","fields":[{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"REPORT_DAY"},{"type":"string","optional":true,"field":"VIOLATE_CODE"},{"type":"string","optional":true,"field":"COUNT_DATA"},{"type":"string","optional":true,"field":"HOTLINE"},{"type":"string","optional":true,"field":"BRANCH"},{"type":"string","optional":true,"field":"__deleted"},{"type":"string","optional":true,"field":"__op"},{"type":"int64","optional":true,"field":"__source_ts_ms"}],"optional":false,"name":"ekyc.IDG_VOICE.COUNT_VIOLATE_CALL_CENTER_DAY.Value"}
oracle_prod1 | 2024-03-19 01:48:56,833 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Converting iceberg schema to debezium:{"type":"struct","fields":[{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"REPORT_DAY"},{"type":"string","optional":true,"field":"VIOLATE_CODE"},{"type":"string","optional":true,"field":"HOTLINE"},{"type":"string","optional":true,"field":"BRANCH"}],"optional":false,"name":"ekyc.IDG_VOICE.COUNT_VIOLATE_CALL_CENTER_DAY.Key"}
oracle_prod1 | 2024-03-19 01:48:56,885 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing nested field:struct<1: REPORT_DAY: required long, 2: VIOLATE_CODE: required string, 3: COUNT_DATA: optional string, 4: HOTLINE: required string, 5: BRANCH: required string, 6: __deleted: optional string, 7: __op: optional string, 8: __source_ts_ms: optional timestamptz>
oracle_prod1 | 2024-03-19 01:48:56,886 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:REPORT_DAY Type:long
oracle_prod1 | 2024-03-19 01:48:56,887 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:VIOLATE_CODE Type:string
oracle_prod1 | 2024-03-19 01:48:56,888 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:COUNT_DATA Type:string
oracle_prod1 | 2024-03-19 01:48:56,888 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:HOTLINE Type:string
oracle_prod1 | 2024-03-19 01:48:56,888 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:BRANCH Type:string
oracle_prod1 | 2024-03-19 01:48:56,888 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__deleted Type:string
oracle_prod1 | 2024-03-19 01:48:56,889 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__op Type:string
oracle_prod1 | 2024-03-19 01:48:56,889 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__source_ts_ms Type:timestamptz
oracle_prod1 | 2024-03-19 01:48:57,013 ERROR [io.deb.rel.RelationalSnapshotChangeEventSource] (debezium-oracleconnector-ekyc-change-event-source-coordinator) Error during snapshot: java.util.concurrent.ExecutionException: java.lang.InterruptedException: Interrupted while snapshotting table PDBIDG_SYS.IDG_VOICE.COUNT_VIOLATE_CALL_CENTER_DAY
oracle_prod1 | at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
oracle_prod1 | at java.base/java.util.concurrent.FutureTask.get(Unknown Source)
oracle_prod1 | at io.debezium.relational.RelationalSnapshotChangeEventSource.createDataEvents(RelationalSnapshotChangeEventSource.java:468)
oracle_prod1 | at io.debezium.relational.RelationalSnapshotChangeEventSource.doExecute(RelationalSnapshotChangeEventSource.java:165)
oracle_prod1 | at io.debezium.pipeline.source.AbstractSnapshotChangeEventSource.execute(AbstractSnapshotChangeEventSource.java:92)
oracle_prod1 | at io.debezium.pipeline.ChangeEventSourceCoordinator.doSnapshot(ChangeEventSourceCoordinator.java:250)
oracle_prod1 | at io.debezium.pipeline.ChangeEventSourceCoordinator.doSnapshot(ChangeEventSourceCoordinator.java:234)
oracle_prod1 | at io.debezium.pipeline.ChangeEventSourceCoordinator.executeChangeEventSources(ChangeEventSourceCoordinator.java:186)
oracle_prod1 | at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:137)
oracle_prod1 | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
oracle_prod1 | at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
oracle_prod1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
oracle_prod1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
oracle_prod1 | at java.base/java.lang.Thread.run(Unknown Source)
oracle_prod1 | Caused by: java.lang.InterruptedException: Interrupted while snapshotting table PDBIDG_SYS.IDG_VOICE.COUNT_VIOLATE_CALL_CENTER_DAY
oracle_prod1 | at io.debezium.relational.RelationalSnapshotChangeEventSource.doCreateDataEventsForTable(RelationalSnapshotChangeEventSource.java:557)
oracle_prod1 | at io.debezium.relational.RelationalSnapshotChangeEventSource.lambda$createDataEventsForTableCallable$6(RelationalSnapshotChangeEventSource.java:520)
oracle_prod1 | at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
oracle_prod1 | ... 5 more
oracle_prod1 |
oracle_prod1 |
oracle_prod1 | 2024-03-19 01:48:57,020 WARN [io.deb.pip.sou.AbstractSnapshotChangeEventSource] (debezium-oracleconnector-ekyc-change-event-source-coordinator) Snapshot was not completed successfully, it will be re-executed upon connector restart
Additionally, I see that the fields of the key schema are optional:
{
  "type":"struct",
  "fields":[
    {
      "type":"int64",
      "optional":true,
      "name":"io.debezium.time.Timestamp",
      "version":1,
      "field":"REPORT_DAY"
    },
    {
      "type":"string",
      "optional":true,
      "field":"VIOLATE_CODE"
    },
    {
      "type":"string",
      "optional":true,
      "field":"HOTLINE"
    },
    {
      "type":"string",
      "optional":true,
      "field":"BRANCH"
    }
  ],
  "optional":false,
  "name":"ekyc.IDG_VOICE.COUNT_VIOLATE_CALL_CENTER_DAY.Key"
}
It seems Debezium is sending these fields as the key. I am not sure why; it might be an Oracle-related feature.
"Additionally, I see that the fields of the key schema are optional"
Correct, but in Iceberg, key fields cannot be null. That is why they are set as required on the Iceberg side.
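A simplified sketch of that rule, using the field names from the logs above. It is not the consumer's real code: KeyFieldsSketch, SourceField and TableColumn are hypothetical. How non-key columns are treated here (they simply follow the source "optional" flag) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model for illustration; not the consumer's real classes.
final class KeyFieldsSketch {

    record SourceField(String name, boolean optional) {}

    record TableColumn(String name, boolean required, boolean identifier) {}

    static List<TableColumn> toTableColumns(List<SourceField> valueFields, Set<String> keyNames) {
        List<TableColumn> columns = new ArrayList<>();
        for (SourceField f : valueFields) {
            boolean isKey = keyNames.contains(f.name());
            // Key columns are forced to required; others follow the source flag (assumption).
            columns.add(new TableColumn(f.name(), isKey || !f.optional(), isKey));
        }
        return columns;
    }

    public static void main(String[] args) {
        List<SourceField> value = List.of(
                new SourceField("REPORT_DAY", true),
                new SourceField("VIOLATE_CODE", true),
                new SourceField("COUNT_DATA", true),
                new SourceField("HOTLINE", true),
                new SourceField("BRANCH", true));
        Set<String> key = Set.of("REPORT_DAY", "VIOLATE_CODE", "HOTLINE", "BRANCH");
        for (TableColumn c : toTableColumns(value, key)) {
            System.out.println(c.name() + ": " + (c.required() ? "required" : "optional")
                    + (c.identifier() ? " (id)" : ""));
        }
    }
}
```

So a field that is optional in the Debezium key schema still ends up as a required identifier column in the table.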
@ismailsimsek, I created a new table with the same schema in the dev env and I don't see the same problem: the rowIdentifier matches the source table. I don't know why it occurred with my table in the prod env.
oracle_prod1 | 2024-03-20 04:08:08,167 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Converting iceberg schema to debezium:{"type":"struct","fields":[{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"REPORT_DAY"},{"type":"string","optional":true,"field":"VIOLATE_CODE"},{"type":"string","optional":true,"field":"COUNT_DATA"},{"type":"string","optional":true,"field":"HOTLINE"},{"type":"string","optional":true,"field":"BRANCH"},{"type":"string","optional":true,"field":"__deleted"},{"type":"string","optional":true,"field":"__op"},{"type":"int64","optional":true,"field":"__source_ts_ms"}],"optional":false,"name":"ekyc2.DBZUSER.COUNT_VIOLATE_CALL_CENTER_DAY1.Value"}
oracle_prod1 | 2024-03-20 04:08:08,169 WARN [io.deb.ser.ice.IcebergUtil] (pool-7-thread-1) Creating table:'oracle_clob2.ekyc2_DBZUSER_COUNT_VIOLATE_CALL_CENTER_DAY1'
oracle_prod1 | schema:table {
oracle_prod1 | 1: REPORT_DAY: optional long
oracle_prod1 | 2: VIOLATE_CODE: optional string
oracle_prod1 | 3: COUNT_DATA: optional string
oracle_prod1 | 4: HOTLINE: optional string
oracle_prod1 | 5: BRANCH: optional string
oracle_prod1 | 6: __deleted: optional string
oracle_prod1 | 7: __op: optional string
oracle_prod1 | 8: __source_ts_ms: optional timestamptz
oracle_prod1 | }
oracle_prod1 | rowIdentifier:[]
oracle_prod1 | 2024-03-20 04:08:08,271 WARN [io.deb.con.ora.log.LogMinerStreamingChangeEventSource] (debezium-oracleconnector-ekyc2-change-event-source-coordinator) Database table 'PDBIDG_DEV.DBZUSER.COUNT_VIOLATE_CALL_CENTER_DAY1' not configured with supplemental logging "(ALL) COLUMNS"; only explicitly changed columns will be captured. Use: ALTER TABLE DBZUSER.COUNT_VIOLATE_CALL_CENTER_DAY1 ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS
oracle_prod1 | 2024-03-20 04:08:08,308 WARN [io.deb.con.ora.log.LogMinerStreamingChangeEventSource] (debezium-oracleconnector-ekyc2-change-event-source-coordinator) Redo logs may be sized too small using the default mining strategy, consider increasing redo log sizes to a minimum of 500MB.
oracle_prod1 | 2024-03-20 04:08:08,421 DEBUG [io.deb.ser.ice.tab.IcebergTableOperator] (pool-7-thread-1) Batch got 9 records with 1 different schema!!
oracle_prod1 | 2024-03-20 04:08:08,422 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Converting iceberg schema to debezium:{"type":"struct","fields":[{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"REPORT_DAY"},{"type":"string","optional":true,"field":"VIOLATE_CODE"},{"type":"string","optional":true,"field":"COUNT_DATA"},{"type":"string","optional":true,"field":"HOTLINE"},{"type":"string","optional":true,"field":"BRANCH"},{"type":"string","optional":true,"field":"__deleted"},{"type":"string","optional":true,"field":"__op"},{"type":"int64","optional":true,"field":"__source_ts_ms"}],"optional":false,"name":"ekyc2.DBZUSER.COUNT_VIOLATE_CALL_CENTER_DAY1.Value"}
oracle_prod1 | 2024-03-20 04:08:08,433 INFO [io.deb.ser.ice.tab.IcebergTableOperator] (pool-7-thread-1) Table don't have Pk defined upsert is not possible falling back to append!
oracle_prod1 | 2024-03-20 04:08:08,437 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing nested field:struct<1: REPORT_DAY: optional long, 2: VIOLATE_CODE: optional string, 3: COUNT_DATA: optional string, 4: HOTLINE: optional string, 5: BRANCH: optional string, 6: __deleted: optional string, 7: __op: optional string, 8: __source_ts_ms: optional timestamptz>
oracle_prod1 | 2024-03-20 04:08:08,437 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:REPORT_DAY Type:long
oracle_prod1 | 2024-03-20 04:08:08,438 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:VIOLATE_CODE Type:string
oracle_prod1 | 2024-03-20 04:08:08,439 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:COUNT_DATA Type:string
oracle_prod1 | 2024-03-20 04:08:08,439 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:HOTLINE Type:string
oracle_prod1 | 2024-03-20 04:08:08,439 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:BRANCH Type:string
oracle_prod1 | 2024-03-20 04:08:08,439 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__deleted Type:string
oracle_prod1 | 2024-03-20 04:08:08,439 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__op Type:string
oracle_prod1 | 2024-03-20 04:08:08,439 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__source_ts_ms Type:timestamptz
oracle_prod1 | 2024-03-20 04:08:08,440 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing nested field:struct<1: REPORT_DAY: optional long, 2: VIOLATE_CODE: optional string, 3: COUNT_DATA: optional string, 4: HOTLINE: optional string, 5: BRANCH: optional string, 6: __deleted: optional string, 7: __op: optional string, 8: __source_ts_ms: optional timestamptz>
oracle_prod1 | 2024-03-20 04:08:08,440 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:REPORT_DAY Type:long
oracle_prod1 | 2024-03-20 04:08:08,440 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:VIOLATE_CODE Type:string
oracle_prod1 | 2024-03-20 04:08:08,440 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:COUNT_DATA Type:string
oracle_prod1 | 2024-03-20 04:08:08,441 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:HOTLINE Type:string
oracle_prod1 | 2024-03-20 04:08:08,441 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:BRANCH Type:string
oracle_prod1 | 2024-03-20 04:08:08,441 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__deleted Type:string
oracle_prod1 | 2024-03-20 04:08:08,441 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__op Type:string
oracle_prod1 | 2024-03-20 04:08:08,441 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__source_ts_ms Type:timestamptz
oracle_prod1 | 2024-03-20 04:08:08,442 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing nested field:struct<1: REPORT_DAY: optional long, 2: VIOLATE_CODE: optional string, 3: COUNT_DATA: optional string, 4: HOTLINE: optional string, 5: BRANCH: optional string, 6: __deleted: optional string, 7: __op: optional string, 8: __source_ts_ms: optional timestamptz>
oracle_prod1 | 2024-03-20 04:08:08,442 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:REPORT_DAY Type:long
oracle_prod1 | 2024-03-20 04:08:08,442 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:VIOLATE_CODE Type:string
oracle_prod1 | 2024-03-20 04:08:08,442 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:COUNT_DATA Type:string
oracle_prod1 | 2024-03-20 04:08:08,442 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:HOTLINE Type:string
oracle_prod1 | 2024-03-20 04:08:08,442 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:BRANCH Type:string
oracle_prod1 | 2024-03-20 04:08:08,442 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__deleted Type:string
oracle_prod1 | 2024-03-20 04:08:08,442 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__op Type:string
oracle_prod1 | 2024-03-20 04:08:08,443 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__source_ts_ms Type:timestamptz
oracle_prod1 | 2024-03-20 04:08:08,443 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing nested field:struct<1: REPORT_DAY: optional long, 2: VIOLATE_CODE: optional string, 3: COUNT_DATA: optional string, 4: HOTLINE: optional string, 5: BRANCH: optional string, 6: __deleted: optional string, 7: __op: optional string, 8: __source_ts_ms: optional timestamptz>
oracle_prod1 | 2024-03-20 04:08:08,443 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:REPORT_DAY Type:long
oracle_prod1 | 2024-03-20 04:08:08,443 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:VIOLATE_CODE Type:string
oracle_prod1 | 2024-03-20 04:08:08,443 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:COUNT_DATA Type:string
oracle_prod1 | 2024-03-20 04:08:08,443 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:HOTLINE Type:string
oracle_prod1 | 2024-03-20 04:08:08,443 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:BRANCH Type:string
oracle_prod1 | 2024-03-20 04:08:08,443 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__deleted Type:string
oracle_prod1 | 2024-03-20 04:08:08,444 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__op Type:string
oracle_prod1 | 2024-03-20 04:08:08,444 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__source_ts_ms Type:timestamptz
oracle_prod1 | 2024-03-20 04:08:08,444 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing nested field:struct<1: REPORT_DAY: optional long, 2: VIOLATE_CODE: optional string, 3: COUNT_DATA: optional string, 4: HOTLINE: optional string, 5: BRANCH: optional string, 6: __deleted: optional string, 7: __op: optional string, 8: __source_ts_ms: optional timestamptz>
oracle_prod1 | 2024-03-20 04:08:08,444 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:REPORT_DAY Type:long
oracle_prod1 | 2024-03-20 04:08:08,444 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:VIOLATE_CODE Type:string
oracle_prod1 | 2024-03-20 04:08:08,444 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:COUNT_DATA Type:string
oracle_prod1 | 2024-03-20 04:08:08,444 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:HOTLINE Type:string
oracle_prod1 | 2024-03-20 04:08:08,444 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:BRANCH Type:string
oracle_prod1 | 2024-03-20 04:08:08,444 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__deleted Type:string
oracle_prod1 | 2024-03-20 04:08:08,444 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__op Type:string
oracle_prod1 | 2024-03-20 04:08:08,445 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__source_ts_ms Type:timestamptz
oracle_prod1 | 2024-03-20 04:08:08,445 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing nested field:struct<1: REPORT_DAY: optional long, 2: VIOLATE_CODE: optional string, 3: COUNT_DATA: optional string, 4: HOTLINE: optional string, 5: BRANCH: optional string, 6: __deleted: optional string, 7: __op: optional string, 8: __source_ts_ms: optional timestamptz>
oracle_prod1 | 2024-03-20 04:08:08,445 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:REPORT_DAY Type:long
oracle_prod1 | 2024-03-20 04:08:08,445 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:VIOLATE_CODE Type:string
oracle_prod1 | 2024-03-20 04:08:08,445 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:COUNT_DATA Type:string
oracle_prod1 | 2024-03-20 04:08:08,445 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:HOTLINE Type:string
oracle_prod1 | 2024-03-20 04:08:08,445 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:BRANCH Type:string
oracle_prod1 | 2024-03-20 04:08:08,445 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__deleted Type:string
oracle_prod1 | 2024-03-20 04:08:08,445 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__op Type:string
oracle_prod1 | 2024-03-20 04:08:08,445 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__source_ts_ms Type:timestamptz
oracle_prod1 | 2024-03-20 04:08:08,446 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing nested field:struct<1: REPORT_DAY: optional long, 2: VIOLATE_CODE: optional string, 3: COUNT_DATA: optional string, 4: HOTLINE: optional string, 5: BRANCH: optional string, 6: __deleted: optional string, 7: __op: optional string, 8: __source_ts_ms: optional timestamptz>
oracle_prod1 | 2024-03-20 04:08:08,446 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:REPORT_DAY Type:long
oracle_prod1 | 2024-03-20 04:08:08,446 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:VIOLATE_CODE Type:string
oracle_prod1 | 2024-03-20 04:08:08,446 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:COUNT_DATA Type:string
oracle_prod1 | 2024-03-20 04:08:08,446 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:HOTLINE Type:string
oracle_prod1 | 2024-03-20 04:08:08,446 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:BRANCH Type:string
oracle_prod1 | 2024-03-20 04:08:08,446 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__deleted Type:string
oracle_prod1 | 2024-03-20 04:08:08,446 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__op Type:string
oracle_prod1 | 2024-03-20 04:08:08,446 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__source_ts_ms Type:timestamptz
oracle_prod1 | 2024-03-20 04:08:08,447 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing nested field:struct<1: REPORT_DAY: optional long, 2: VIOLATE_CODE: optional string, 3: COUNT_DATA: optional string, 4: HOTLINE: optional string, 5: BRANCH: optional string, 6: __deleted: optional string, 7: __op: optional string, 8: __source_ts_ms: optional timestamptz>
oracle_prod1 | 2024-03-20 04:08:08,447 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:REPORT_DAY Type:long
oracle_prod1 | 2024-03-20 04:08:08,447 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:VIOLATE_CODE Type:string
oracle_prod1 | 2024-03-20 04:08:08,447 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:COUNT_DATA Type:string
oracle_prod1 | 2024-03-20 04:08:08,447 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:HOTLINE Type:string
oracle_prod1 | 2024-03-20 04:08:08,447 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:BRANCH Type:string
oracle_prod1 | 2024-03-20 04:08:08,448 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__deleted Type:string
oracle_prod1 | 2024-03-20 04:08:08,448 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__op Type:string
oracle_prod1 | 2024-03-20 04:08:08,448 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__source_ts_ms Type:timestamptz
oracle_prod1 | 2024-03-20 04:08:08,448 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing nested field:struct<1: REPORT_DAY: optional long, 2: VIOLATE_CODE: optional string, 3: COUNT_DATA: optional string, 4: HOTLINE: optional string, 5: BRANCH: optional string, 6: __deleted: optional string, 7: __op: optional string, 8: __source_ts_ms: optional timestamptz>
oracle_prod1 | 2024-03-20 04:08:08,448 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:REPORT_DAY Type:long
oracle_prod1 | 2024-03-20 04:08:08,448 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:VIOLATE_CODE Type:string
oracle_prod1 | 2024-03-20 04:08:08,448 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:COUNT_DATA Type:string
oracle_prod1 | 2024-03-20 04:08:08,449 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:HOTLINE Type:string
oracle_prod1 | 2024-03-20 04:08:08,449 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:BRANCH Type:string
oracle_prod1 | 2024-03-20 04:08:08,449 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__deleted Type:string
oracle_prod1 | 2024-03-20 04:08:08,449 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__op Type:string
oracle_prod1 | 2024-03-20 04:08:08,449 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Processing Field:__source_ts_ms Type:timestamptz
oracle_prod1 | 2024-03-20 04:08:08,798 INFO [io.deb.ser.ice.tab.IcebergTableOperator] (pool-7-thread-1) Committed 9 events to table! s3a://datalake/datawarehouse/oracle_clob2.db/ekyc2_DBZUSER_COUNT_VIOLATE_CALL_CENTER_DAY1
oracle_prod1 | 2024-03-20 04:08:08,801 DEBUG [io.deb.ser.ice.bat.MaxBatchSizeWait] (pool-7-thread-1) Processed 9, QueueCurrentSize:0, QueueTotalCapacity:150000, SecondsBehindSource:0, SnapshotCompleted:true
oracle_prod1 | 2024-03-20 04:08:08,802 DEBUG [io.deb.ser.ice.bat.MaxBatchSizeWait] (pool-7-thread-1) Sleeping 20000 Milliseconds, QueueCurrentSize:0 < maxBatchSize:120000
oracle_prod1 | 2024-03-20 04:08:28,803 DEBUG [io.deb.ser.ice.bat.MaxBatchSizeWait] (pool-7-thread-1) Sleeping 20000 Milliseconds, QueueCurrentSize:0 < maxBatchSize:120000
oracle_prod1 | 2024-03-20 04:08:48,831 DEBUG [io.deb.ser.ice.bat.MaxBatchSizeWait] (pool-7-thread-1) Sleeping 20000 Milliseconds, QueueCurrentSize:0 < maxBatchSize:120000
oracle_prod1 | 2024-03-20 04:09:08,832 DEBUG [io.deb.ser.ice.bat.MaxBatchSizeWait] (pool-7-thread-1) Sleeping 20000 Milliseconds, QueueCurrentSize:0 < maxBatchSize:120000
oracle_prod1 | 2024-03-20 04:09:31,753 DEBUG [io.deb.ser.ice.bat.MaxBatchSizeWait] (pool-7-thread-1) Sleeping 20000 Milliseconds, QueueCurrentSize:0 < maxBatchSize:120000
oracle_prod1 | 2024-03-20 04:09:52,007 DEBUG [io.deb.ser.ice.bat.MaxBatchSizeWait] (pool-7-thread-1) Sleeping 20000 Milliseconds, QueueCurrentSize:0 < maxBatchSize:120000
oracle_prod1 | 2024-03-20 04:10:12,008 DEBUG [io.deb.ser.ice.bat.MaxBatchSizeWait] (pool-7-thread-1) Total wait 120000 Milliseconds, QueueCurrentSize:0 < maxBatchSize:120000
Glad it's working. One thing to keep in mind: in upsert mode, the consumer falls back to append mode for tables without a primary key. In that case it logs:
Table don't have Pk defined upsert is not possible falling back to append!
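As a rough illustration of that fallback rule (a sketch only, not the consumer's actual code; the function name `effective_write_mode` and its parameters are made up for this example):

```python
def effective_write_mode(upsert_enabled: bool, identifier_fields: list[str]) -> str:
    """Return the write mode this sketch assumes the consumer ends up using.

    Assumption: upsert is only possible when the destination table has at
    least one identifier (primary key) field; otherwise rows are appended.
    """
    if upsert_enabled and identifier_fields:
        return "upsert"
    return "append"


# A table with a key keeps upsert; a table without one falls back to append.
print(effective_write_mode(True, ["id"]))
print(effective_write_mode(True, []))
```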
@ismailsimsek, I tried the manual fix in the prod env, but it is not working.
One option is to fix it manually: stop the consumer, alter the Iceberg table to drop the identifier fields, set the fields to optional, then start the consumer.
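The toy model below may help explain why the restart can still fail after the manual fix. It is not the Iceberg API; `ToySchema` and its methods are hypothetical, and the only rule it encodes is the one from the error message in this thread (an identifier field must be a required field). It also assumes the columns started out as required, which the thread does not confirm.

```python
class ToySchema:
    """Toy stand-in for a table schema; NOT the Iceberg API."""

    def __init__(self, required: dict[str, bool], identifier_fields: set[str]):
        self.required = dict(required)
        self.identifier_fields = set(identifier_fields)

    def set_identifier_fields(self, names: set[str]) -> None:
        # Rule taken from the error in this thread: identifiers must be required.
        for name in sorted(names):
            if not self.required[name]:
                raise ValueError(
                    f"Cannot add field {name} as an identifier field: not a required field"
                )
        self.identifier_fields = set(names)

    def make_optional(self, name: str) -> None:
        # Assumed consequence of the same rule: drop the identifier first.
        if name in self.identifier_fields:
            raise ValueError(f"{name} is still an identifier field")
        self.required[name] = False


columns = ["REPORT_DAY", "HOTLINE", "BRANCH"]
schema = ToySchema({c: True for c in columns}, set(columns))

# Manual fix: drop the identifier fields, then make the columns optional.
schema.set_identifier_fields(set())
for column in columns:
    schema.make_optional(column)

# If incoming events still carry a key schema, re-adding the key fails.
try:
    schema.set_identifier_fields({"REPORT_DAY"})
except ValueError as err:
    print(err)
```

In this model the manual fix itself succeeds; the failure only appears when something tries to put the key back on columns that are now optional.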
Table after dropping the identifier fields:
Stack trace after running the connector again:
oracle_prod1 | 2024-03-21 08:56:36,244 DEBUG [io.deb.ser.ice.tab.IcebergTableOperator] (pool-7-thread-1) Batch got 6930 records with 1 different schema!!
oracle_prod1 | 2024-03-21 08:56:36,259 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Converting iceberg schema to debezium:{"type":"struct","fields":[{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"REPORT_DAY"},{"type":"string","optional":true,"field":"COUNT_DATA"},{"type":"string","optional":true,"field":"HOTLINE"},{"type":"string","optional":true,"field":"BRANCH"},{"type":"string","optional":true,"field":"__deleted"},{"type":"string","optional":true,"field":"__op"},{"type":"int64","optional":true,"field":"__source_ts_ms"}],"optional":false,"name":"ekyc2.IDG_VOICE.COUNT_CALL_CENTER_DAY.Value"}
oracle_prod1 | 2024-03-21 08:56:36,260 DEBUG [io.deb.ser.ice.IcebergChangeEvent] (pool-7-thread-1) Converting iceberg schema to debezium:{"type":"struct","fields":[{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp","version":1,"field":"REPORT_DAY"},{"type":"string","optional":true,"field":"HOTLINE"},{"type":"string","optional":true,"field":"BRANCH"}],"optional":false,"name":"ekyc2.IDG_VOICE.COUNT_CALL_CENTER_DAY.Key"}
oracle_prod1 | 2024-03-21 08:56:52,037 INFO [io.deb.ser.ice.off.IcebergOffsetBackingStore] (pool-7-thread-1) Stopped IcebergOffsetBackingStore table:oracle3.debezium_offset_storage_custom_table
oracle_prod1 | 2024-03-21 08:56:52,038 ERROR [io.deb.ser.ConnectorLifecycle] (pool-7-thread-1) Connector completed: success = 'false', message = 'Stopping connector after error in the application's handler method: Cannot add field REPORT_DAY as an identifier field: not a required field', error = 'java.lang.IllegalArgumentException: Cannot add field REPORT_DAY as an identifier field: not a required field': java.lang.IllegalArgumentException: Cannot add field REPORT_DAY as an identifier field: not a required field
oracle_prod1 | at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:218)
oracle_prod1 | at org.apache.iceberg.Schema.validateIdentifierField(Schema.java:126)
oracle_prod1 | at org.apache.iceberg.SchemaUpdate.lambda$applyChanges$1(SchemaUpdate.java:555)
oracle_prod1 | at java.base/java.lang.Iterable.forEach(Unknown Source)
oracle_prod1 | at org.apache.iceberg.SchemaUpdate.applyChanges(SchemaUpdate.java:554)
oracle_prod1 | at org.apache.iceberg.SchemaUpdate.apply(SchemaUpdate.java:440)
oracle_prod1 | at org.apache.iceberg.SchemaUpdate.apply(SchemaUpdate.java:48)
oracle_prod1 | at io.debezium.server.iceberg.tableoperator.IcebergTableOperator.applyFieldAddition(IcebergTableOperator.java:116)
oracle_prod1 | at io.debezium.server.iceberg.tableoperator.IcebergTableOperator.addToTable(IcebergTableOperator.java:155)
oracle_prod1 | at io.debezium.server.iceberg.IcebergChangeConsumer.handleBatch(IcebergChangeConsumer.java:167)
oracle_prod1 | at io.debezium.embedded.ConvertingEngineBuilder$ConvertingChangeConsumer.handleBatch(ConvertingEngineBuilder.java:108)
oracle_prod1 | at io.debezium.embedded.EmbeddedEngine.pollRecords(EmbeddedEngine.java:728)
oracle_prod1 | at io.debezium.embedded.EmbeddedEngine.run(EmbeddedEngine.java:475)
oracle_prod1 | at io.debezium.embedded.ConvertingEngineBuilder$1.run(ConvertingEngineBuilder.java:248)
oracle_prod1 | at io.debezium.server.DebeziumServer.lambda$start$1(DebeziumServer.java:170)
oracle_prod1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
oracle_prod1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
oracle_prod1 | at java.base/java.lang.Thread.run(Unknown Source)
oracle_prod1 |
oracle_prod1 |
oracle_prod1 exited with code 1
@hoaiff this seems to be a different table, right?
According to the log, this source table has primary keys (line 3 in the log above).
What's happening: since Debezium is still sending events with a key schema (the 3rd line above), the consumer detects a schema difference and tries to apply the schema changes to the destination table.
One option is to disable the field addition feature:
debezium.sink.iceberg.allow-field-addition=false
With this setting, the consumer will not apply schema changes.
See: https://github.com/memiiso/debezium-server-iceberg/blob/master/docs/DOCS.md#schema-change-behaviour
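To check which key fields Debezium is sending as optional, the key-schema DEBUG line can be parsed directly. The snippet below is a small sketch, assuming the log format shown in this thread (`Converting iceberg schema to debezium:` followed by a JSON schema whose name ends in `.Key`); the helper name `optional_key_fields` is made up for this example.

```python
import json

LOG_MARKER = "Converting iceberg schema to debezium:"


def optional_key_fields(log_line: str) -> list[str]:
    """Return key-schema fields marked optional in a DEBUG schema log line."""
    _, _, payload = log_line.partition(LOG_MARKER)
    if not payload:
        return []  # not a schema log line
    schema = json.loads(payload)
    if not schema.get("name", "").endswith(".Key"):
        return []  # value schema, not a key schema
    return [f["field"] for f in schema["fields"] if f.get("optional", False)]


# Key-schema payload copied from the log in this thread (prefix shortened).
key_line = (
    "DEBUG ... Converting iceberg schema to debezium:"
    '{"type":"struct","fields":['
    '{"type":"int64","optional":true,"name":"io.debezium.time.Timestamp",'
    '"version":1,"field":"REPORT_DAY"},'
    '{"type":"string","optional":true,"field":"HOTLINE"},'
    '{"type":"string","optional":true,"field":"BRANCH"}],'
    '"optional":false,"name":"ekyc2.IDG_VOICE.COUNT_CALL_CENTER_DAY.Key"}'
)

print(optional_key_fields(key_line))  # ['REPORT_DAY', 'HOTLINE', 'BRANCH']
```

Any field returned here is one that would trip the "not a required field" check if it were set as an identifier field.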
Hi @ismailsimsek, I found the problem. My table in the prod env has a unique index on 4 fields (BRANCH, HOTLINE, VIOLATE_CODE, REPORT_DAY), so Debezium uses them as the key in the key schema.
@hoaiff in that case the fix is:
It seems the consumer is trying to do this automatically, but it cannot handle the first step.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Hi @ismailsimsek, I ran this against Oracle prod and hit an error when creating the following table. I checked the constraints on my table and found none, but the log shows
rowIdentifier:[BRANCH, HOTLINE, VIOLATE_CODE, REPORT_DAY]
and the connector stops.