okayhooni opened 11 months ago
You are correct, a conversion failure will throw an error rather than route to the DLQ. Other sinks have an option to skip "bad" records, which is something we could add. Also, you can manually skip the record by setting the consumer group offset.
Thank you for the quick answer!
You can leave this open if you want and we can consider options to handle this best.
Thanks..! I will reopen this issue..!
I hope this new feature can produce those failed messages to a separate DLQ topic (just as this sink produces to the control topic), so they can be re-consumed from that DLQ topic after reconfiguring the sink connector or manually altering the target table..!
Hi @bryanck, we are also trying to solve this conversion failure issue in the connector, so I was wondering whether there is a way to skip the records that throw the conversion failure error.
```
org.apache.kafka.connect.errors.DataException: An error occurred converting record, topic: datasets-test, partition, 0, offset: 10
```
Connector configuration:

```properties
connector.class=io.tabular.iceberg.connect.IcebergSinkConnector
tasks.max=2
topics=datasets-test
key.converter.region=us-east-1
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.region=us-east-1
value.converter.schemas.enable=false
iceberg.catalog=AwsDataCatalog
iceberg.catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
iceberg.catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
iceberg.catalog.client.region=us-east-1
iceberg.catalog.warehouse=s3://data-os-sandbox
iceberg.tables.cdc-field=_cdc_op
iceberg.tables.dynamic-enabled=true
iceberg.tables.route-field=iceberg_table
iceberg.tables.default-id-columns=record_id
iceberg.control.topic=control-iceberg-test
iceberg.control.group-id=cg-control-iceberg-kafka-new-config
iceberg.control.commit.timeout-ms=60000
errors.tolerance=all
errors.log.enable=true
errors.log.include.messages=false
errors.retry.timeout=600000
errors.retry.delay.max.ms=30000
errors.deadletterqueue.topic.name=datasets-error
errors.deadletterqueue.context.headers.enable=true
```
The only way to do it now is to manually set the partition offset ahead of the bad record(s). I'll look into adding this as a config option in a day or two.
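For anyone else landing here, manually moving the offset past a bad record can be done with the standard `kafka-consumer-groups.sh` CLI. This is a sketch: the connector name (and therefore the consumer group `connect-iceberg-sink`, following Kafka Connect's default `connect-<connector name>` naming) and the bootstrap address are assumptions; the topic, partition, and offset follow the error message above.

```shell
# Stop (pause/delete) the connector first so its consumer group is inactive,
# otherwise the offset reset will be rejected.
# Then skip the bad record at offset 10 on partition 0 of datasets-test by
# resetting the group's position to offset 11 (the next record).
# "connect-iceberg-sink" is a hypothetical group name; Kafka Connect uses
# "connect-<connector name>" by default.
kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --group connect-iceberg-sink \
  --topic datasets-test:0 \
  --reset-offsets --to-offset 11 \
  --execute

# Restart the connector; consumption resumes at offset 11.
```

Note that skipping this way silently drops the record, which is exactly the data-loss concern raised below.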
I gave this some thought and I am reluctant to add an option to simply skip a bad record, as it could lead to unexpected data loss. We should have the ability to write the record to a DLQ or table. This was recently opened, which is related.
Opened this issue as well for the same functionality: https://github.com/tabular-io/iceberg-kafka-connect/issues/191
I tested the DLQ options on this sink connector against a table with a deliberately wrong schema.
But it didn't work as I expected..
It seems the Kafka Connect DLQ covers only the convert and transform stages, and cannot handle errors raised in the put() lifecycle stage, as described below..?
REF: https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues
If an error in the sink (put) phase is not handled by Kafka Connect's native DLQ feature, then how about adding a similar DLQ option at the connector level (one that can handle records raising errors in the put() lifecycle)? Something like..
`iceberg.dlq.topic`
..!
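To make the suggestion concrete, a connector-level DLQ might be configured roughly like this. This is purely a hypothetical sketch: none of these properties exist in the connector today, and everything beyond the `iceberg.dlq.topic` name proposed above is invented for illustration.

```properties
# Hypothetical sketch -- these properties are NOT implemented by the connector.
# Records that fail during put() would be produced to this topic instead of
# failing the task:
iceberg.dlq.topic=datasets-iceberg-dlq
# Or, per the maintainer's suggestion, an alternative dead-letter table:
iceberg.dlq.table=default.iceberg_dlq
```

Either target would preserve the failed records for later reprocessing, avoiding the data loss that a plain "skip bad record" option would cause.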