jhc-systems / debezium-connector-ibmi

Debezium Connector for IBM i (AS/400)

Missing data in Kafka even though it is present in the journal #75

Closed thomasfodor closed 2 months ago

thomasfodor commented 2 months ago

During connector testing, I stumbled upon an issue where entire messages, or parts of a message, would sometimes be missing.

Issue 1: BEFORE missing

The issue would go something like this:

Row 1, Column A, original value ABC:

- Initial load -> Kafka message produced correctly (before null, after set)
- Change value to ABD -> Kafka message produced correctly (before set, after set)
- Change value back to ABC -> Kafka message produced incorrectly (before null, after set)
- Change value back to ABD -> Kafka message produced incorrectly (before null, after set)

As you can see, the BEFORE value was null even though there should have been one (I edited the same row back and forth as described and could reproduce this issue quite easily). The table is journaled with BOTH images activated, and the journal appears to record the entries correctly:

[screenshot: journal entries]

On a table journaled with only AFTER images, events are produced normally, except in the scenario described above, where the message is entirely absent. This seems to be a known limitation according to https://github.com/jhc-systems/debezium-connector-ibmi?tab=readme-ov-file#no-journal-entries-found-check-journalling-is-enabled-and-set-to-both, so the main focus here is the issue with the table journaled with BOTH images.
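For clarity, a standard Debezium change event for an update looks roughly like the sketch below (illustrative field names and values, not actual output from this connector); the problem is that `before` arrives as null where it should carry the previous row image:

```json
{
  "before": { "A": "ABD" },
  "after":  { "A": "ABC" },
  "source": { "...": "..." },
  "op": "u",
  "ts_ms": 1717000000000
}
```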

Connector config used:

{
    "connector.class": "io.debezium.connector.db2as400.As400RpcConnector",
    "schema": "XXXXXX",
    "message.key.columns": "XXXXXX.YYY:F1,F2,F3",
    "topic.creation.default.partitions": "12",
    "tasks.max": "1",
    "secure": "false",
    "hostname": "hostname.example.com",
    "password": "password1",
    "topic.prefix": "output-topic-name.cdc",
    "topic.creation.default.delete.retention.ms": "1167631000000",
    "schema.history.internal.kafka.topic": "output-topic-name.cdc.history",
    "poll.interval.ms": "2000",
    "topic.creation.default.replication.factor": "3",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "topic.creation.default.cleanup.policy": "compact,delete",
    "schema.history.internal.kafka.bootstrap.servers": "real-broker.example.com",
    "event.processing.failure.handling.mode": "warn",
    "topic.creation.default.retention.ms": "1167631000000",
    "value.converter.schema.registry.url": "https://real-registry.example.com/",
    "dbname": "THEDBNAME",
    "port": "1234",
    "name": "connector-name-here",
    "table.include.list": "XXXXXX.YYY",
    "user": "dbuser",
    "key.converter.schema.registry.url": "https://real-registry.example.com/",
    "field.name.adjustment.mode": "avro"
}


Issue 2: No change detection when using ExtractField transformers

When using ExtractField transformers, no messages beyond the initial load are produced, even though the journal certainly records the changes. No matter what changes, no Kafka message is emitted. The connector setup is identical to the above, with the following additions:

"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"transforms": "flattenKey",
"transforms.flattenKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.flattenKey.field": "THE_SINGLE_FIELD_WE_NEED"
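For context (my understanding of the SMT, not verified against this connector): ExtractField$Key replaces the entire key Struct with the single named field, so a composite key like the one defined in message.key.columns is reduced to one scalar value. With illustrative field names:

```json
// hypothetical key before the transform
{ "F1": "A", "F2": "B", "F3": "C" }
// key after ExtractField$Key with "field": "F1"
"A"
```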

The same issue occurs if we use a value transformer, even when removing the key transformer:

"transforms": "flattenValue",
"transforms.flattenValue.type": "org.apache.kafka.connect.transforms.ExtractField$Value",
"transforms.flattenValue.field": "after"
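As a possible alternative (untested with this connector, so treat it as an assumption): Debezium ships its own flattening SMT, ExtractNewRecordState, which is generally preferred over the plain Kafka ExtractField for unwrapping the after state, because it understands the Debezium envelope and handles delete and tombstone records rather than failing or emitting nothing on them:

```json
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
```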

Any help or suggestion how to avoid this would be appreciated. Thanks :)

msillence commented 2 months ago

Thanks for your bug report. I've now reproduced it and have a local fix. I've also raised the issue in the official repo, now that the project has been adopted by Debezium: https://issues.redhat.com/browse/DBZ-7957 https://github.com/debezium/debezium-connector-ibmi/pull/16

msillence commented 2 months ago

FYI, the missing before data was really useful information for tracking this down. I hadn't noticed this issue, as it's often the before data that's missing and we don't use that. Thank you.

msillence commented 2 months ago

fixed