scylladb / scylla-cdc-source-connector

A Kafka source connector capturing Scylla CDC changes
Apache License 2.0

Preimage support #39

Closed Bouncheck closed 8 months ago

Bouncheck commented 1 year ago

Adds an option to enable preimage support. With the option enabled, the connector uses RawChanges of operation type PRE_IMAGE to fill the 'before' field of the Debezium Envelopes emitted for other types of changes.

If enabled, ScyllaChangesConsumer remembers the last PRE_IMAGE RawChange for each distinct TaskId. Upon encountering the next supported non-preimage operation, the consumer uses the remembered preimage and clears the 'cache' for that TaskId.
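
For readers following along, here is a minimal sketch of the caching idea described above. It is not the PR's code: `PreimageCache`, `Change`, `Envelope`, and the string task ids and payloads are hypothetical stand-ins for ScyllaChangesConsumer, RawChange, TaskId, and the Debezium Envelope, and leaving 'after' empty for deletes is an assumption based on the usual Debezium convention.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: remember the last preimage per task and attach it
// to the next non-preimage change seen for the same task.
class PreimageCache {
    // Stand-in for the CDC library's operation types (illustrative subset).
    enum OperationType { PRE_IMAGE, ROW_INSERT, ROW_UPDATE, ROW_DELETE }

    // Stand-in for a RawChange; taskId and payload are plain strings here.
    static final class Change {
        final String taskId;
        final OperationType type;
        final String payload;

        Change(String taskId, OperationType type, String payload) {
            this.taskId = taskId;
            this.type = type;
            this.payload = payload;
        }
    }

    // Stand-in for a Debezium-style envelope with 'before' and 'after'.
    static final class Envelope {
        final String before; // null when no preimage was cached
        final String after;  // null for deletes (assumed convention)
        final OperationType op;

        Envelope(String before, String after, OperationType op) {
            this.before = before;
            this.after = after;
            this.op = op;
        }
    }

    private final Map<String, Change> lastPreimage = new HashMap<>();

    // Returns an envelope for non-preimage changes; preimages are only cached.
    Optional<Envelope> consume(Change change) {
        if (change.type == OperationType.PRE_IMAGE) {
            lastPreimage.put(change.taskId, change);
            return Optional.empty();
        }
        // remove() both reads the cached preimage and clears the slot.
        Change pre = lastPreimage.remove(change.taskId);
        String before = (pre == null) ? null : pre.payload;
        String after = (change.type == OperationType.ROW_DELETE) ? null : change.payload;
        return Optional.of(new Envelope(before, after, change.type));
    }
}
```

Because the cache holds a single slot per task, this only works if the preimage and the change it belongs to arrive back to back within that task, which is the ordering assumption the rest of this thread is about.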

Bouncheck commented 1 year ago

After more thorough testing, it seems this works only with sequential synchronous (session.execute()) writes from a single thread. A scenario with multiple async writes issued sequentially from a single thread fails. A scenario with multiple sync writes from multiple threads also fails, even when specific primary and clustering keys are assigned to specific threads. (edit: Each test has the following steps: App1 writes to Scylla, the connector reads the CDC log, App2 validates the data on the Kafka topic. Each scenario tested a different type of App1.)

Bouncheck commented 1 year ago

Pushed a new version. The previous one was based on wrong assumptions. It seems to be working now. Concurrent writes to the same PK are still (and always will be) a no-go due to CDC limitations, but multiple threads should be fine as long as each thread writes to different PKs and ensures that its own subsequent writes do not intertwine.
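
One way a writing application could satisfy that constraint is to route every write for a given PK to the same single-threaded lane, so writes to one PK never run concurrently. The sketch below is an assumption about how such an app might look, not part of this PR: `KeyAffinityWriter` is a hypothetical name, and the `Runnable` is expected to perform a blocking write (for example a synchronous session.execute() call).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: writes for the same partition key always land on the
// same single-threaded executor, so they run one at a time in submit order.
class KeyAffinityWriter implements AutoCloseable {
    private final List<ExecutorService> lanes = new ArrayList<>();

    KeyAffinityWriter(int threads) {
        for (int i = 0; i < threads; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    // The write must block until it completes; an async write inside the
    // Runnable would defeat the per-key ordering.
    Future<?> submit(Object partitionKey, Runnable synchronousWrite) {
        int lane = Math.floorMod(partitionKey.hashCode(), lanes.size());
        return lanes.get(lane).submit(synchronousWrite);
    }

    // Stops accepting work and waits for queued writes to finish.
    @Override
    public void close() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(1, TimeUnit.MINUTES);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Different PKs may share a lane, which is harmless here: the requirement is only that two writes to the same PK never overlap.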

This change could still use some heavy load testing, with nodes added and removed between operations.

mykaul commented 1 year ago

@Bouncheck , @avelanarius - any progress?

ricardoborenstein commented 1 year ago

Any updates? A customer has asked us to prioritize this.

mykaul commented 12 months ago

@avelanarius - ping

gcarmin commented 11 months ago

Hi - we have yet another prospect asking about this - do we have any high-level timeline for it?

avelanarius commented 11 months ago

@Bouncheck was testing this PR last week to check whether it correctly handles crashes (at-least-once guarantee) and found some potential problems. We'll try to give you an estimate soon.

avelanarius commented 11 months ago

Some more clarification: the problem is not in this PR or in the connector, but in the underlying CDC library. If a crash happens in the connector, it will resume correctly, but if a crash happens inside the CDC library (without propagating to the connector), there is currently a problem with resumption.

mykaul commented 10 months ago

> Some more clarification: the problem is not in this PR or in the connector, but in the underlying CDC library. If a crash happens in the connector, it will resume correctly, but if a crash happens inside the CDC library (without propagating to the connector), there is currently a problem with resumption.

So what's the next step here? @avelanarius ?

mykaul commented 10 months ago

ping @avelanarius , @roydahan

roydahan commented 9 months ago

@Bouncheck what's the status of this PR? Is it done on your side and just waiting for review?