yugabyte / debezium-connector-yugabytedb

A Debezium CDC connector for the YugabyteDB database
https://docs.yugabyte.com/stable/explore/change-data-capture/using-logical-replication/yugabytedb-connector/
Apache License 2.0

[DBZ] Keep explicit checkpoint lower than or equal to the request OpId #352

Closed vaibhav-yb closed 2 months ago

vaibhav-yb commented 2 months ago

Problem

Consider the following scenario for a tablet:

  1. Suppose the explicit checkpoint is at OpId 1.100.
  2. The connector then streams records up to OpId 1.150.
  3. By this time, all the records have been published to Kafka successfully, but the connector has not received any callback yet.
  4. The connector/task restarts.
  5. On initialisation, the connector asks the cdc_state table for the checkpoint, which returns 1.100.
  6. In the next GetChanges call, the connector streams up to OpId 1.110 and simultaneously receives a callback with OpId 1.150.
  7. The connector ends up marking the checkpoint in the cdc_state table as 1.150, even though from_op_id is still at 1.110.
  8. If the background thread runs now, it cleans up the intents up to 1.150. When the next GetChanges request arrives with from_op_id 1.110, the CDC service throws an error saying the intents have already been garbage collected:
Caused by: org.yb.client.CDCErrorException: Server[af1fa2513da74d9d9c9cd987579375d1] INTERNAL_ERROR[code 21]: CDCSDK Trying to fetch already GCed intents for transaction c016fa77-b366-4c47-bd9d-a8e94641db0c

Solution

This PR adds logic to use the explicit checkpoint value only when it is lower than or equal to the from_op_id. When the explicit checkpoint is higher than the from_op_id, the connector conservatively uses from_op_id as the explicit checkpoint instead. Note that this can cause duplicate records, but our guarantee of at-least-once delivery is still satisfied.
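
The rule can be sketched as follows. This is a minimal illustration rather than the code in this PR: the `OpId` class and the `safeExplicitCheckpoint` helper are hypothetical names, and the sketch assumes OpIds are ordered by term first and index second.

```java
class CheckpointClamp {
    /** Hypothetical stand-in for an OpId written as term.index; not the connector's class. */
    public static final class OpId implements Comparable<OpId> {
        public final long term;
        public final long index;

        public OpId(long term, long index) {
            this.term = term;
            this.index = index;
        }

        @Override
        public int compareTo(OpId other) {
            // Assumed ordering: compare terms first, then indexes within the same term.
            int byTerm = Long.compare(term, other.term);
            return byTerm != 0 ? byTerm : Long.compare(index, other.index);
        }

        @Override
        public String toString() {
            return term + "." + index;
        }
    }

    /**
     * Returns the checkpoint to send with the next request: the explicit
     * checkpoint if it is lower than or equal to fromOpId, otherwise fromOpId.
     */
    public static OpId safeExplicitCheckpoint(OpId explicitCheckpoint, OpId fromOpId) {
        return explicitCheckpoint.compareTo(fromOpId) <= 0 ? explicitCheckpoint : fromOpId;
    }

    public static void main(String[] args) {
        OpId fromOpId = new OpId(1, 110);

        // The scenario above: a callback for 1.150 arrives while from_op_id is 1.110.
        System.out.println(safeExplicitCheckpoint(new OpId(1, 150), fromOpId)); // prints 1.110

        // Normal case: the explicit checkpoint is behind from_op_id and is used as-is.
        System.out.println(safeExplicitCheckpoint(new OpId(1, 100), fromOpId)); // prints 1.100
    }
}
```

With the checkpoint capped at 1.110, intents between 1.110 and 1.150 are not eligible for cleanup before the connector has requested them again, at the cost of re-streaming records that were already published.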