Problem
The connector flow today is such that:
1. HashPartitions are created and the partition keys are passed down to the tasks.
2. Each task receives an immutable list of tablets and starts processing it.
3. In the snapshot phase, the connector relies directly on the list passed down from the top-level connector and takes the snapshot. In the streaming phase, it validates the tablet list again by asking the service for the current tablets.
Now suppose a tablet is split while the connector is in the streaming phase. The connector handles the split gracefully.
At this stage, if a task is restarted, the flow starts again from the snapshot phase, and the connector tries to get the checkpoint of each tablet in the task to decide whether to take a snapshot. But the tablet list available here is stale, so getting the checkpoint fails because the parent tablet has already been deleted by this time.
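The failure mode can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: `StaleTabletListDemo`, `getCheckpoint`, `split`, the tablet IDs, and the error message are all hypothetical stand-ins, not the connector's classes or the service's real responses. A plain map plays the role of the service's per-tablet checkpoints.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StaleTabletListDemo {
    // Hypothetical stand-in for the checkpoint lookup against the service.
    static long getCheckpoint(Map<String, Long> serviceCheckpoints, String tabletId) {
        Long checkpoint = serviceCheckpoints.get(tabletId);
        if (checkpoint == null) {
            // Assumed behaviour: a deleted tablet can no longer be looked up.
            throw new IllegalStateException("tablet not found: " + tabletId);
        }
        return checkpoint;
    }

    // Models a split: the parent is removed and two children take its place.
    static void split(Map<String, Long> serviceCheckpoints, String parent) {
        Long checkpoint = serviceCheckpoints.remove(parent);
        serviceCheckpoints.put(parent + "-child-1", checkpoint);
        serviceCheckpoints.put(parent + "-child-2", checkpoint);
    }

    public static void main(String[] args) {
        Map<String, Long> service = new HashMap<>();
        service.put("tablet-a", 100L);

        // Immutable list handed to the task when it was first configured.
        List<String> taskTablets = List.of("tablet-a");

        // The split happens while the task is streaming.
        split(service, "tablet-a");

        // On restart the task goes back to the snapshot phase with the stale list.
        try {
            for (String tabletId : taskTablets) {
                getCheckpoint(service, tabletId);
            }
        } catch (IllegalStateException e) {
            System.out.println("restart failed: " + e.getMessage());
        }
    }
}
```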
Solution
This diff fixes the problem by calling GetTabletListToPollForCDC in the snapshot phase as well, which ensures that the task only receives a valid set of tablets to poll.
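A minimal sketch of the idea, not the code in this diff: before the snapshot phase looks up any checkpoint, the task takes the tablet set the service currently reports instead of trusting the list it was configured with. `SnapshotTabletRefresh`, `tabletsForSnapshot`, `staleTablets`, and the tablet IDs are hypothetical; the `Supplier` stands in for a GetTabletListToPollForCDC call, whose real client signature is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class SnapshotTabletRefresh {
    // Returns the tablets the snapshot phase should work with.
    // The supplier is an assumed stand-in for GetTabletListToPollForCDC.
    static List<String> tabletsForSnapshot(Supplier<List<String>> currentTabletsFromService) {
        return List.copyOf(currentTabletsFromService.get());
    }

    // Configured tablets that the service no longer reports (for example, a split parent).
    static List<String> staleTablets(List<String> configured, List<String> current) {
        List<String> stale = new ArrayList<>(configured);
        stale.removeAll(current);
        return stale;
    }

    public static void main(String[] args) {
        List<String> configured = List.of("tablet-a");
        List<String> current =
            tabletsForSnapshot(() -> List.of("tablet-a-child-1", "tablet-a-child-2"));

        System.out.println("ignoring stale tablets: " + staleTablets(configured, current));
        System.out.println("checking checkpoints for: " + current);
    }
}
```

With the refreshed list, the checkpoint lookup only ever sees tablets that still exist, so a restart after a split no longer touches the deleted parent.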
Testing
Tested manually using the following steps:
1. Start the connector with snapshot mode initial and a single tablet.
2. Split the tablet once the connector reaches the streaming phase.
3. Insert more data.
4. Restart the task.
5. Upon restart:
   i. Without the fix: the task fails while trying to get the checkpoint of the deleted parent tablet.
   ii. With the fix: the connector proceeds to the streaming phase without any error.
This fixes yugabyte/yugabyte-db#20636