pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
Apache License 2.0

Tracking issues for TiCDC Data Path Correctness Verification #4519

Open ben1009 opened 2 years ago

ben1009 commented 2 years ago

Is your feature request related to a problem?

From a product perspective, TiCDC currently cannot verify data correctness without stopping the replication process. From an engineer's perspective, even when a validation check fails during testing, the root cause is still hard to debug because too little information is available.

Describe the feature you'd like

Provide end-to-end verification without stopping the replication process, together with module-level checks inside TiCDC. When a data correctness check fails, the module-level checks should provide more information and help narrow the root cause down to a specific module.
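
To make the end-to-end idea concrete, here is a minimal sketch of an order-independent checksum comparison between upstream and downstream rows. It is an illustration only, not how TiCDC implements verification: the function names `table_checksum` and `snapshots_match` are hypothetical, rows are modeled as plain tuples, and the sketch assumes both sides can be read at a consistent point in time (how to obtain that point while replication keeps running is outside its scope).

```python
import hashlib
from typing import Iterable, Tuple


def table_checksum(rows: Iterable[tuple]) -> Tuple[int, int]:
    """Return (row_count, digest_sum) for a set of rows.

    Hypothetical helper: per-row digests are summed modulo 2**64, so the
    result does not depend on the order in which rows are read.
    """
    count = 0
    total = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        total = (total + int.from_bytes(digest[:8], "big")) % (2 ** 64)
        count += 1
    return count, total


def snapshots_match(upstream_rows: Iterable[tuple],
                    downstream_rows: Iterable[tuple]) -> bool:
    """Compare two snapshots that are assumed to be taken at a consistent point."""
    return table_checksum(upstream_rows) == table_checksum(downstream_rows)
```

Because the checksum ignores row order, the two sides can be scanned independently, and a mismatch only says that the snapshots differ, not where. Locating the difference is what the module-level checks are meant to help with.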

Task breakdown

Total: 16 weeks (development and self-test: 12 weeks; scenario test: 4 weeks).

Detailed subtasks are as follows:

  1. e2e verification
    • [x] #4550
    • [x] MySQL as downstream. 2 weeks
    • [ ] Kafka as downstream. 4 weeks
  2. module level verification. 3 weeks
    • [x] #4711
      • [ ] puller
      • [ ] sorter
      • [ ] cyclic
      • [ ] sink
      • [ ] GC for tracking data
  3. configurations. 1.5 weeks
    • [x] #4853
  4. tests. 4 weeks
    • [ ] scenario test or test-infra if possible
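
As a rough illustration of the module-level verification item above, the sketch below shows one way a failed check could be narrowed down to a module. Everything in it is an assumption made for illustration: the function `first_divergent_module` is hypothetical, the module order is simply taken from the subtask list, and each module is assumed to record an order-independent checksum over the same range of events.

```python
from typing import Dict, List, Optional


def first_divergent_module(order: List[str],
                           checksums: Dict[str, int]) -> Optional[str]:
    """Return the first module whose checksum differs from its predecessor's.

    Hypothetical helper: `order` lists modules in the order data flows
    through them, and `checksums` maps each module name to the checksum
    it recorded. Returns None when all modules agree.
    """
    for prev, cur in zip(order, order[1:]):
        if checksums[prev] != checksums[cur]:
            return cur
    return None


# Example with made-up checksum values: the sink disagrees with the sorter.
order = ["puller", "sorter", "sink"]
recorded = {"puller": 0xABCD, "sorter": 0xABCD, "sink": 0x1234}
suspect = first_divergent_module(order, recorded)
```

In this example `suspect` is `"sink"`, which points debugging at the boundary between the sorter and the sink rather than at the whole data path.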
ben1009 commented 2 years ago

https://github.com/pingcap/tiflow/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc+label%3Acomponent%2Fverification