pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
Apache License 2.0
430 stars 286 forks source link

ddl_puller.go(ticdc): fix DDLs are ignored when schema versions are out of order (#11733) #11759

Open ti-chi-bot opened 1 week ago

ti-chi-bot commented 1 week ago

This is an automated cherry-pick of #11733

What problem does this PR solve?

Issue Number: close #11714

What is changed and how it works?

Description: This pull request addresses a bug in TiCDC where a DDL job could be inadvertently dropped if a subsequent DDL job with a higher SchemaVersion but lower CommitTs is processed first. This behavior occurs because the ddlJobPuller processes DDL jobs based on their CommitTs in ascending order and uses a check against SchemaVersion that can prevent older DDL jobs from being processed, leading to data inconsistencies.

Background: The issue manifests when two DDL jobs are processed:

In this scenario, Job 62 is processed first due to its lower CommitTs. The ddlJobPuller then updates its SchemaVersion to that of Job 62. When Job 60 is subsequently processed, the current logic discards it because its SchemaVersion is deemed older, even though its CommitTs is higher.

This issue occurs because the SchemaVersion increment and job metadata write to TiKV are separate transactions. During a TiDB owner change, different instances might write these transactions without synchronization, leading to potential out-of-order CommitTs relative to SchemaVersion.

Solution: To resolve this, the check on SchemaVersion is removed, ensuring that only the CommitTs is verified:

Updated Code:

if job.BinlogInfo.FinishedTS <= p.getResolvedTs() {
    log.Info("ddl job finishedTs less than puller resolvedTs," +
        "discard the ddl job",
        zap.String("namespace", p.changefeedID.Namespace),
        zap.String("changefeed", p.changefeedID.ID),
        zap.String("schema", job.SchemaName),
        zap.String("table", job.TableName),
        zap.Uint64("startTs", job.StartTS),
        zap.Uint64("finishedTs", job.BinlogInfo.FinishedTS),
        zap.String("query", job.Query),
        zap.Uint64("pullerResolvedTs", p.getResolvedTs()))
    return true, nil
}

Reasoning: The ResolvedTs check ensures that only DDL jobs with FinishedTS greater than the current ResolvedTs are processed. Since the ddlJobPuller receives DDLs sorted by CommitTs, any new DDL received with a FinishedTS greater than ResolvedTs must be handled, making the SchemaVersion check redundant.

Check List

Tests

Questions

Will it cause performance regression or break compatibility?

No.

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

Fix the issue that DDL jobs could be incorrectly discarded in TiCDC when their schema versions were not in a strictly linear order with commit timestamps due to TiDB owner changes.
ti-chi-bot[bot] commented 1 week ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign amyangfei for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/pingcap/tiflow/blob/release-8.1/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Please upload report for BASE (release-8.1@0c49d2a). Learn more about missing BASE report.

Additional details and impacted files | [Components](https://app.codecov.io/gh/pingcap/tiflow/pull/11759/components?src=pr&el=components&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | Coverage Δ | | |---|---|---| | [cdc](https://app.codecov.io/gh/pingcap/tiflow/pull/11759/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | `61.7601% <0.0000%> (?)` | | | [dm](https://app.codecov.io/gh/pingcap/tiflow/pull/11759/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | `51.0662% <0.0000%> (?)` | | | [engine](https://app.codecov.io/gh/pingcap/tiflow/pull/11759/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | `63.4303% <0.0000%> (?)` | | | [Flag](https://app.codecov.io/gh/pingcap/tiflow/pull/11759/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | Coverage Δ | | |---|---|---| | [unit](https://app.codecov.io/gh/pingcap/tiflow/pull/11759/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | `57.4018% <100.0000%> (?)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap#carryforward-flags-in-the-pull-request-comment) to find out more. ```diff @@ Coverage Diff @@ ## release-8.1 #11759 +/- ## ================================================ Coverage ? 57.4018% ================================================ Files ? 854 Lines ? 126057 Branches ? 0 ================================================ Hits ? 72359 Misses ? 48296 Partials ? 5402 ```
wlwilliamx commented 1 week ago

/cc @sdojjy @asddongmen