pingcap / tidb-tools

tidb-tools are some useful tool collections for TiDB.
Apache License 2.0
286 stars 190 forks source link

Update sync-diff-inspector to perform row-level comparison when table checksums match #784

Closed michaelmdeng closed 3 months ago

michaelmdeng commented 3 months ago

What problem does this PR solve?

Current behavior is to perform row-level data comparison only when table checksums don't match and when user requests DML generation. Given #634, there are possible checksum collisions that cause sync-diff-inspector to return false negatives where it considers two different tables to be equal. In these scenarios, sync-diff-inspector will not compare rows and will mistakenly determine tables to be equal.

A checksum can only identify true negatives, ie. two tables are certainly different, and cannot determine true positives, ie. two tables are certainly equal. Thus a more desirable behavior is to use the checksum to remove the need for more expensive row-level data comparison when a simpler/faster checksum can already tell us the tables are different.

However, in the common case that the user wants to use sync-diff-inspector to generate DML statements when the tables are different, we need to perform row-level comparison anyway.

Issue Number: close #634

What is changed and how it works?

Thus, we should perform row-level data comparison if user requests DML generation or if the checksums match. The only case where we shouldn't compare row data is when the user does not desire DML and if the checksums don't match (confirming tables are different w/out data check).

Check List

Tests

Generate checksum collision

create table test_1.test(id int primary key,b blob);
insert into test_1.test values (1, x'FF00'), (2, x'00FF');
create table test_2.test(id int primary key,b blob);
insert into test_2.test values (1, x'00FF'), (2, x'FF00');

Confirm that current state considers tables equal. Confirm changed state considers tables different

Code changes

Side effects

This change will run row-level data comparison in more cases than previous, which can be more resource-intensive than simply checksumming. However this is in exchange for more correctness in cases where checksum collisions occur.

Related changes

ti-chi-bot[bot] commented 3 months ago

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
ti-chi-bot[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign joccau for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/pingcap/tidb-tools/blob/master/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
CLAassistant commented 3 months ago

CLA assistant check
All committers have signed the CLA.

ti-chi-bot[bot] commented 3 months ago

Welcome @michaelmdeng!

It looks like this is your first PR to pingcap/tidb-tools 🎉.

I'm the bot to help you request reviewers, add labels and more, See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to pingcap/tidb-tools. :smiley:

lyzx2001 commented 3 months ago

/retest

lyzx2001 commented 3 months ago

Although this pr can fix the false positive issue, the row-level comparison will cause huge performance regression, as it abandons the distributed computing in tikv.

We do not prefer to fix an issue with very low probability to happen at the cost of performance regression for most cases.

For now, we may still use this pr https://github.com/pingcap/tidb-tools/pull/707 to solve the problem.

michaelmdeng commented 3 months ago

Although this pr can fix the false positive issue, the row-level comparison will cause huge performance regression, as it abandons the distributed computing in tikv.

We do not prefer to fix an issue with very low probability to happen at the cost of performance regression for most cases.

For now, we may still use this pr #707 to solve the problem.

Ack, I've opened #787 as an alternative approach using the checksum improvements you reference.

michaelmdeng commented 3 months ago

Closing in favor of #787