pingcap / tidb-tools

tidb-tools is a collection of useful tools for TiDB.
Apache License 2.0

Use md5 checksum instead of crc32 #787

Closed michaelmdeng closed 2 weeks ago

michaelmdeng commented 3 months ago

What problem does this PR solve?

Issue Number: close #634 #703

What is changed and how it works?

Change the checksum algorithm from crc32 to md5 to reduce the chance of collisions.

Essentially a copy of #707 that resolves merge conflicts and simplifies the md5 checksum. Since md5 produces a 128-bit checksum, the previous iteration tracked the checksum as two uint64s, one for the left half (lhs) and one for the right half (rhs). This change simplifies the checksum query to bit_xor the lhs and the rhs into a single uint64 output.
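The folding described above can be sketched in Python. This is only an illustration, not the code in this PR: the names `row_checksum` and `table_checksum` are hypothetical, values are stringified with Python's `str`, which will not match SQL casting for every type, and the exact SQL generated by the PR may differ.

```python
import hashlib


def row_checksum(row):
    """Fold the 128-bit md5 of one row into a single 64-bit integer.

    Mirrors MD5(CONCAT_WS(',', cols..., CONCAT(ISNULL(col)...))):
    CONCAT_WS skips NULL arguments, so a trailing string of ISNULL
    flags keeps NULL distinguishable from an empty string.
    """
    null_flags = "".join("1" if value is None else "0" for value in row)
    # Assumption: str() stands in for SQL's implicit cast to string.
    fields = [str(value) for value in row if value is not None]
    fields.append(null_flags)
    digest = hashlib.md5(",".join(fields).encode("utf-8")).hexdigest()
    lhs = int(digest[:16], 16)  # first 16 hex characters: upper 64 bits
    rhs = int(digest[16:], 16)  # last 16 hex characters: lower 64 bits
    return lhs ^ rhs


def table_checksum(rows):
    """Return (row count, XOR of all row checksums), like COUNT(*) with BIT_XOR."""
    count = 0
    checksum = 0
    for row in rows:
        count += 1
        checksum ^= row_checksum(row)
    return count, checksum
```

Because XOR is commutative, the aggregate does not depend on row order. Two identical rows cancel each other in the XOR, which is one reason the row count is compared alongside the checksum.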

Check List

Tests

Code changes

Side effects

> explain SELECT COUNT(*) as CNT, BIT_XOR(CAST(CONV(SUBSTRING(MD5(CONCAT_WS(',', `id`, `name`, CONCAT(isnull(`id`), isnull(`name`)))), 1, 16), 16, 10) AS UNSIGNED)) LMD5, BIT_XOR(CAST(CONV(SUBSTRING(MD5(CONCAT_WS(',', `id`, `name`, CONCAT(isnull(`id`), isnull(`name`)))), 17, 16), 16, 10) AS UNSIGNED)) RMD5 FROM mysql_testing_primary.clusters;
+----------------------------+---------------+-----------+----------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                         | estRows       | task      | access object                                      | operator info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+----------------------------+---------------+-----------+----------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| StreamAgg_20               | 1.00          | root      |                                                    | funcs:count(Column#14)->Column#8, funcs:bit_xor(Column#15)->Column#9, funcs:bit_xor(Column#16)->Column#10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| └─IndexReader_21           | 1.00          | root      |                                                    | index:StreamAgg_8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|   └─StreamAgg_8            | 1.00          | cop[tikv] |                                                    | funcs:count(1)->Column#14, funcs:bit_xor(cast(conv(substring(md5(concat_ws(",", cast(mysql_testing_primary.clusters.id, var_string(20)), mysql_testing_primary.clusters.name, concat("0", cast(isnull(mysql_testing_primary.clusters.name), var_string(20))))), 1, 16), 16, 10), bigint(22) UNSIGNED BINARY))->Column#15, funcs:bit_xor(cast(conv(substring(md5(concat_ws(",", cast(mysql_testing_primary.clusters.id, var_string(20)), mysql_testing_primary.clusters.name, concat("0", cast(isnull(mysql_testing_primary.clusters.name), var_string(20))))), 17, 16), 16, 10), bigint(22) UNSIGNED BINARY))->Column#16 |
|     └─IndexFullScan_19     | 3027392465.00 | cop[tikv] | table:clusters, index:index_clusters_on_name(name) | keep order:false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+----------------------------+---------------+-----------+----------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.08 sec)
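For readers who want to reproduce the query above for another table, the following Python sketch assembles the same two-half statement from a list of column names. The helper `build_checksum_query` is hypothetical and is not part of tidb-tools (which is written in Go); it also assumes identifiers contain no backticks and performs no escaping.

```python
def build_checksum_query(schema, table, columns):
    """Build the COUNT / LMD5 / RMD5 checksum query for the given columns."""
    quoted = [f"`{name}`" for name in columns]
    # One ISNULL flag per column, concatenated into a single string.
    null_flags = "CONCAT(" + ", ".join(f"isnull({q})" for q in quoted) + ")"
    row_md5 = "MD5(CONCAT_WS(',', " + ", ".join(quoted) + ", " + null_flags + "))"

    def half(start, alias):
        # SUBSTRING positions are 1-based: 1 selects the first 16 hex
        # characters of the digest, 17 selects the last 16.
        return (
            f"BIT_XOR(CAST(CONV(SUBSTRING({row_md5}, {start}, 16), 16, 10) "
            f"AS UNSIGNED)) {alias}"
        )

    return (
        f"SELECT COUNT(*) as CNT, {half(1, 'LMD5')}, {half(17, 'RMD5')} "
        f"FROM {schema}.{table};"
    )


if __name__ == "__main__":
    print(build_checksum_query("mysql_testing_primary", "clusters", ["id", "name"]))
```

Called with the schema, table, and columns from the example above, the sketch is intended to yield the same `SELECT` statement that was passed to `explain`.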

Related changes

CLAassistant commented 3 months ago

CLA assistant check
All committers have signed the CLA.

Leavrth commented 2 weeks ago

/merge

ti-chi-bot[bot] commented 2 weeks ago

@Leavrth: We have migrated to builtin LGTM and approve plugins for reviewing.

👉 Please use /approve when you want to approve this pull request.

The changes announcement: LGTM plugin changes

Instructions for interacting with me using PR comments are available [here](https://prow.tidb.net/command-help). If you have questions or suggestions related to my behavior, please file an issue against the [ti-community-infra/tichi](https://github.com/ti-community-infra/tichi/issues/new?title=Prow%20issue:) repository.

Leavrth commented 2 weeks ago

/test all

Leavrth commented 2 weeks ago

/test unit-test

ti-chi-bot[bot] commented 2 weeks ago

@Leavrth: The specified target(s) for /test were not found. The following commands are available to trigger required jobs:

Use /test all to run all jobs.

In response to [this](https://github.com/pingcap/tidb-tools/pull/787#issuecomment-2277354794):

> /test unit-test

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

Leavrth commented 2 weeks ago

/test pull-unit-test

ti-chi-bot[bot] commented 2 weeks ago

@Leavrth: The specified target(s) for /test were not found. The following commands are available to trigger required jobs:

Use /test all to run all jobs.

In response to [this](https://github.com/pingcap/tidb-tools/pull/787#issuecomment-2277355161):

> /test pull-unit-test

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

lance6716 commented 2 weeks ago

/test pull-verify

Leavrth commented 2 weeks ago

/LGTM

ti-chi-bot[bot] commented 2 weeks ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Leavrth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/pingcap/tidb-tools/blob/master/OWNERS)~~ [Leavrth]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

wuhuizuo commented 2 weeks ago

/ok-to-test