pingcap / tidb-tools

tidb-tools are some useful tool collections for TiDB.
Apache License 2.0
286 stars 191 forks source link

sync-diff-inspector v7.1.1: Long execution time due to parsing data incorrect which COLLATE is utf8_general_ci #745

Open snowballbear opened 11 months ago

snowballbear commented 11 months ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

MySQL:

CREATE TABLE items (
  id varchar(255) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO test.items (id) VALUES(CONCAT("m", round(rand()*100000000000)));

TiDB : COLLATE=utf8mb4_bin

create database test_bin;
use test_bin;
CREATE TABLE items (
  id varchar(255) NOT NULL,
  PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

COLLATE=utf8mb4_general_ci

create database test_g_ci;
use test_g_ci;
CREATE TABLE items (
  id varchar(255) NOT NULL,
  PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

export data by using dumpling and import data into tidb cluster.

Sync-diff-inspector:

######################### Datasource config ######################### [data-sources] [data-sources.mysql1] host = "xxx.xx.xx.xx" port = 3306 user = "xxxx" password = "xxxx" route-rules = ["rule1"]

[data-sources.tidb0] host = "xxx" port = 4000 user = "xxxx" password = "xxxx"

########################### Routes ###########################

mapping rules

[routes] [routes.rule1] schema-pattern = "test" # source db target-schema = "test_bin" # target db

######################### Task config ######################### [task] output-dir = "./output1" source-instances = ["mysql1"] target-instance = "tidb0"

target tables to be checked

target-check-tables = ["test_bin.items"]

- sync-config-g-ci.toml
```shell
# Diff Configuration.
######################### Global config #########################
export-fix-sql = true
# check data also
check-struct-only = false

######################### Datasource config #########################
[data-sources]
[data-sources.mysql1]
    host = "xxx.xx.xx.xx"
    port = 3306
    user = "xxxx"
    password = "xxxx"
    route-rules = ["rule1"]

[data-sources.tidb0]
    host = "xxx"
    port = 4000
    user = "xxxx"
    password = "xxxx"

########################### Routes ###########################
# mapping rules
[routes]
[routes.rule1] 
schema-pattern = "test"      # source db
target-schema = "test_g_ci"         # target db

######################### Task config #########################
[task]
    output-dir = "./output2"
    source-instances = ["mysql1"]
    target-instance = "tidb0"
    # target tables to be checked
    target-check-tables = ["test_g_ci.items"]

sync_diff_inspector --config=./sync-config-bin.toml

[ec2-user@ip-172-16-1-46 sync_diff]$ cat ./output1/sync_diff.log
[2023/09/13 01:45:50.339 +00:00] [INFO] [printer.go:46] ["Welcome to sync_diff_inspector"] ["Release Version"=v7.1.0] ["Git Commit Hash"=89b804103994cd63273fa9e57e71f270b1a42fcc] ["Git Branch"=heads/refs/tags/v7.1.0] ["UTC Build Time"="2023-05-24 03:09:18"] ["Go Version"=go1.20.3]
[2023/09/13 01:45:50.350 +00:00] [INFO] [main.go:101] [config="{\"check-thread-count\":4,\"split-thread-count\":5,\"export-fix-sql\":true,\"check-struct-only\":false,\"dm-addr\":\"\",\"dm-task\":\"\",\"data-sources\":{\"mysql1\":{\"host\":\"172.16.1.46\",\"port\":3306,\"user\":\"admin\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":[\"rule1\"],\"Router\":{\"Selector\":{}},\"Conn\":null},\"tidb0\":{\"host\":\"private-tidb.plouwpu5awb.clusters.tidb-cloud.com\",\"port\":4000,\"user\":\"root\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":null,\"Router\":{\"Selector\":{}},\"Conn\":null}},\"routes\":{\"rule1\":{\"schema-pattern\":\"test\",\"table-pattern\":\"\",\"target-schema\":\"test_bin\",\"target-table\":\"\"}},\"table-configs\":null,\"task\":{\"source-instances\":[\"mysql1\"],\"source-routes\":null,\"target-instance\":\"tidb0\",\"target-check-tables\":[\"test_bin.items\"],\"target-configs\":null,\"output-dir\":\"./output1\",\"SourceInstances\":[{\"host\":\"172.16.1.46\",\"port\":3306,\"user\":\"admin\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":[\"rule1\"],\"Router\":{\"Selector\":{}},\"Conn\":null}],\"TargetInstance\":{\"host\":\"private-tidb.plouwpu5awb.clusters.tidb-cloud.com\",\"port\":4000,\"user\":\"root\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":null,\"Router\":{\"Selector\":{}},\"Conn\":null},\"TargetTableConfigs\":null,\"TargetCheckTables\":[{}],\"FixDir\":\"output1/fix-on-tidb0\",\"CheckpointDir\":\"output1/checkpoint\",\"HashFile\":\"\"},\"ConfigFile\":\"./sync-config-bin.toml\",\"PrintVersion\":false}"]
[2023/09/13 01:45:50.432 +00:00] [INFO] [mysql_shard.go:372] ["will increase connection configurations for DB of instance"] ["connection limit"=10]
[2023/09/13 01:45:50.432 +00:00] [INFO] [source.go:412] ["table match check finished"]
[2023/09/13 01:45:50.434 +00:00] [INFO] [tidb.go:209] ["find router for tidb source"]
[2023/09/13 01:45:50.445 +00:00] [INFO] [source.go:412] ["table match check finished"]
[2023/09/13 01:45:50.448 +00:00] [INFO] [diff.go:363] ["The downstream is TiDB. pick it as work source first"]
[2023/09/13 01:45:50.550 +00:00] [INFO] [diff.go:191] ["not found checkpoint file, start from beginning"]
[2023/09/13 01:45:50.556 +00:00] [INFO] [diff.go:721] ["start writeSQLs goroutine"]
[2023/09/13 01:45:50.559 +00:00] [INFO] [diff.go:377] ["start handleCheckpoint goroutine"]
[2023/09/13 01:45:50.593 +00:00] [INFO] [bucket.go:210] ["get chunk size for table"] ["chunk size"=50000] [db=test_bin] [table=items]
[2023/09/13 01:45:50.594 +00:00] [INFO] [bucket.go:103] ["close chunks channel for table"] [schema=test_bin] [table=items]
[2023/09/13 01:45:50.594 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":129,\"bucket-index-right\":171,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"m55408434063\",\"upper\":\"m70381241730\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:45:50.594 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":172,\"bucket-index-right\":214,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"m70381241730\",\"upper\":\"m85510843141\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:45:50.594 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":215,\"bucket-index-right\":256,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"m85510843141\",\"upper\":\"\",\"has-lower\":true,\"has-upper\":false}]"]
[2023/09/13 01:45:50.594 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":0,\"bucket-index-right\":42,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\",\"upper\":\"m25131780026\",\"has-lower\":false,\"has-upper\":true}]"]
[2023/09/13 01:45:50.594 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":43,\"bucket-index-right\":85,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"m25131780026\",\"upper\":\"m40253386870\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:45:50.703 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":86,\"bucket-index-right\":128,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"m40253386870\",\"upper\":\"m55408434063\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:45:50.767 +00:00] [INFO] [diff.go:732] ["write sql channel closed"]
[2023/09/13 01:45:50.767 +00:00] [INFO] [diff.go:723] ["close writeSQLs goroutine"]
[2023/09/13 01:45:50.767 +00:00] [INFO] [diff.go:405] ["Stop do checkpoint"]
[2023/09/13 01:45:50.767 +00:00] [INFO] [checkpoints.go:225] ["save checkpoint"] [chunk="{\"state\":\"success\",\"chunk-range\":{\"index\":{\"table-index\":0,\"bucket-index-left\":215,\"bucket-index-right\":256,\"chunk-index\":0,\"chunk-count\":1},\"type\":1,\"bounds\":[{\"column\":\"id\",\"lower\":\"m85510843141\",\"upper\":\"\",\"has-lower\":true,\"has-upper\":false}],\"is-first\":false,\"is-last\":false,\"where\":\"(((`id` \\u003e ?)) AND (TRUE))\",\"args\":[\"m85510843141\"]},\"index-id\":1}"] [state=success]
[2023/09/13 01:45:50.767 +00:00] [INFO] [diff.go:379] ["close handleCheckpoint goroutine"]
[2023/09/13 01:45:50.780 +00:00] [INFO] [main.go:114] ["check data finished"] [cost=428.311227ms]
[2023/09/13 01:45:50.781 +00:00] [INFO] [main.go:108] ["check pass!!!"]

sync_diff_inspector --config=./sync-config-g-ci.toml

[ec2-user@ip-172-16-1-46 ~]$ cat ./sync_diff/output2/sync_diff.log 
[2023/09/13 01:46:39.162 +00:00] [INFO] [printer.go:46] ["Welcome to sync_diff_inspector"] ["Release Version"=v7.1.0] ["Git Commit Hash"=89b804103994cd63273fa9e57e71f270b1a42fcc] ["Git Branch"=heads/refs/tags/v7.1.0] ["UTC Build Time"="2023-05-24 03:09:18"] ["Go Version"=go1.20.3]
[2023/09/13 01:46:39.162 +00:00] [INFO] [main.go:101] [config="{\"check-thread-count\":4,\"split-thread-count\":5,\"export-fix-sql\":true,\"check-struct-only\":false,\"dm-addr\":\"\",\"dm-task\":\"\",\"data-sources\":{\"mysql1\":{\"host\":\"172.16.1.46\",\"port\":3306,\"user\":\"admin\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":[\"rule1\"],\"Router\":{\"Selector\":{}},\"Conn\":null},\"tidb0\":{\"host\":\"private-tidb.plouwpu5awb.clusters.tidb-cloud.com\",\"port\":4000,\"user\":\"root\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":null,\"Router\":{\"Selector\":{}},\"Conn\":null}},\"routes\":{\"rule1\":{\"schema-pattern\":\"test\",\"table-pattern\":\"\",\"target-schema\":\"test_g_ci\",\"target-table\":\"\"}},\"table-configs\":null,\"task\":{\"source-instances\":[\"mysql1\"],\"source-routes\":null,\"target-instance\":\"tidb0\",\"target-check-tables\":[\"test_g_ci.items\"],\"target-configs\":null,\"output-dir\":\"./output2\",\"SourceInstances\":[{\"host\":\"172.16.1.46\",\"port\":3306,\"user\":\"admin\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":[\"rule1\"],\"Router\":{\"Selector\":{}},\"Conn\":null}],\"TargetInstance\":{\"host\":\"private-tidb.plouwpu5awb.clusters.tidb-cloud.com\",\"port\":4000,\"user\":\"root\",\"password\":\"******\",\"sql-mode\":\"\",\"snapshot\":\"\",\"security\":null,\"route-rules\":null,\"Router\":{\"Selector\":{}},\"Conn\":null},\"TargetTableConfigs\":null,\"TargetCheckTables\":[{}],\"FixDir\":\"output2/fix-on-tidb0\",\"CheckpointDir\":\"output2/checkpoint\",\"HashFile\":\"\"},\"ConfigFile\":\"./sync-config-g-ci.toml\",\"PrintVersion\":false}"]
[2023/09/13 01:46:39.193 +00:00] [INFO] [mysql_shard.go:372] ["will increase connection configurations for DB of instance"] ["connection limit"=10]
[2023/09/13 01:46:39.193 +00:00] [INFO] [source.go:412] ["table match check finished"]
[2023/09/13 01:46:39.194 +00:00] [INFO] [tidb.go:209] ["find router for tidb source"]
[2023/09/13 01:46:39.203 +00:00] [INFO] [source.go:412] ["table match check finished"]
[2023/09/13 01:46:39.206 +00:00] [INFO] [diff.go:363] ["The downstream is TiDB. pick it as work source first"]
[2023/09/13 01:46:39.283 +00:00] [INFO] [diff.go:191] ["not found checkpoint file, start from beginning"]
[2023/09/13 01:46:39.286 +00:00] [WARN] [utils.go:335] ["Ignoring collation differences"] ["column name"=id] ["collation source"=utf8mb4_bin] ["collation target"=utf8mb4_general_ci]
[2023/09/13 01:46:39.286 +00:00] [INFO] [diff.go:721] ["start writeSQLs goroutine"]
[2023/09/13 01:46:39.286 +00:00] [INFO] [diff.go:377] ["start handleCheckpoint goroutine"]
[2023/09/13 01:46:39.295 +00:00] [INFO] [bucket.go:210] ["get chunk size for table"] ["chunk size"=50000] [db=test_g_ci] [table=items]
[2023/09/13 01:46:39.295 +00:00] [INFO] [bucket.go:103] ["close chunks channel for table"] [schema=test_g_ci] [table=items]
[2023/09/13 01:46:39.295 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":129,\"bucket-index-right\":171,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00005\\u00005\\u00003\\u00007\\u00005\\u00009\\u00006\\u00007\\u00005\\u00009\\u00009\",\"upper\":\"\\u0000M\\u00007\\u00000\\u00004\\u00002\\u00005\\u00002\\u00005\\u00004\\u00005\\u00004\\u00005\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:46:39.296 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":172,\"bucket-index-right\":214,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00007\\u00000\\u00004\\u00002\\u00005\\u00002\\u00005\\u00004\\u00005\\u00004\\u00005\",\"upper\":\"\\u0000M\\u00008\\u00005\\u00005\\u00009\\u00003\\u00000\\u00002\\u00007\\u00001\\u00004\\u00002\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:46:39.296 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":215,\"bucket-index-right\":256,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00008\\u00005\\u00005\\u00009\\u00003\\u00000\\u00002\\u00007\\u00001\\u00004\\u00002\",\"upper\":\"\",\"has-lower\":true,\"has-upper\":false}]"]
[2023/09/13 01:46:39.296 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":0,\"bucket-index-right\":42,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\",\"upper\":\"\\u0000M\\u00002\\u00005\\u00001\\u00007\\u00005\\u00001\\u00004\\u00006\\u00002\\u00002\\u00003\",\"has-lower\":false,\"has-upper\":true}]"]
[2023/09/13 01:46:39.296 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":43,\"bucket-index-right\":85,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00002\\u00005\\u00001\\u00007\\u00005\\u00001\\u00004\\u00006\\u00002\\u00002\\u00003\",\"upper\":\"\\u0000M\\u00004\\u00000\\u00003\\u00006\\u00004\\u00009\\u00007\\u00001\\u00000\\u00004\\u00009\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:46:39.312 +00:00] [INFO] [diff.go:280] ["global consume chunk info"] ["chunk index"="{\"table-index\":0,\"bucket-index-left\":86,\"bucket-index-right\":128,\"chunk-index\":0,\"chunk-count\":1}"] ["chunk bound"="[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00004\\u00000\\u00003\\u00006\\u00004\\u00009\\u00007\\u00001\\u00000\\u00004\\u00009\",\"upper\":\"\\u0000M\\u00005\\u00005\\u00003\\u00007\\u00005\\u00009\\u00006\\u00007\\u00005\\u00009\\u00009\",\"has-lower\":true,\"has-upper\":true}]"]
[2023/09/13 01:46:39.586 +00:00] [INFO] [diff.go:732] ["write sql channel closed"]
[2023/09/13 01:46:39.586 +00:00] [INFO] [diff.go:723] ["close writeSQLs goroutine"]
[2023/09/13 01:46:39.586 +00:00] [INFO] [diff.go:405] ["Stop do checkpoint"]
[2023/09/13 01:46:39.586 +00:00] [INFO] [checkpoints.go:225] ["save checkpoint"] [chunk="{\"state\":\"success\",\"chunk-range\":{\"index\":{\"table-index\":0,\"bucket-index-left\":215,\"bucket-index-right\":256,\"chunk-index\":0,\"chunk-count\":1},\"type\":1,\"bounds\":[{\"column\":\"id\",\"lower\":\"\\u0000M\\u00008\\u00005\\u00005\\u00009\\u00003\\u00000\\u00002\\u00007\\u00001\\u00004\\u00002\",\"upper\":\"\",\"has-lower\":true,\"has-upper\":false}],\"is-first\":false,\"is-last\":false,\"where\":\"(((`id` \\u003e ?)) AND (TRUE))\",\"args\":[\"\\u0000M\\u00008\\u00005\\u00005\\u00009\\u00003\\u00000\\u00002\\u00007\\u00001\\u00004\\u00002\"]},\"index-id\":1}"] [state=success]
[2023/09/13 01:46:39.587 +00:00] [INFO] [diff.go:379] ["close handleCheckpoint goroutine"]
[2023/09/13 01:46:39.596 +00:00] [INFO] [main.go:114] ["check data finished"] [cost=433.003467ms]
[2023/09/13 01:46:39.596 +00:00] [INFO] [main.go:108] ["check pass!!!"]
mysql> show stats_buckets where Table_name='items';
+-----------+------------+----------------+-------------+----------+-----------+--------+---------+--------------------------+--------------------------+------+
| Db_name   | Table_name | Partition_name | Column_name | Is_index | Bucket_id | Count  | Repeats | Lower_Bound              | Upper_Bound              | Ndv  |
+-----------+------------+----------------+-------------+----------+-----------+--------+---------+--------------------------+--------------------------+------+
| test_bin  | items      |                | id          |        0 |         0 |   1174 |       1 | m10000063581             | m10364587056             |    0 |
| test_bin  | items      |                | id          |        0 |         1 |   2349 |       1 | m10364941331             | m10736559528             |    0 |
| test_bin  | items      |                | id          |        0 |         2 |   3524 |       1 | m10736698574             | m11081677034             |    0 |
| test_bin  | items      |                | id          |        0 |         3 |   4699 |       1 | m11081683367             | m11436310887             |    0 |
| test_bin  | items      |                | id          |        0 |         4 |   5874 |       1 | m11436381015             | m11770389706             |    0 |
....
....
| test_g_ci | items      |                | PRIMARY     |        1 |       251 | 295994 |       1 |  M 9 8 2 5 1 0 9 2 0 6 0 |  M 9 8 6 2 4 6 5 5 1 4   |    0 |
| test_g_ci | items      |                | PRIMARY     |        1 |       252 | 297169 |       1 |  M 9 8 6 2 6 9 5 8 4 8 4 |  M 9 8 9 7 9 9 5 9 5 4 3 |    0 |
| test_g_ci | items      |                | PRIMARY     |        1 |       253 | 298344 |       1 |  M 9 8 9 8 0 5 6 7 4 1 7 |  M 9 9 3 2 4 5 7 1 2 4 8 |    0 |
| test_g_ci | items      |                | PRIMARY     |        1 |       254 | 299518 |       1 |  M 9 9 3 2 5 5 8 8 4 3 8 |  M 9 9 6 7 7 5 0 3 3 8 8 |    0 |
| test_g_ci | items      |                | PRIMARY     |        1 |       255 | 300655 |       1 |  M 9 9 6 7 8 6 7 7 5 0 6 |  M 9 9 9 9 9 3 4 3 3 2 4 |    0 |
+-----------+------------+----------------+-------------+----------+-----------+--------+---------+--------------------------+--------------------------+------+
1024 rows in set (0.02 sec)
  1. What did you expect to see? Comparison can be completed.

  2. What did you see instead?

The amount of data compared is 3414406630 that it hasn't been finished for more than 5.5 hours.

  1. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)? v7.1.1

  2. which tool are you using? sync-diff-inspector v7.1.1

  3. what versionof tool are you using (pump -V or tidb-lightning -V or syncer -V)? sync-diff-inspector v7.1.1

snowballbear commented 11 months ago

/found gs