pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
Apache License 2.0
429 stars 287 forks source link

DM data inconsistency after dm worker down #7658

Closed fubinzh closed 1 year ago

fubinzh commented 1 year ago

What did you do?

  1. Deploy DM cluster with 3 dm worker
  2. Create dm tasks
  3. Run sysbench workload for 5 min
  4. Kill the dm-worker
  5. Wait dm task synced to downstream
  6. Compare data consistency between upstream/downstream

What did you expect to see?

Data should be consistent

What did you see instead?

Data inconsistency seen

Comparing the table structure of ``dm_tolerance_worker_down`.`sbtest2`` ... equivalent
Comparing the table structure of ``dm_tolerance_worker_down`.`sbtest3`` ... equivalent
Comparing the table structure of ``dm_tolerance_worker_down`.`sbtest4`` ... equivalent
Comparing the table structure of ``dm_tolerance_worker_down`.`sbtest1`` ... equivalent
Comparing the table data of ``dm_tolerance_worker_down`.`sbtest2`` ... failure
Comparing the table data of ``dm_tolerance_worker_down`.`sbtest3`` ... failure
Comparing the table data of ``dm_tolerance_worker_down`.`sbtest1`` ... failure
Comparing the table data of ``dm_tolerance_worker_down`.`sbtest4`` ... failure
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
The data of `dm_tolerance_worker_down`.`sbtest4` is not equal
The data of `dm_tolerance_worker_down`.`sbtest3` is not equal
The data of `dm_tolerance_worker_down`.`sbtest2` is not equal
The data of `dm_tolerance_worker_down`.`sbtest1` is not equal

Versions of the cluster

DM version (run dmctl -V or dm-worker -V or dm-master -V):

/ # /dm-master -V                                                      
Release Version: v6.5.0-alpha                                          
Git Commit Hash: 9d60d064ef5e681867a60c95cdb2b99c10c0c67c              
Git Branch: heads/refs/tags/v6.5.0-alpha                               
UTC Build Time: 2022-11-18 11:06:46                                    
Go Version: go version go1.19.2 linux/amd64                            
Failpoint Build: false                                                 

/ # /dm-worker -V                                                      
Release Version: v6.5.0-alpha                                          
Git Commit Hash: 9d60d064ef5e681867a60c95cdb2b99c10c0c67c              
Git Branch: heads/refs/tags/v6.5.0-alpha                               
UTC Build Time: 2022-11-18 11:06:46                                    
Go Version: go version go1.19.2 linux/amd64                            
Failpoint Build: false                                                 
/ # ./dmctl -V                                             
Release Version: v6.5.0-alpha                              
Git Commit Hash: 9d60d064ef5e681867a60c95cdb2b99c10c0c67c  
Git Branch: heads/refs/tags/v6.5.0-alpha                   
UTC Build Time: 2022-11-18 11:06:46                        
Go Version: go version go1.19.2 linux/amd64                
Failpoint Build: false                                     

Upstream MySQL/MariaDB server version:

mysql> select version();
+------------+
| version()  |
+------------+
| 5.7.36-log |
+------------+
1 row in set (0.00 sec)

Downstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

mysql> select tidb_version();
| tidb_version()

| Release Version: v6.5.0-alpha
Edition: Community
Git Commit Hash: 3bcd5a889bded9ab23e7d077ea5c69df7982a495
Git Branch: heads/refs/tags/v6.5.0-alpha
UTC Build Time: 2022-11-19 11:13:51
GoVersion: go1.19.2
Race Enabled: false
TiKV Min Version: 6.2.0-alpha
Check Table Before Drop: false
Store: tikv |

How did you deploy DM: tiup or manually?

ti-operator

Other interesting information (system version, hardware config, etc):

>
>

current status of DM cluster (execute query-status <task-name> in dmctl)

/ # ./dmctl --master-addr=127.0.0.1:8261 query-status -s source-0                            
{                                                                                            
    "result": true,                                                                          
    "msg": "",                                                                               
    "sources": [                                                                             
        {                                                                                    
            "result": true,                                                                  
            "msg": "",                                                                       
            "sourceStatus": {                                                                
                "source": "source-0",                                                        
                "worker": "dm-dm-worker-1",                                                  
                "result": null,                                                              
                "relayStatus": null                                                          
            },                                                                               
            "subTaskStatus": [                                                               
                {                                                                            
                    "name": "test",                                                          
                    "stage": "Running",                                                      
                    "unit": "Sync",                                                          
                    "result": null,                                                          
                    "unresolvedDDLLockID": "",                                               
                    "sync": {                                                                
                        "totalEvents": "3080479",                                            
                        "totalTps": "221",                                                   
                        "recentTps": "0",                                                    
                        "masterBinlog": "(mysql-bin.000002, 407120738)",                     
                        "masterBinlogGtid": "adade2e6-693e-11ed-a584-e65bc66ff355:1-835280", 
                        "syncerBinlog": "(mysql-bin.000002, 407120738)",                     
                        "syncerBinlogGtid": "adade2e6-693e-11ed-a584-e65bc66ff355:1-835280", 
                        "blockingDDLs": [                                                    
                        ],                                                                   
                        "unresolvedGroups": [                                                
                        ],                                                                   
                        "synced": true,                                                      
                        "binlogType": "remote",                                              
                        "secondsBehindMaster": "0",                                          
                        "blockDDLOwner": "",                                                 
                        "conflictMsg": "",                                                   
                        "totalRows": "3080479",                                              
                        "totalRps": "221",                                                   
                        "recentRps": "0"                                                     
                    },                                                                       
                    "validation": null                                                       
                }                                                                            
            ]                                                                                
        }                                                                                    
    ]                                                                                        
}                                                                                            
fubinzh commented 1 year ago

dm-master, dm-worker logs dm-master.0.log dm-master.1.log dm-master.2.log dm-worker.0.log dm-worker.1.log dm-worker.2.log

fubinzh commented 1 year ago

sync diff log

sync_diff.log

dm_tolerance_worker_down_sbtest1_3_0-250_0.sql.txt dm_tolerance_worker_down_sbtest2_2_0-250_0.sql.txt dm_tolerance_worker_down_sbtest3_1_0-249_0.sql.txt dm_tolerance_worker_down_sbtest4_0_0-250_0.sql.txt

GMHDBJD commented 1 year ago

worker1

[2022/11/21 03:07:13.446 +00:00] [INFO] [main.go:77] ["got signal to exit"] [signal=terminated]
[2022/11/21 03:07:13.446 +00:00] [INFO] [server.go:606] ["handleSourceBound will quit now"]
[2022/11/21 03:07:13.446 +00:00] [INFO] [server.go:645] ["worker server is closed, handleRelayConfig will quit now"]
[2022/11/21 03:07:13.446 +00:00] [INFO] [keepalive.go:144] ["ctx is canceled, keepalive will exit now"]
[2022/11/21 03:07:13.446 +00:00] [ERROR] [metrics.go:141] ["status server returned"] [error="mux: server closed"]
[2022/11/21 03:07:13.446 +00:00] [INFO] [server.go:365] ["observeRelayConfig will quit now"]
[2022/11/21 03:07:13.446 +00:00] [INFO] [server.go:458] ["observeSourceBound will quit now"]
[2022/11/21 03:07:13.447 +00:00] [INFO] [source_worker.go:794] ["worker is closed, handleSubTaskStage will quit now"]
[2022/11/21 03:07:13.447 +00:00] [INFO] [source_worker.go:755] ["observeSubtaskStage will quit now"]
[2022/11/21 03:07:13.447 +00:00] [INFO] [source_worker.go:1214] ["worker is closed, handle validator stage will quit now"]
[2022/11/21 03:07:13.447 +00:00] [INFO] [source_worker.go:1174] ["observe validator stage will quit now"]
[2022/11/21 03:07:13.447 +00:00] [INFO] [source_worker.go:173] ["status print process exits!"] [component="worker controller"]
[2022/11/21 03:07:13.447 +00:00] [INFO] [subtask.go:558] [closing] [subtask=test]
[2022/11/21 03:07:13.447 +00:00] [INFO] [syncer.go:1550] ["received subtask's done, try graceful stop"] [task=test] [unit="binlog replication"]
[2022/11/21 03:07:13.449 +00:00] [WARN] [join.go:104] ["keepalive with master goroutine paused"] []
[2022/11/21 03:07:13.479 +00:00] [INFO] [syncer.go:3042] ["flush all jobs"] [task=test] [unit="binlog replication"] ["global checkpoint"="location(position: (mysql-bin.000001, 228781543), gtid-set: adade2e6-693e-11ed-a584-e65bc66ff355:1-128688) (flushed location(position: (mysql-bin.000001, 116319791), gtid-set: adade2e6-693e-11ed-a584-e65bc66ff355:1-65222))"] ["flush job seq"=6]
[2022/11/21 03:07:13.841 +00:00] [WARN] [db.go:258] ["execute statements"] [task=test] [unit="binlog replication"] [retry=0] [queries="[UPDATE `dm_tolerance_worker_down`.`sbtest4` SET `id` = ?, `k` = ?, `c` = ?, `pad` = ? WHERE `id` = ? LIMIT 1 UPDATE `dm_tolerance_worker_down`.`sbtest4` SET `id` = ?, `k` = ?, `c` = ?, `pad` = ? WHERE `id` = ? LIMIT 1 UPDATE `dm_tolerance_worker_down`.`sbtest4` SET `id` = ?, `k` = ?, `c` = ?, `pad` = ? WHERE `id` = ? LIMIT 1 UPDATE `dm_tolerance_worker_down`.`sbtest1` SET `id` = ?, `k` = ?, `c` = ?, `pad` = ? WHERE `id` = ? LIMIT 1 UPDATE `dm_tolerance_worker_down`.`sbtest4` SET `id` = ?, `k` = ?, `c` = ?, `pad` = ? WHERE `id` = ? LIMIT 1 DELETE FROM `dm_tolerance_worker_down`.`sbtest1` WHERE `id` = ? LIMIT 1 INSERT INTO `dm_tolerance_worker_down`.`sbtest1` (`id`,`k`,`c`,`pad`) VALUES (?,?,?,?) UPDATE `dm_tolerance_worker_down`.`sbtest4` SET `id` = ?, `k` = ?, `c` = ?, `pad` = ? WHERE `id` = ? LIMIT 1 UPDATE `dm_tolerance_worker_down`.`sbtest4` SET `id` = ?, `k` = ?, `c` = ?, `pad` = ? WHERE `id` = ? LIMIT 1 DELETE FROM `dm_tolerance_worker_down`.`sbtest3` WHERE `id` = ? LIMIT 1 INSERT INTO `dm_tolerance_wor..."] [arguments="[[503 504 75379929375-08901950793-78555803083-37489850335-96619603456-76071177473-68316350643-89944734303-67369663144-82806199389 27104307562-59013074579-84629885118-77599150594-97937567858 503] [500 502 25551873189-28557929966-55096509370-31053233672-10243800319-57862604282-24382628700-51343185350-46623125836-09364029393 89519814057-39012474348-50440312914-41053159029-58587947733 500] [506 501 82759612895-44213726707-52928207651-20930860052-72247065838-12160489227-69954616781-31924212256-49732188019-77673881338 98562585992-80034001228-50806933261-41462389081-61465632590 506] [501 621 10066221017-01991777332-95919821134-81175801143-30702162977-52321232090-63548496353-46266718451-29623283269-88472243687 55050446129-52842948908-94659911173-21473402916-07226907320 501] [505 503 15032163607-34595688917-78305583815-89429429596-64533676258-82267262783-36083015352-72012067832-25451963739-71916254440 27968209570-18177379540-90734677961-80680051503-07795555093 505] [504] [504 499 35373701899-26450908858-91780615263-19..."] [error="[code=10006:class=database:scope=not-set:level=high], Message: execute statement failed: commit, RawCause: Error 9007: Write conflict, txnStartTS=437518344851030067, conflictStartTS=437518344864137244, conflictCommitTS=437518344864137255, key={tableID=89, tableName=dm_tolerance_worker_down.sbtest4, handle=500}, originalKey=7480000000000000595f7280000000000001f4, primary={tableID=87, tableName=dm_tolerance_worker_down.sbtest1, handle=501}, originalPrimaryKey=7480000000000000575f7280000000000001f5, reason=Optimistic [try again later]"]
[2022/11/21 03:07:13.841 +00:00] [WARN] [strategy.go:94] ["retry stratey takes effect"] [error="[code=10006:class=database:scope=not-set:level=high], Message: execute statement failed: commit, RawCause: Error 9007: Write conflict, txnStartTS=437518344851030067, conflictStartTS=437518344864137244, conflictCommitTS=437518344864137255, key={tableID=89, tableName=dm_tolerance_worker_down.sbtest4, handle=500}, originalKey=7480000000000000595f7280000000000001f4, primary={tableID=87, tableName=dm_tolerance_worker_down.sbtest1, handle=501}, originalPrimaryKey=7480000000000000575f7280000000000001f5, reason=Optimistic [try again later]"] [errorVerbose="[code=10006:class=database:scope=not-set:level=high], Message: execute statement failed: commit, RawCause: Error 9007: Write conflict, txnStartTS=437518344851030067, conflictStartTS=437518344864137244, conflictCommitTS=437518344864137255, key={tableID=89, tableName=dm_tolerance_worker_down.sbtest4, handle=500}, originalKey=7480000000000000595f7280000000000001f4, primary={tableID=87, tableName=dm_tolerance_worker_down.sbtest1, handle=501}, originalPrimaryKey=7480000000000000575f7280000000000001f5, reason=Optimistic [try again later]\ngithub.com/pingcap/tiflow/dm/pkg/terror.(*Error).Delegate\n\tgithub.com/pingcap/tiflow/dm/pkg/terror/terror.go:324\ngithub.com/pingcap/tiflow/dm/pkg/conn.(*BaseConn).ExecuteSQLWithIgnoreError\n\tgithub.com/pingcap/tiflow/dm/pkg/conn/baseconn.go:211\ngithub.com/pingcap/tiflow/dm/syncer/dbconn.(*DBConn).ExecuteSQLWithIgnore.func2\n\tgithub.com/pingcap/tiflow/dm/syncer/dbconn/db.go:181\ngithub.com/pingcap/tiflow/dm/pkg/retry.(*FiniteRetryStrategy).Apply\n\tgithub.com/pingcap/tiflow/dm/pkg/retry/strategy.go:84\ngithub.com/pingcap/tiflow/dm/pkg/conn.(*BaseConn).ApplyRetryStrategy\n\tgithub.com/pingcap/tiflow/dm/pkg/conn/baseconn.go:259\ngithub.com/pingcap/tiflow/dm/syncer/dbconn.(*DBConn).ExecuteSQLWithIgnore\n\tgithub.com/pingcap/tiflow/dm/syncer/dbconn/db.go:172\ngithub.com/pingcap/tiflow/dm/syncer/dbconn.(*DBConn).ExecuteSQL\n\tgithub.com/pingcap/tiflow/dm/syncer/dbconn/db.go:220\ngithub.com/pingcap/tiflow/dm/syncer.(*DMLWorker).executeBatchJobs\n\tgithub.com/pingcap/tiflow/dm/syncer/dml_worker.go:259\ngithub.com/pingcap/tiflow/dm/syncer.(*DMLWorker).executeJobs\n\tgithub.com/pingcap/tiflow/dm/syncer/dml_worker.go:183\nruntime.goexit\n\truntime/asm_amd64.s:1594"] [retry_times=0] [retry_count=100]

master

[2022/11/21 03:07:13.448 +00:00] [INFO] [keepalive.go:213] ["receive dm-worker keep alive event"] [operation=DELETE] [kv=/dm-worker/a/646d2d646d2d776f726b65722d30]
[2022/11/21 03:07:13.448 +00:00] [INFO] [scheduler.go:2041] ["receive worker status change event"] [component=scheduler] [delete=true] [event="{\"worker-name\":\"dm-dm-worker-0\",\"join-time\":\"0001-01-01T00:00:00Z\"}"]
[2022/11/21 03:07:13.449 +00:00] [INFO] [scheduler.go:2211] ["unbound the worker for source"] [component=scheduler] [bound="{\"source\":\"source-0\",\"worker\":\"dm-dm-worker-0\"}"] [event="{\"worker-name\":\"dm-dm-worker-0\",\"join-time\":\"0001-01-01T00:00:00Z\"}"]
[2022/11/21 03:07:13.449 +00:00] [INFO] [scheduler.go:2383] ["found free worker when source bound"] [component=scheduler] [worker=dm-dm-worker-1] [source=source-0]
[2022/11/21 03:07:13.450 +00:00] [INFO] [scheduler.go:2427] ["bound the source to worker"] [component=scheduler] [bound="{\"source\":\"source-0\",\"worker\":\"dm-dm-worker-1\"}"]

worker2

[2022/11/21 03:07:13.467 +00:00] [INFO] [server.go:585] ["receive source bound"] [bound="{\"source\":\"source-0\",\"worker\":\"dm-dm-worker-1\"}"] ["is deleted"=false]
[2022/11/21 03:07:13.468 +00:00] [INFO] [server.go:865] ["will start a new worker"] [sourceID=source-0]
[2022/11/21 03:07:13.468 +00:00] [INFO] [source_worker.go:134] [initialized] [component="worker controller"] [cfg="{\"enable\":true,\"enable-gtid\":true,\"auto-fix-gtid\":false,\"relay-dir\":\"relay-dir\",\"meta-dir\":\"\",\"flavor\":\"mysql\",\"charset\":\"\",\"enable-relay\":false,\"relay-binlog-name\":\"\",\"relay-binlog-gtid\":\"\",\"source-id\":\"source-0\",\"from\":{\"host\":\"upstream1\",\"port\":3306,\"user\":\"root\",\"max-allowed-packet\":null,\"session\":null,\"security\":null},\"purge\":{\"interval\":3600,\"expires\":0,\"remain-space\":15},\"checker\":{\"check-enable\":true,\"backoff-rollback\":{\"Duration\":\"5m0s\"},\"backoff-max\":{\"Duration\":\"5m0s\"}},\"server-id\":429502637,\"case-sensitive\":false,\"filters\":null}"]
[2022/11/21 03:07:13.471 +00:00] [INFO] [source_worker.go:155] ["start running"] [component="worker controller"]
[2022/11/21 03:07:13.569 +00:00] [INFO] [source_worker.go:441] ["enter EnableHandleSubtasks"] [component="worker controller"]
[2022/11/21 03:07:13.574 +00:00] [INFO] [source_worker.go:457] ["starting to handle mysql source"] [component="worker controller"] [sourceCfg="{\"enable\":true,\"enable-gtid\":true,\"auto-fix-gtid\":false,\"relay-dir\":\"relay-dir\",\"meta-dir\":\"\",\"flavor\":\"mysql\",\"charset\":\"\",\"enable-relay\":false,\"relay-binlog-name\":\"\",\"relay-binlog-gtid\":\"\",\"source-id\":\"source-0\",\"from\":{\"host\":\"upstream1\",\"port\":3306,\"user\":\"root\",\"max-allowed-packet\":null,\"session\":null,\"security\":null},\"purge\":{\"interval\":3600,\"expires\":0,\"remain-space\":15},\"checker\":{\"check-enable\":true,\"backoff-rollback\":{\"Duration\":\"5m0s\"},\"backoff-max\":{\"Duration\":\"5m0s\"}},\"server-id\":429502637,\"case-sensitive\":false,\"filters\":null}"] [subTasks="{\"test\":{\"is-sharding\":false,\"shard-mode\":\"\",\"online-ddl\":true,\"shadow-table-rules\":[\"^_(.+)_(?:new|gho)$\"],\"trash-table-rules\":[\"^_(.+)_(?:ghc|del|old)$\"],\"online-ddl-scheme\":\"\",\"case-sensitive\":false,\"collation_compatible\":\"loose\",\"name\":\"test\",\"mode\":\"all\",\"ignore-checking-items\":null,\"source-id\":\"source-0\",\"server-id\":429502637,\"flavor\":\"mysql\",\"meta-schema\":\"dm_meta\",\"heartbeat-update-interval\":1,\"heartbeat-report-interval\":10,\"enable-heartbeat\":false,\"timezone\":\"\",\"meta\":null,\"relay-dir\":\"relay-dir\",\"use-relay\":false,\"from\":{\"host\":\"upstream1\",\"port\":3306,\"user\":\"root\",\"max-allowed-packet\":null,\"session\":null,\"security\":null},\"to\":{\"host\":\"downstream-tidb\",\"port\":4000,\"user\":\"root\",\"max-allowed-packet\":null,\"session\":{\"tidb_txn_mode\":\"optimistic\"},\"security\":null},\"route-rules\":[],\"filter-rules\":[],\"mapping-rule\":[],\"expression-filter\":[],\"black-white-list\":null,\"block-allow-list\":{\"do-tables\":null,\"do-dbs\":[\"dm_tolerance_worker_down\"],\"ignore-tables\":null,\"ignore-dbs\":null},\"mydumper-path\":\"./bin/mydumper\",\"threads\":4,\"chunk-filesize\":\"64\",\"statement-size\":0,\"rows\":0,\"where\":\"\",\"skip-tz-utc\":true,\"extra-args\":\"\",\"pool-size\":16,\"dir\":\"./dumped_data.test\",\"import-mode\":\"logical\",\"on-duplicate\":\"replace\",\"meta-file\":\"\",\"worker-count\":16,\"batch\":100,\"queue-size\":1024,\"checkpoint-flush-interval\":30,\"compact\":false,\"multiple-rows\":false,\"max-retry\":0,\"auto-fix-gtid\":false,\"enable-gtid\":true,\"disable-detect\":false,\"safe-mode\":false,\"safe-mode-duration\":\"60s\",\"enable-ansi-quotes\":false,\"ValidatorCfg\":{\"mode\":\"none\",\"worker-count\":4,\"validate-interval\":\"10s\",\"check-interval\":\"5s\",\"row-error-delay\":\"30m0s\",\"meta-flush-interval\":\"5m0s\",\"batch-query-size\":100,\"max-pending-row-size\":\"500m\",\"max-pending-row-count\":2147483647},\"log-level\":\"\",\"log-file\":\"\",\"log-format\":\"\",\"log-rotate\":\"\",\"pprof-addr\":\"\",\"status-addr\":\"\",\"config-file\":\"\",\"clean-dump-file\":true,\"ansi-quotes\":false,\"experimental\":{\"async-checkpoint-flush\":false}}}"]
[2022/11/21 03:07:13.574 +00:00] [INFO] [source_worker.go:468] ["start to create subtask in EnableHandleSubtasks"] [component="worker controller"] [sourceID=source-0] [task=test]
[2022/11/21 03:07:13.575 +00:00] [INFO] [source_worker.go:571] ["subtask created"] [component="worker controller"] [config="{\"is-sharding\":false,\"shard-mode\":\"\",\"online-ddl\":true,\"shadow-table-rules\":[\"^_(.+)_(?:new|gho)$\"],\"trash-table-rules\":[\"^_(.+)_(?:ghc|del|old)$\"],\"online-ddl-scheme\":\"\",\"case-sensitive\":false,\"collation_compatible\":\"loose\",\"name\":\"test\",\"mode\":\"all\",\"ignore-checking-items\":null,\"source-id\":\"source-0\",\"server-id\":429502637,\"flavor\":\"mysql\",\"meta-schema\":\"dm_meta\",\"heartbeat-update-interval\":1,\"heartbeat-report-interval\":10,\"enable-heartbeat\":false,\"timezone\":\"\",\"meta\":null,\"relay-dir\":\"relay-dir\",\"use-relay\":false,\"from\":{\"host\":\"upstream1\",\"port\":3306,\"user\":\"root\",\"max-allowed-packet\":null,\"session\":null,\"security\":null},\"to\":{\"host\":\"downstream-tidb\",\"port\":4000,\"user\":\"root\",\"max-allowed-packet\":null,\"session\":{\"tidb_txn_mode\":\"optimistic\"},\"security\":null},\"route-rules\":[],\"filter-rules\":[],\"mapping-rule\":[],\"expression-filter\":[],\"black-white-list\":null,\"block-allow-list\":{\"do-tables\":null,\"do-dbs\":[\"dm_tolerance_worker_down\"],\"ignore-tables\":null,\"ignore-dbs\":null},\"mydumper-path\":\"./bin/mydumper\",\"threads\":4,\"chunk-filesize\":\"64\",\"statement-size\":0,\"rows\":0,\"where\":\"\",\"skip-tz-utc\":true,\"extra-args\":\"\",\"pool-size\":16,\"dir\":\"./dumped_data.test\",\"import-mode\":\"logical\",\"on-duplicate\":\"replace\",\"meta-file\":\"\",\"worker-count\":16,\"batch\":100,\"queue-size\":1024,\"checkpoint-flush-interval\":30,\"compact\":false,\"multiple-rows\":false,\"max-retry\":0,\"auto-fix-gtid\":false,\"enable-gtid\":true,\"disable-detect\":false,\"safe-mode\":false,\"safe-mode-duration\":\"60s\",\"enable-ansi-quotes\":false,\"ValidatorCfg\":{\"mode\":\"none\",\"worker-count\":4,\"validate-interval\":{\"Duration\":\"10s\"},\"check-interval\":{\"Duration\":\"5s\"},\"row-error-delay\":{\"Duration\":\"30m0s\"},\"meta-flush-interval\":{\"Duration\":\"5m0s\"},\"batch-query-size\":100,\"max-pending-row-size\":\"500m\",\"max-pending-row-count\":2147483647},\"log-level\":\"\",\"log-file\":\"\",\"log-format\":\"\",\"log-rotate\":\"\",\"pprof-addr\":\"\",\"status-addr\":\"\",\"config-file\":\"\",\"clean-dump-file\":true,\"ansi-quotes\":false,\"experimental\":{\"async-checkpoint-flush\":false}}"]
[2022/11/21 03:07:13.589 +00:00] [INFO] [dumpling.go:431] ["found upstream SQL mode"] [task=test] [unit=dump] ["SQL mode"=ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION]
[2022/11/21 03:07:13.590 +00:00] [INFO] [dumpling.go:109] ["create dumpling"] [task=test] [unit=dump] [config="{\"s3\":{\"endpoint\":\"\",\"region\":\"\",\"storage-class\":\"\",\"sse\":\"\",\"sse-kms-key-id\":\"\",\"acl\":\"\",\"access-key\":\"\",\"secret-access-key\":\"\",\"provider\":\"\",\"force-path-style\":false,\"use-accelerate-endpoint\":false,\"role-arn\":\"\",\"external-id\":\"\",\"object-lock-enabled\":false},\"gcs\":{\"endpoint\":\"\",\"storage-class\":\"\",\"predefined-acl\":\"\",\"credentials-file\":\"\"},\"azblob\":{\"endpoint\":\"\",\"account-name\":\"\",\"account-key\":\"\",\"access-tier\":\"\"},\"AllowCleartextPasswords\":false,\"SortByPk\":true,\"NoViews\":true,\"NoSequences\":true,\"NoHeader\":false,\"NoSchemas\":false,\"NoData\":false,\"CompleteInsert\":true,\"TransactionalConsistency\":true,\"EscapeBackslash\":true,\"DumpEmptyDatabase\":true,\"PosAfterConnect\":false,\"CompressType\":0,\"Host\":\"upstream1\",\"Port\":3306,\"Threads\":4,\"User\":\"root\",\"Security\":{\"CAPath\":\"\",\"CertPath\":\"\",\"KeyPath\":\"\"},\"LogLevel\":\"\",\"LogFile\":\"\",\"LogFormat\":\"\",\"OutputDirPath\":\"./dumped_data.test\",\"StatusAddr\":\"\",\"Snapshot\":\"\",\"Consistency\":\"auto\",\"CsvNullValue\":\"\\\\N\",\"SQL\":\"\",\"CsvSeparator\":\"\",\"CsvDelimiter\":\"\",\"Databases\":null,\"Where\":\"\",\"FileType\":\"\",\"ServerInfo\":{\"ServerType\":0,\"ServerVersion\":null,\"HasTiKV\":false},\"Rows\":200000,\"ReadTimeout\":0,\"TiDBMemQuotaQuery\":0,\"FileSize\":67108864,\"StatementSize\":1000000,\"SessionParams\":{\"time_zone\":\"+00:00\"},\"Tables\":null,\"CollationCompatible\":\"loose\"}"]
[2022/11/21 03:07:13.611 +00:00] [INFO] [util.go:134] ["use timezone"] [task=test] [unit="binlog replication"] [location=+00:00] [host=upstream1] [port=3306]
[2022/11/21 03:07:13.616 +00:00] [INFO] [util.go:134] ["use timezone"] [task=test] [unit="binlog replication"] [location=+00:00] [host=downstream-tidb] [port=4000]
[2022/11/21 03:07:13.681 +00:00] [INFO] [version.go:398] ["detect server version"] [type=MySQL] [version=5.7.36-log]
[2022/11/21 03:07:13.684 +00:00] [INFO] [version.go:398] ["detect server version"] [type=TiDB] [version=6.5.0-alpha]
[2022/11/21 03:07:13.697 +00:00] [INFO] [checkpoint.go:988] ["create checkpoint schema"] [task=test] [unit="binlog replication"] [component="remote checkpoint"] [statement="CREATE SCHEMA IF NOT EXISTS `dm_meta`"]
[2022/11/21 03:07:13.699 +00:00] [INFO] [checkpoint.go:1012] ["create checkpoint table"] [task=test] [unit="binlog replication"] [component="remote checkpoint"] [statements="[\"CREATE TABLE IF NOT EXISTS `dm_meta`.`test_syncer_checkpoint` (\\n\\t\\t\\tid VARCHAR(32) NOT NULL,\\n\\t\\t\\tcp_schema VARCHAR(128) NOT NULL,\\n\\t\\t\\tcp_table VARCHAR(128) NOT NULL,\\n\\t\\t\\tbinlog_name VARCHAR(128),\\n\\t\\t\\tbinlog_pos INT UNSIGNED,\\n\\t\\t\\tbinlog_gtid TEXT,\\n\\t\\t\\texit_safe_binlog_name VARCHAR(128) DEFAULT '',\\n\\t\\t\\texit_safe_binlog_pos INT UNSIGNED DEFAULT 0,\\n\\t\\t\\texit_safe_binlog_gtid TEXT,\\n\\t\\t\\ttable_info JSON NOT NULL,\\n\\t\\t\\tis_global BOOLEAN,\\n\\t\\t\\tcreate_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,\\n\\t\\t\\tupdate_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\\n\\t\\t\\tUNIQUE KEY uk_id_schema_table (id, cp_schema, cp_table)\\n\\t\\t)\"]"]
[2022/11/21 03:07:13.701 +00:00] [INFO] [checkpoint.go:1074] ["fetch global checkpoint from DB"] [task=test] [unit="binlog replication"] [component="remote checkpoint"] ["global checkpoint"="location(position: (mysql-bin.000001, 116319791), gtid-set: adade2e6-693e-11ed-a584-e65bc66ff355:1-65222) (flushed location(position: (mysql-bin.000001, 116319791), gtid-set: adade2e6-693e-11ed-a584-e65bc66ff355:1-65222))"]
[2022/11/21 03:07:13.704 +00:00] [INFO] [subtask.go:195] ["continue unit"] [subtask=test] [unit=Sync]
[2022/11/21 03:07:13.704 +00:00] [INFO] [subtask.go:248] ["start to run"] [subtask=test] [unit=Sync]
[2022/11/21 03:07:13.704 +00:00] [INFO] [source_worker.go:493] ["handling subtask enabled"] [component="worker controller"]
[2022/11/21 03:07:13.706 +00:00] [INFO] [syncer.go:1810] ["replicate binlog from checkpoint"] [task=test] [unit="binlog replication"] [checkpoint="position: (mysql-bin.000001, 116319791), gtid-set: adade2e6-693e-11ed-a584-e65bc66ff355:1-65222"]
[2022/11/21 03:07:13.714 +00:00] [INFO] [streamer_controller.go:70] ["last slave connection"] [task=test] [unit="binlog replication"] ["connection ID"=101]
[2022/11/21 03:07:13.714 +00:00] [INFO] [mode.go:100] ["change count"] [task=test] [unit="binlog replication"] ["previous count"=0] ["new count"=0]
[2022/11/21 03:07:13.714 +00:00] [INFO] [safe_mode.go:96] ["enable safe-mode because of task initialization"] [task=test] [unit="binlog replication"] [duration=1m0s]
[2022/11/21 03:07:13.714 +00:00] [INFO] [mode.go:100] ["change count"] [task=test] [unit="binlog replication"] ["previous count"=0] ["new count"=1]
[2022/11/21 03:07:13.828 +00:00] [INFO] [syncer.go:2423] [task=test] [unit="binlog replication"] [event=rotate] [location="position: (mysql-bin.000001, 116319791), gtid-set: adade2e6-693e-11ed-a584-e65bc66ff355:1-65222"]
[2022/11/21 03:07:13.828 +00:00] [INFO] [syncer.go:2307] ["meet heartbeat event and then flush jobs"] [task=test] [unit="binlog replication"]
[2022/11/21 03:07:13.828 +00:00] [INFO] [syncer.go:3042] ["flush all jobs"] [task=test] [unit="binlog replication"] ["global checkpoint"="location(position: (mysql-bin.000001, 116319791), gtid-set: adade2e6-693e-11ed-a584-e65bc66ff355:1-65222) (flushed location(position: (mysql-bin.000001, 116319791), gtid-set: adade2e6-693e-11ed-a584-e65bc66ff355:1-65222))"] ["flush job seq"=1]
[2022/11/21 03:07:13.828 +00:00] [INFO] [checkpoint_flush_worker.go:67] ["about to sync flush checkpoint snapshot"] [task=test] [unit="binlog replication"] [snapshot_id=1]
[2022/11/21 03:07:13.844 +00:00] [INFO] [checkpoint_flush_worker.go:108] ["sync flush checkpoint snapshot successfully"] [task=test] [unit="binlog replication"] [snapshot_id=1] [pos="position: (mysql-bin.000001, 116319791), gtid-set: adade2e6-693e-11ed-a584-e65bc66ff355:1-65222"]
[2022/11/21 03:07:13.844 +00:00] [INFO] [syncer.go:1284] ["after sync flushed checkpoint, gc all causality keys"] [task=test] [unit="binlog replication"]
[2022/11/21 03:07:13.844 +00:00] [INFO] [tracker.go:410] ["Downstream schema tracker init. "] [task=test] [unit="binlog replication"] [tableID=`dm_tolerance_worker_down`.`sbtest2`]
[2022/11/21 03:07:13.846 +00:00] [INFO] [tracker.go:460] ["Show create table info"] [task=test] [unit="binlog replication"] [tableID=`dm_tolerance_worker_down`.`sbtest2`] ["create string"="CREATE TABLE `sbtest2` (\n  `id` int(11) NOT NULL AUTO_INCREMENT,\n  `k` int(11) NOT NULL DEFAULT '0',\n  `c` char(120) NOT NULL DEFAULT '',\n  `pad` char(60) NOT NULL DEFAULT '',\n  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */\n) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin AUTO_INCREMENT=30040"]
[2022/11/21 03:07:13.847 +00:00] [INFO] [tracker.go:410] ["Downstream schema tracker init. "] [task=test] [unit="binlog replication"] [tableID=`dm_tolerance_worker_down`.`sbtest3`]
[2022/11/21 03:07:13.848 +00:00] [INFO] [tracker.go:460] ["Show create table info"] [task=test] [unit="binlog replication"] [tableID=`dm_tolerance_worker_down`.`sbtest3`] ["create string"="CREATE TABLE `sbtest3` (\n  `id` int(11) NOT NULL AUTO_INCREMENT,\n  `k` int(11) NOT NULL DEFAULT '0',\n  `c` char(120) NOT NULL DEFAULT '',\n  `pad` char(60) NOT NULL DEFAULT '',\n  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */\n) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin AUTO_INCREMENT=30030"]
[2022/11/21 03:07:13.849 +00:00] [INFO] [tracker.go:410] ["Downstream schema tracker init. "] [task=test] [unit="binlog replication"] [tableID=`dm_tolerance_worker_down`.`sbtest4`]
[2022/11/21 03:07:13.850 +00:00] [INFO] [tracker.go:460] ["Show create table info"] [task=test] [unit="binlog replication"] [tableID=`dm_tolerance_worker_down`.`sbtest4`] ["create string"="CREATE TABLE `sbtest4` (\n  `id` int(11) NOT NULL AUTO_INCREMENT,\n  `k` int(11) NOT NULL DEFAULT '0',\n  `c` char(120) NOT NULL DEFAULT '',\n  `pad` char(60) NOT NULL DEFAULT '',\n  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */\n) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin AUTO_INCREMENT=30009"]
[2022/11/21 03:07:13.850 +00:00] [INFO] [tracker.go:410] ["Downstream schema tracker init. "] [task=test] [unit="binlog replication"] [tableID=`dm_tolerance_worker_down`.`sbtest1`]
[2022/11/21 03:07:13.851 +00:00] [INFO] [tracker.go:460] ["Show create table info"] [task=test] [unit="binlog replication"] [tableID=`dm_tolerance_worker_down`.`sbtest1`] ["create string"="CREATE TABLE `sbtest1` (\n  `id` int(11) NOT NULL AUTO_INCREMENT,\n  `k` int(11) NOT NULL DEFAULT '0',\n  `c` char(120) NOT NULL DEFAULT '',\n  `pad` char(60) NOT NULL DEFAULT '',\n  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */\n) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin AUTO_INCREMENT=30010"]
[2022/11/21 03:07:19.962 +00:00] [WARN] [db.go:258] ["execute statements"] [task=test] [unit="binlog replication"] [retry=0] [queries="[DELETE FROM `dm_tolerance_worker_down`.`sbtest4` WHERE `id` = ? LIMIT 1 REPLACE INTO `dm_tolerance_worker_down`.`sbtest4` (`id`,`k`,`c`,`pad`) VALUES (?,?,?,?) DELETE FROM `dm_tolerance_worker_down`.`sbtest4` WHERE `id` = ? LIMIT 1 REPLACE INTO `dm_tolerance_worker_down`.`sbtest4` (`id`,`k`,`c`,`pad`) VALUES (?,?,?,?) DELETE FROM `dm_tolerance_worker_down`.`sbtest3` WHERE `id` = ? LIMIT 1 REPLACE INTO `dm_tolerance_worker_down`.`sbtest3` (`id`,`k`,`c`,`pad`) VALUES (?,?,?,?) DELETE FROM `dm_tolerance_worker_down`.`sbtest1` WHERE `id` = ? LIMIT 1 REPLACE INTO `dm_tolerance_worker_down`.`sbtest1` (`id`,`k`,`c`,`pad`) VALUES (?,?,?,?) DELETE FROM `dm_tolerance_worker_down`.`sbtest3` WHERE `id` = ? LIMIT 1 REPLACE INTO `dm_tolerance_worker_down`.`sbtest3` (`id`,`k`,`c`,`pad`) VALUES (?,?,?,?) DELETE FROM `dm_tolerance_worker_down`.`sbtest1` WHERE `id` = ? LIMIT 1 REPLACE INTO `dm_tolerance_worker_down`.`sbtest1` (`id`,`k`,`c`,`pad`) VALUES (?,?,?,?) DELETE FROM `dm_tolerance_worker_down`.`sbtest3` WHERE `id` = ?..."] [arguments="[[503] [503 526 58786060075-95409685922-18517688425-12257927450-53687469629-17929040754-88613745258-05968530211-68280356218-15157441668 30342124295-66690851291-59257188045-54542361399-61720292180] [505] [505 561 17327246913-08233580237-97733343472-72244227849-72680056569-30787449458-68674684651-72896889262-75129348175-52989660929 83147043244-62891648353-02121542150-39643842600-11258811097] [498] [498 502 57701680960-24583540323-24728932586-87885382640-13942770135-15230681132-57071783738-94446482604-69117141090-67263816561 84350464043-19221405381-28646133494-98101667059-43287132477] [504] [504 503 90986047077-17348227668-78430829666-46139479021-34799418128-81966019054-06127647911-00171017539-26955543235-98850939366 30095641855-05061367906-19328438425-30671781404-76389826436] [498] [498 504 23454890211-94055686133-60843817596-79184556142-40577877203-00806711777-94462462598-10305821744-34489818648-19252192560 41634193446-38076708085-46529644781-37606692112-01270295275] [501] [501 500 57467156075-92467459896-1523..."] [error="[code=10006:class=database:scope=not-set:level=high], Message: execute statement failed: commit, RawCause: Error 9007: Write conflict, txnStartTS=437518346450108480, conflictStartTS=437518346450108479, conflictCommitTS=437518346463477820, key={tableID=89, tableName=dm_tolerance_worker_down.sbtest4, handle=500}, originalKey=7480000000000000595f7280000000000001f4, primary={tableID=87, tableName=dm_tolerance_worker_down.sbtest1, handle=439}, originalPrimaryKey=7480000000000000575f7280000000000001b7, reason=Optimistic [try again later]"]
[2022/11/21 03:07:19.962 +00:00] [WARN] [strategy.go:94] ["retry stratey takes effect"] [error="[code=10006:class=database:scope=not-set:level=high], Message: execute statement failed: commit, RawCause: Error 9007: Write conflict, txnStartTS=437518346450108480, conflictStartTS=437518346450108479, conflictCommitTS=437518346463477820, key={tableID=89, tableName=dm_tolerance_worker_down.sbtest4, handle=500}, originalKey=7480000000000000595f7280000000000001f4, primary={tableID=87, tableName=dm_tolerance_worker_down.sbtest1, handle=439}, originalPrimaryKey=7480000000000000575f7280000000000001b7, reason=Optimistic [try again later]"] [errorVerbose="[code=10006:class=database:scope=not-set:level=high], Message: execute statement failed: commit, RawCause: Error 9007: Write conflict, txnStartTS=437518346450108480, conflictStartTS=437518346450108479, conflictCommitTS=437518346463477820, key={tableID=89, tableName=dm_tolerance_worker_down.sbtest4, handle=500}, originalKey=7480000000000000595f7280000000000001f4, primary={tableID=87, tableName=dm_tolerance_worker_down.sbtest1, handle=439}, originalPrimaryKey=7480000000000000575f7280000000000001b7, reason=Optimistic [try again later]\ngithub.com/pingcap/tiflow/dm/pkg/terror.(*Error).Delegate\n\tgithub.com/pingcap/tiflow/dm/pkg/terror/terror.go:324\ngithub.com/pingcap/tiflow/dm/pkg/conn.(*BaseConn).ExecuteSQLWithIgnoreError\n\tgithub.com/pingcap/tiflow/dm/pkg/conn/baseconn.go:211\ngithub.com/pingcap/tiflow/dm/syncer/dbconn.(*DBConn).ExecuteSQLWithIgnore.func2\n\tgithub.com/pingcap/tiflow/dm/syncer/dbconn/db.go:181\ngithub.com/pingcap/tiflow/dm/pkg/retry.(*FiniteRetryStrategy).Apply\n\tgithub.com/pingcap/tiflow/dm/pkg/retry/strategy.go:84\ngithub.com/pingcap/tiflow/dm/pkg/conn.(*BaseConn).ApplyRetryStrategy\n\tgithub.com/pingcap/tiflow/dm/pkg/conn/baseconn.go:259\ngithub.com/pingcap/tiflow/dm/syncer/dbconn.(*DBConn).ExecuteSQLWithIgnore\n\tgithub.com/pingcap/tiflow/dm/syncer/dbconn/db.go:172\ngithub.com/pingcap/tiflow/dm/syncer/dbconn.(*DBConn).ExecuteSQL\n\tgithub.com/pingcap/tiflow/dm/syncer/dbconn/db.go:220\ngithub.com/pingcap/tiflow/dm/syncer.(*DMLWorker).executeBatchJobs\n\tgithub.com/pingcap/tiflow/dm/syncer/dml_worker.go:259\ngithub.com/pingcap/tiflow/dm/syncer.(*DMLWorker).executeJobs\n\tgithub.com/pingcap/tiflow/dm/syncer/dml_worker.go:183\nruntime.goexit\n\truntime/asm_amd64.s:1594"] [retry_times=0] [retry_count=100]
GMHDBJD commented 1 year ago

ref #3737

GMHDBJD commented 1 year ago

https://github.com/pingcap/tiflow/blob/e6f969669a8179a36a2eade5669d356dc207de80/dm/worker/server.go#L470 doclose will cancel the context for keepalive https://github.com/pingcap/tiflow/blob/e6f969669a8179a36a2eade5669d356dc207de80/dm/worker/server.go#L495

fubinzh commented 1 year ago

Below error when seen sometimes when running the same test case:

2022-11-19T20:04:39.188Z    INFO    k8s/dmcluster.go:54 exec on dm result{podName 15 0 dm-dm-master-0 <nil>} {containerName 15 0 dm-master <nil>} {cmd 1 0  [/bin/sh -c /dmctl --master-addr=127.0.0.1:8261 query-status -s source-0 test]} {stdout 15 0 {
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "sourceStatus": {
                "source": "source-0",
                "worker": "dm-dm-worker-1",
                "result": null,
                "relayStatus": null
            },
            "subTaskStatus": [
                {
                    "name": "test",
                    "stage": "Paused",
                    "unit": "Sync",
                    "result": {
                        "isCanceled": false,
                        "errors": [
                            {
                                "ErrCode": 10006,
                                "ErrClass": "database",
                                "ErrScope": "not-set",
                                "ErrLevel": "high",
                                "Message": "startLocation: [position: (mysql-bin.000001, 700411981), gtid-set: cfc901b1-6828-11ed-8523-e6c1c1f7f2dd:1-394846], endLocation: [position: (mysql-bin.000001, 700412395), gtid-set: cfc901b1-6828-11ed-8523-e6c1c1f7f2dd:1-394846]: execute statement failed: commit",
                                "RawCause": "Error 1062: Duplicate entry '505' for key 'sbtest4.PRIMARY'",
                                "Workaround": ""
                            }
                        ],
                        "detail": null
                    },
                    "unresolvedDDLLockID": "",
                    "sync": {
                        "totalEvents": "1403142",
                        "totalTps": "1648",
                        "recentTps": "0",
                        "masterBinlog": "(mysql-bin.000002, 286970278)",
                        "masterBinlogGtid": "cfc901b1-6828-11ed-8523-e6c1c1f7f2dd:1-767476",
                        "syncerBinlog": "(mysql-bin.000001, 650694743)",
                        "syncerBinlogGtid": "cfc901b1-6828-11ed-8523-e6c1c1f7f2dd:1-366788",
                        "blockingDDLs": [
                        ],
                        "unresolvedGroups": [
                        ],
                        "synced": false,
                        "binlogType": "remote",
                        "secondsBehindMaster": "114",
                        "blockDDLOwner": "",
                        "conflictMsg": "",
                        "totalRows": "1403142",
                        "totalRps": "1648",
                        "recentRps": "0"
                    },
                    "validation": null
                }
            ]
        }
    ]
} <nil>} {stderr 15 0  <nil>}
fubinzh commented 1 year ago

/found automation

fubinzh commented 1 year ago

/severity critical

fubinzh commented 1 year ago

/label affects-5.4

fubinzh commented 1 year ago

/label affects-6.1