pingcap / dm

Data Migration Platform
Apache License 2.0
456 stars 188 forks source link

DM failed to start an incremental task #2210

Open together-wang opened 3 years ago

together-wang commented 3 years ago

What did you do?

The incremental task gets stuck when it starts, and then fails over time., but it is normal to start full tasks. Configure incremental tasks and start tasks.

[tidb@node127 dm]$ cat incr.yaml 
---
name: incr
task-mode: incremental #all   # all 即先进行全量同步,全量同步完成后自动进行增量数据同步
target-database:
  host: "172.16.5.215"
  port: 4200
  user: "root"
  password: ""

mysql-instances:       
  -
    source-id: "mysql-replica-01"
    meta:
      binlog-name: mysql-bin.000028
      binlog-pos: 1365415
      binlog-gtid: "6ddb3c98-5637-11eb-baa7-1adaa2de7319:1-70819"
    block-allow-list:  "instance"
    loader-thread: 16
    syncer-thread: 16000

block-allow-list:
  instance:
    do-dbs: ["test2"]
[tidb@node127 dm]$ tiup dmctl  --master-addr 172.16.5.162:8561 start-task ./incr.yaml 
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.5.162:8561 start-task ./incr.yaml
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": false,
            "msg": "[code=38032:class=dm-master:scope=internal:level=high], Message: some error occurs in dm-worker: ErrCode:44012 ErrClass:\"schema-tracker\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"fail to initial unit Sync of subtask incr : failed to create schema tracker\" RawCause:\"context deadline exceeded\" , Workaround: Please execute `query-status` to check status.",
            "source": "mysql-replica-01",
            "worker": "dm-172.16.5.162-8262"
        }
    ]
}

What did you expect to see?

Incremental task started successfully

What did you see instead?

[tidb@node127 dm]$ tiup dmctl  --master-addr 172.16.5.162:8561 start-task ./incr.yaml 
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.5.162:8561 start-task ./incr.yaml
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": false,
            "msg": "[code=38032:class=dm-master:scope=internal:level=high], Message: some error occurs in dm-worker: ErrCode:44012 ErrClass:\"schema-tracker\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"fail to initial unit Sync of subtask incr : failed to create schema tracker\" RawCause:\"context deadline exceeded\" , Workaround: Please execute `query-status` to check status.",
            "source": "mysql-replica-01",
            "worker": "dm-172.16.5.162-8262"
        }
    ]
}

Versions of the cluster

[tidb@node127 dm]$ tiup dm display dm1
Starting component `dm`: /home/tidb/.tiup/components/dm/v1.6.0/tiup-dm display dm1
Cluster type:       dm
Cluster name:       dm1
Cluster version:    v2.0.7
Deploy user:        tidb
SSH type:           builtin
ID                 Role          Host          Ports      OS/Arch       Status     Data Dir                          Deploy Dir
--                 ----          ----          -----      -------       ------     --------                          ----------
172.16.5.162:9093  alertmanager  172.16.5.162  9093/9094  linux/x86_64  Up         /home/dm1-data/alertmanager-9093  /home/dm1-deploy/alertmanager-9093
172.16.5.162:8561  dm-master     172.16.5.162  8561/8591  linux/x86_64  Healthy|L  /home/dm1-data/dm-master-8561     /home/dm1-deploy/dm-master-8561
172.16.5.162:8262  dm-worker     172.16.5.162  8262       linux/x86_64  Bound      /home/dm1-data/dm-worker-8262     /home/dm1-deploy/dm-worker-8262
172.16.5.162:3330  grafana       172.16.5.162  3330       linux/x86_64  Up         -                                 /home/dm1-deploy/grafana-3330
172.16.5.162:9390  prometheus    172.16.5.162  9390       linux/x86_64  Up         /home/dm1-data/prometheus-9390    /home/dm1-deploy/prometheus-9390
Total nodes: 5

current status of DM cluster (execute query-status in dmctl)

[tidb@node127 dm]$ tiup dmctl  --master-addr 172.16.5.162:8561 query-status full1
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.5.162:8561 query-status full1
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "sourceStatus": {
                "source": "mysql-replica-01",
                "worker": "dm-172.16.5.162-8262",
                "result": null,
                "relayStatus": null
            },
            "subTaskStatus": [
                {
                    "name": "full1",
                    "stage": "Finished",
                    "unit": "Load",
                    "result": {
                        "isCanceled": false,
                        "errors": [
                        ],
                        "detail": null
                    },
                    "unresolvedDDLLockID": "",
                    "load": {
                        "finishedBytes": "79",
                        "totalBytes": "79",
                        "progress": "100.00 %",
                        "metaBinlog": "(mysql-bin.000028, 1366178)",
                        "metaBinlogGTID": "6ddb3c98-5637-11eb-baa7-1adaa2de7319:1-70823"
                    }
                }
            ]
        }
    ]
}
[tidb@node127 dm]$ time tiup dmctl  --master-addr 172.16.5.162:8561 query-status incr
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.5.162:8561 query-status incr
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": false,
            "msg": "[code=38008:class=dm-master:scope=internal:level=high], Message: grpc request error, RawCause: rpc error: code = DeadlineExceeded desc = context deadline exceeded",
            "sourceStatus": {
                "source": "mysql-replica-01",
                "worker": "",
                "result": null,
                "relayStatus": null
            },
            "subTaskStatus": [
            ]
        }
    ]
}

real    0m30.181s
user    0m0.161s
sys     0m0.085s
lance6716 commented 3 years ago

don't know why schema tracker is stuck. could you provide a goroutine stack of DM worker?

together-wang commented 3 years ago

归档.zip dm-worker and dm-master profile

[tidb@node127 dm]$ time  tiup dmctl  --master-addr 172.16.5.162:8561 start-task ./incr.yaml 
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.5.162:8561 start-task ./incr.yaml
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": false,
            "msg": "[code=38032:class=dm-master:scope=internal:level=high], Message: some error occurs in dm-worker: ErrCode:44012 ErrClass:\"schema-tracker\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"fail to initial unit Sync of subtask incr1 : failed to create schema tracker\" RawCause:\"context deadline exceeded\" , Workaround: Please execute `query-status` to check status.",
            "source": "mysql-replica-01",
            "worker": "dm-172.16.5.162-8262"
        }
    ]
}

real    2m10.356s
user    0m0.185s
sys     0m0.082s
lichunzhu commented 3 years ago

Could you please help grab a dm goroutine text using curl http://127.0.0.1:<worker-port>/debug/pprof/goroutine\?debug\=2?