pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
Apache License 2.0
425 stars 282 forks source link

redo log changefeed stucks with "update redo meta with a regressed resolved ts, i gnore" #8601

Closed fubinzh closed 1 year ago

fubinzh commented 1 year ago

What did you do?

  1. create changefeed enble redo log and syncpoint

    enable-sync-point = true
    sync-point-interval = "2m"
    [consistent]
    level = "eventua"
    storage = xxx
    max-log-size = 1
  2. run sysbench workload

    sysbench --db-driver=mysql --mysql-host=`nslookup upstream-tidb.cdc-testbed-tps-1659039-1-693 | awk -F: '{print $2}' | awk 'NR==5' | sed s/[[:space:]]//g`  --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=50 --table-size=50000 --create_secondary=off --debug=true --threads=10 --mysql-ignore-errors=2013,1213,1105,1205,8022,8027,8028,9004,9007,1062 oltp_insert prepare
    sysbench --db-driver=mysql --mysql-host=`nslookup upstream-tidb.cdc-testbed-tps-1659039-1-693 | awk -F: '{print $2}' | awk 'NR==5' | sed s/[[:space:]]//g`  --mysql-port=4000 --mysql-user=root --mysql-db=workload --tables=50 --table-size=50000 --create_secondary=off --time=1800 --debug=true --threads=10 --mysql-ignore-errors=2013,1213,1105,1205,8022,8027,8028,9004,9007,1062 oltp_insert run
  3. truncate all tables by execution "truncate table %s"

  4. wait checkpoint advance

What did you expect to see?

  1. checkpoint should advance correctly

What did you see instead?

resolve ts and checkpoint stuck. image

zhang@LAPTOP-U95B63T9 MINGW64 ~/Downloads
$ grep "update redo meta with a regressed resolved ts" ticdc_1659039_0.log | head -n1
[2023/03/19 16:27:34.399 +00:00] [WARN] [meta_manager.go:162] ["update redo meta with a regressed resolved ts, i
gnore"] [currResolvedTs=440203473047781380] [recvResolvedTs=440203472942399491] [namespace=default] [changefeed=
redo-syncpoint-truncate-table-sync]

zhang@LAPTOP-U95B63T9 MINGW64 ~/Downloads
$ grep "update redo meta with a regressed resolved ts" ticdc_1659039_0.log | tail -n1
[2023/03/19 18:56:07.044 +00:00] [WARN] [meta_manager.go:162] ["update redo meta with a regressed resolved ts, i
gnore"] [currResolvedTs=440203473047781380] [recvResolvedTs=440203472942399491] [namespace=default] [changefeed=
redo-syncpoint-truncate-table-sync]

...
[2023/03/19 18:56:05.094 +00:00] [WARN] [meta_manager.go:162] ["update redo meta with a regressed resolved ts, ignore"] [currResolvedTs=440203473047781380] [recvResolvedTs=440203472942399491] [namespace=default] [changefeed=redo-syncpoint-truncate-table-sync]
[2023/03/19 18:56:05.106 +00:00] [WARN] [meta_manager.go:321] ["Redo meta has not changed for a long time, owner may be stuck"] [namespace=default] [changefeed=redo-syncpoint-truncate-table-sync] [lastFlushTime=2h28m32.000224226s] [meta="{\"CheckpointTs\":440203472942399491,\"ResolvedTs\":440203473047781380}"]

Versions of the cluster

TiCDC version (execute cdc version): [2023/03/19 16:16:53.112 +00:00] [INFO] [version.go:47] ["Welcome to Change Data Capture (CDC)"] [release-version=v6.7.0-alpha] [git-hash=41b40aaff95f8754eea7ec792135debe70b8e4da] [git-branch=heads/refs/tags/v6.7.0-alpha] [utc-build-time="2023-03-18 11:32:23"] [go-version="go version go1.20.2 linux/amd64"] [failpoint-build=false]

fubinzh commented 1 year ago

/severity critical

fubinzh commented 1 year ago

/found automation

nongfushanquan commented 1 year ago

/assign @hicqu

hicqu commented 1 year ago
图片

Seems scheduler manager advances resolved timestamp correctly, but owner doesn't.

hicqu commented 1 year ago

Metrics: https://clinic.pingcap.com.cn/grafana/d/YiGL8hBZ1operatorv6_6_x/ticdc-cluster-clinic?orgId=1&var-tidb_cluster=7212289903534786946&var-instance=All&var-store=All&from=1679241934000&to=1679252105000&var-tidb_version=v6.6.x&var-org=31&refresh=15m&var-k8s_cluster=&var-changefeed=redo-syncpoint-truncate-table-sync&var-ticdc_instance=All&var-tikv_instance=All&var-spike_threshold=All&var-runtime_instance=All&var-actor_name=All&var-org_type=

hicqu commented 1 year ago

CDC log: ticdc.log.tar.gz

fubinzh commented 1 year ago

/remove-severity critical

fubinzh commented 1 year ago

/severity major

nongfushanquan commented 1 year ago

/remove-label affects-7.0

fubinzh commented 1 year ago

close it as not reproduced now.