pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
Apache License 2.0

Scale-in CDC to 1 node, owner exited and failed to start #4976

Closed Tammyxia closed 2 years ago

Tammyxia commented 2 years ago

What did you do?

What did you expect to see?

No errors.

What did you see instead?

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

Upstream TiKV version (execute tikv-server --version):

TiKV
Release Version:   6.0.0-alpha
Edition:           Community
Git Commit Hash:   9a43faf4da20389a4e2c262ee8ab8b369a3bfec4
Git Commit Branch: heads/refs/tags/v6.1.0-nightly
UTC Build Time:    2022-03-19 11:16:46
Rust Version:      rustc 1.60.0-nightly (1e12aef3f 2022-02-13)
Enable Features:   jemalloc mem-profiling portable sse test-engines-rocksdb cloud-aws cloud-gcp cloud-azure
Profile:           dist_release

TiCDC version (execute cdc version):

Release Version: v5.4.0-master
Git Commit Hash: 8fe91733fc2b92de7c9fed6d5981b44b3d9fc3c3
Git Branch: heads/refs/tags/v6.1.0-nightly
UTC Build Time: 2022-03-21 11:13:22
Go Version: go version go1.18 linux/amd64
Failpoint Build: false
Tammyxia commented 2 years ago

The CDC process is still running, but it reports "[CDC:ErrOwnerNotFound]owner not found".

Tammyxia commented 2 years ago

goroutines-debug2.tar.gz

overvenus commented 2 years ago

goroutines-debug2.tar.gz

The owner cannot be re-elected because the owner goroutine is blocked in AsyncClose, which is in turn blocked waiting for the leveldb sorter to close.

goroutine 1068 [select, 214 minutes]:
github.com/pingcap/tiflow/pkg/actor.(*mailbox).SendB(0xc384069ec0, {0x381a080, 0xc000058088}, {0x2, 0x0, {0x0, 0x0, 0x0, {0x0, 0x0}, ...}})
        github.com/pingcap/tiflow/pkg/actor/actor.go:125 +0xdf
github.com/pingcap/tiflow/pkg/actor.(*Router).Broadcast.func1({0xc8e7fb1ad8?, 0x2d838e0?}, {0x2d91e60?, 0xc3a6d34900})
        github.com/pingcap/tiflow/pkg/actor/system.go:292 +0xd9
sync.(*Map).Range(0xc00056cf38?, 0xc8e7fb1b80)
        sync/map.go:347 +0x2aa
github.com/pingcap/tiflow/pkg/actor.(*Router).Broadcast(0xc00056cf30, {0x381a080, 0xc000058088}, {0x2, 0x0, {0x0, 0x0, 0x0, {0x0, 0x0}, ...}})
        github.com/pingcap/tiflow/pkg/actor/system.go:290 +0x12c
github.com/pingcap/tiflow/pkg/actor.(*System).Stop(0xc0003b6380)
        github.com/pingcap/tiflow/pkg/actor/system.go:472 +0x90
github.com/pingcap/tiflow/cdc/sorter/leveldb/system.(*System).Stop(0xc000c3c320)
        github.com/pingcap/tiflow/cdc/sorter/leveldb/system/system.go:200 +0xe5
github.com/pingcap/tiflow/cdc/capture.(*Capture).AsyncClose(0xc001fa4460)
        github.com/pingcap/tiflow/cdc/capture/capture.go:554 +0x19a
github.com/pingcap/tiflow/cdc/capture.(*Capture).run.func2()
        github.com/pingcap/tiflow/cdc/capture/capture.go:300 +0x212
created by github.com/pingcap/tiflow/cdc/capture.(*Capture).run
        github.com/pingcap/tiflow/cdc/capture/capture.go:292 +0x410

https://github.com/pingcap/tiflow/blob/8fe91733fc2b92de7c9fed6d5981b44b3d9fc3c3/cdc/capture/capture.go#L292-L300

Tammyxia commented 2 years ago

Log: https://tcms.pingcap.net/dashboard/executions/plan/662589

sdojjy commented 2 years ago

note: https://pingcap.feishu.cn/docs/doccnOSHoMVEmGat6hNZrCl9nDc#

jebter commented 2 years ago

/found automation

sdojjy commented 2 years ago

Cannot reproduce this issue. Please reopen it if it reproduces again.