pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
Apache License 2.0

sourcemanager gets stuck when closing the processor while the CDC disk is full. #10612

Open fubinzh opened 4 months ago

fubinzh commented 4 months ago

What did you do?

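This issue occurs when the CDC incremental scan does not finish within 24 hours, the changefeed enters an error state, and CDC then tries to close the processor while the CDC disk is full. The goroutine dump below shows the sorter blocked inside a pebble batch commit: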

2 @ 0x43f14e 0x46ec59 0x46ec39 0x48fe05 0x26631fe 0x265d00c 0x263af10 0x263a959 0x265ca8f 0x38e1d5b 0x38e1d40 0x38de4b0 0x472e61
#   0x46ec38    sync.runtime_notifyListWait+0x138                                   runtime/sema.go:527
#   0x48fe04    sync.(*Cond).Wait+0x84                                          sync/cond.go:70
#   0x26631fd   github.com/cockroachdb/pebble.(*DB).makeRoomForWrite+0x27d                      github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/db.go:1704
#   0x265d00b   github.com/cockroachdb/pebble.(*DB).commitWrite+0x16b                           github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/db.go:818
#   0x263af0f   github.com/cockroachdb/pebble.(*commitPipeline).prepare+0x12f                       github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/commit.go:379
#   0x263a958   github.com/cockroachdb/pebble.(*commitPipeline).Commit+0x58                     github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/commit.go:253
#   0x265ca8e   github.com/cockroachdb/pebble.(*DB).Apply+0x1ce                             github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/db.go:746
#   0x38e1d5a   github.com/cockroachdb/pebble.(*Batch).Commit+0x55a                         github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/batch.go:905
#   0x38e1d3f   github.com/pingcap/tiflow/cdc/processor/sourcemanager/sorter/pebble.(*EventSorter).handleEvents+0x53f   github.com/pingcap/tiflow/cdc/processor/sourcemanager/sorter/pebble/event_sorter.go:463
#   0x38de4af   github.com/pingcap/tiflow/cdc/processor/sourcemanager/sorter/pebble.New.func1+0x8f          github.com/pingcap/tiflow/cdc/processor/sourcemanager/sorter/pebble/event_sorter.go:95

What did you expect to see?

sourcemanager should not get stuck; closing the processor should complete even when the CDC disk is full.

What did you see instead?

sourcemanager gets stuck.

Versions of the cluster

Release Version: v8.0.0-alpha
Git Commit Hash: 13eb7a30cc06cfef5587a85753bafc77b685aa2e
Git Branch: heads/refs/tags/v8.0.0-alpha
UTC Build Time: 2024-02-01 11:36:27
Go Version: go version go1.21.5 linux/amd64
Failpoint Build: false

fubinzh commented 4 months ago

/severity moderate