CDC cloud: unified sorter IO error: too many open files

Tammyxia commented 3 years ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

What did you do? If possible, provide a recipe for reproducing the error.

Scale in 4 TiKV nodes. Workload: go-ycsb, 1 changefeed, 60,000 tables.
Switch the owner, then restart all PD nodes.
Check the changefeed status.

What did you expect to see?
What did you see instead?

$ /cdc cli --ca=/var/lib/ticdc-tls/ca.crt --cert=/var/lib/ticdc-tls/tls.crt --key=/var/lib/ticdc-tls/tls.key changefeed list --pd=https://db-pd:2379

[
  {
    "id": "replication-task-60063",
    "summary": {
      "state": "normal",
      "tso": 427405617999380761,
      "checkpoint": "2021-08-31 15:17:22.295",
      "error": {
        "addr": "db-ticdc-2.db-ticdc-peer.tidb60105.svc:8301",
        "code": "CDC:ErrUnifiedSorterIOError",
        "message": "[CDC:ErrUnifiedSorterIOError]unified sorter IO error. Make sure your sort-dir is configured correctly by passing a valid argument or toml file to cdc server, or if you use TiUP, review the settings in tiup cluster edit-config. Details: open /var/lib/ticdc/tmp/sorter/sort-1-3187215.tmp: too many open files"
      }
    }
  }
]

cdc log:

Versions of the cluster
- Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):
```
4.0.14
```
- TiCDC version (execute cdc version):
```
4.0.14
```
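Note that the changefeed in the output above still reports `"state": "normal"` even though its summary carries an error, so checking the state alone would miss this failure. A minimal sketch of a check that inspects the `error` field instead, assuming the JSON shape shown in this report (the helper name `changefeeds_with_errors` and the abbreviated sample are illustrative, not part of the cdc CLI):

```python
import json


def changefeeds_with_errors(cli_output: str):
    """Return (id, state, error code, error message) for each changefeed
    whose summary contains a non-empty "error" object."""
    failed = []
    for feed in json.loads(cli_output):
        summary = feed.get("summary") or {}
        error = summary.get("error")
        if error:
            failed.append(
                (
                    feed.get("id"),
                    summary.get("state"),
                    error.get("code"),
                    error.get("message", ""),
                )
            )
    return failed


# Abbreviated copy of the `changefeed list` output from this report.
SAMPLE_OUTPUT = """
[
  {
    "id": "replication-task-60063",
    "summary": {
      "state": "normal",
      "tso": 427405617999380761,
      "checkpoint": "2021-08-31 15:17:22.295",
      "error": {
        "addr": "db-ticdc-2.db-ticdc-peer.tidb60105.svc:8301",
        "code": "CDC:ErrUnifiedSorterIOError",
        "message": "open /var/lib/ticdc/tmp/sorter/sort-1-3187215.tmp: too many open files"
      }
    }
  }
]
"""

if __name__ == "__main__":
    for feed_id, state, code, message in changefeeds_with_errors(SAMPLE_OUTPUT):
        print(f"{feed_id}: state={state} code={code} detail={message}")
```

In practice the input would be the captured stdout of the `cdc cli changefeed list` command shown earlier rather than a hard-coded string.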

liuzix commented 3 years ago

This problem has been mitigated by lowering the number of concurrent sorting heaps and increasing the interval at which heaps are written to disk. Please confirm whether the problem still persists in the latest cloud-cdc version.
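The general idea behind this kind of mitigation can be sketched as follows. This is not TiCDC's implementation (TiCDC is written in Go); it is a hypothetical Python illustration in which a semaphore caps how many sorted runs may hold a temp file open at once, so file descriptor usage stays bounded regardless of how many batches are waiting. The names `MAX_OPEN_SPILL_FILES`, `spill_sorted_run`, and `run_demo` are assumptions made for the example.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap on simultaneously open spill files.
MAX_OPEN_SPILL_FILES = 4

_spill_slots = threading.BoundedSemaphore(MAX_OPEN_SPILL_FILES)
_counter_lock = threading.Lock()
_open_now = 0
peak_open = 0  # highest number of slots held at the same time


def spill_sorted_run(sort_dir, events):
    """Sort one in-memory batch and write it to a temp file.

    A semaphore slot is held for as long as the file is open, so at most
    MAX_OPEN_SPILL_FILES spill files are open concurrently.
    """
    global _open_now, peak_open
    with _spill_slots:
        with _counter_lock:
            _open_now += 1
            peak_open = max(peak_open, _open_now)
        try:
            fd, path = tempfile.mkstemp(prefix="sort-", suffix=".tmp", dir=sort_dir)
            with os.fdopen(fd, "w") as f:
                for ts in sorted(events):
                    f.write(f"{ts}\n")
            return path
        finally:
            with _counter_lock:
                _open_now -= 1


def run_demo(num_batches=32, workers=16):
    """Spill many batches from more workers than there are slots."""
    with tempfile.TemporaryDirectory() as sort_dir:
        batches = [
            [(i * 7 + j * 13) % 101 for j in range(50)]
            for i in range(num_batches)
        ]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            paths = list(pool.map(lambda b: spill_sorted_run(sort_dir, b), batches))
        return len(paths), peak_open


if __name__ == "__main__":
    written, peak = run_demo()
    print(f"wrote {written} runs, peak concurrent open files: {peak}")
```

Flushing less often works in the same direction: larger in-memory batches mean fewer temp files are created for the same volume of data, at the cost of higher memory use per heap.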

overvenus commented 3 years ago

It relates to https://github.com/pingcap/ticdc/issues/2793

liuzix commented 2 years ago

This problem has been solved by dbsorter, which is enabled by default from 6.0 onward. It is also available on 5.4.1, which satisfies the needs of the cloud. So I'm closing this issue.

pingcap / tiflow

CDC cloud: unified sorter IO error: too many open files #2698
