pingcap / tiflash

The analytical engine for TiDB and TiDB Cloud. Try free: https://tidbcloud.com/free-trial
https://docs.pingcap.com/tidb/stable/tiflash-overview
Apache License 2.0

TiFlash panics with `Too many open files` in the cloud GCP env #9663

Open solotzg opened 4 days ago

solotzg commented 4 days ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

The TiFlash process holds too many file descriptors (FDs). The FD count keeps growing, which causes queries to fail and eventually makes TiFlash panic. Most of the FDs are sockets, and a large number of these sockets are still open but cannot be found in /proc/net.

sh-5.1# ls -l /proc/1/fd/ | wc -l
295268
sh-5.1# ls -l /proc/1/fd/ | grep "eventfd" | wc -l
98368
sh-5.1# ls -l /proc/1/fd/ | grep "eventpoll" | wc -l
98391
sh-5.1# ls -l /proc/1/fd/ | grep "socket" | wc -l
98492
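The counts above can also be collected in one pass, together with a check for sockets that do not appear in /proc/net. The following is a minimal Python sketch, not part of TiFlash: every function name is hypothetical, and it assumes a Linux /proc layout in which socket FDs resolve to `socket:[<inode>]` and the inode sits in column 9 of the tcp/udp tables and column 6 of the unix table. Only those tables are read, so sockets of other families (for example netlink) would be reported as missing too.

```python
import os
import re
from collections import Counter

SOCKET_RE = re.compile(r"^socket:\[(\d+)\]$")

def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if SOCKET_RE.match(target):
        return "socket"
    if target == "anon_inode:[eventfd]":
        return "eventfd"
    if target == "anon_inode:[eventpoll]":
        return "eventpoll"
    return "other"

def count_fd_types(targets) -> Counter:
    """Count FD symlink targets per category."""
    return Counter(classify_fd_target(t) for t in targets)

def read_fd_targets(pid: int) -> list:
    """Return the symlink target of every FD of `pid` (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # the FD was closed between listdir() and readlink()
    return targets

def known_socket_inodes(net_dir: str = "/proc/net") -> set:
    """Collect the socket inodes listed in the tcp, udp and unix tables."""
    inode_column = {"tcp": 9, "tcp6": 9, "udp": 9, "udp6": 9, "unix": 6}
    inodes = set()
    for table, col in inode_column.items():
        try:
            with open(os.path.join(net_dir, table)) as f:
                next(f, None)  # skip the header row
                for line in f:
                    fields = line.split()
                    if len(fields) > col:
                        inodes.add(fields[col])
        except OSError:
            continue  # table not present in this environment
    return inodes

def orphan_sockets(targets, known: set) -> set:
    """Socket inodes held by the process but absent from `known`."""
    held = set()
    for t in targets:
        m = SOCKET_RE.match(t)
        if m:
            held.add(m.group(1))
    return held - known
```

With `targets = read_fd_targets(1)`, `count_fd_types(targets)` gives the per-type totals and `orphan_sockets(targets, known_socket_inodes())` gives the socket inodes missing from the tables read. In the output reported above, the three categories sum to 98368 + 98391 + 98492 = 295251 of the 295268 lines counted, so they grow in a roughly 1:1:1 ratio.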

Other

No such problem has been observed in the AWS environment so far.

4. What is your TiFlash version? (Required)

v7.5.3

solotzg commented 23 hours ago

After disabling MPP, the number of socket FDs stopped growing. This suggests a possible bug in the MPP implementation.

set global tidb_allow_fallback_to_tikv = "tiflash";
set global tidb_allow_mpp = 0;
set global tidb_allow_tiflash_cop = 1;
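To confirm that the FD count has really stopped growing after applying the settings above, it can be sampled over time. The following is a minimal Python sketch under stated assumptions: the function names, the sampling interval, and the tolerance are all hypothetical choices, and the check only compares the first and last sample rather than fitting a trend.

```python
import os
import time

def count_fds(pid: int) -> int:
    """Number of open FDs of `pid`, read from /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

def is_still_growing(samples, tolerance: int = 0) -> bool:
    """True if the last sample exceeds the first by more than `tolerance`."""
    if len(samples) < 2:
        return False
    return samples[-1] - samples[0] > tolerance

def sample_fd_counts(pid, rounds=5, interval_s=60.0,
                     counter=count_fds, sleep=time.sleep):
    """Take `rounds` FD-count samples, waiting `interval_s` between them.

    `counter` and `sleep` are injectable so the loop can be exercised
    without a real process or real delays.
    """
    samples = []
    for i in range(rounds):
        samples.append(counter(pid))
        if i < rounds - 1:
            sleep(interval_s)
    return samples
```

For example, `is_still_growing(sample_fd_counts(1), tolerance=100)` samples PID 1 (the PID used in the report) five times over four minutes and ignores fluctuations of up to 100 FDs.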