txthinking / brook

A cross-platform programmable network tool
https://brook.app
GNU General Public License v3.0

Server crashes randomly with 'accept4: too many open files' #350

Closed: nietongyu closed this issue 5 years ago.

nietongyu commented 6 years ago

Describe actual behavior

accept tcp [::]:7502: accept4: too many open files

The Brook server stops working. This seems to be the same situation as the following issue, which was closed: https://github.com/txthinking/brook/issues/158

What is your expected behavior

The Brook server keeps working.

Specifications like the version of the project, operating system, or hardware

CentOS 6.9, Brook v20180707, Vultr VPS with BBR enabled.

Steps to deploy the server

Using the script provided here, as many people do: https://doub.io/brook-jc3/

Steps to reproduce the problem

  1. I really don't know, but it has happened quite a few times. Maybe simply using one Brook account continuously for a while is enough to trigger the crash. The ports that trigger this error seem to be random.
  2. I can show you some logs.
  3. Logs: Here is part of the crash log I found on my server. I don't want to overwhelm you, but I hope this gives you more information and clues. I deployed Brook on two servers and only this one has shown the problem. As far as I know, the two servers are identical except for their location and Linux kernel version: this server runs kernel 4.14.9-1.el6.elrepo.x86_64, while the well-functioning one runs 4.17. Contact me if you want access to this server. E-mail: marktnie@gmail.com
    First time:
    2018/07/07 00:29:00 dial udp 8.8.8.8:53: socket: too many open files
    2018/07/07 00:29:00 dial udp 8.8.8.8:53: socket: too many open files
    2018/07/07 00:29:00 dial udp 8.8.8.8:53: socket: too many open files
    2018/07/07 00:29:00 dial udp 8.8.8.8:53: socket: too many open files
    [Info] [2018-07-07 00:29:01 6 CST] Brook server process is running normally...
    accept tcp [::]:7502: accept4: too many open files

    Second time:
    [Info] [2018-07-08 11:43:01 7 CST] Brook server process is running normally...
    [Info] [2018-07-08 11:44:01 7 CST] Brook server process is running normally...
    accept tcp [::]:7504: accept4: too many open files
    [Error] [2018-07-08 11:45:01 7 CST] Detected that the Brook server is not running, starting it...
    [Info] [2018-07-08 11:45:05 7 CST] Brook server started successfully...

    Third time:
    [Info] [2018-07-09 14:18:01 1 CST] Brook server process is running normally...
    2018/07/09 14:18:13 dial tcp: lookup c1.adform.net on 108.61.10.10:53: dial udp 108.61.10.10:53: socket: too many open files
    accept tcp [::]:7500: accept4: too many open files

    Fourth time:
    accept tcp [::]:7501: accept4: too many open files

    Fifth time:
    2018/07/12 12:42:16 dial udp 8.8.8.8:53: socket: too many open files
    2018/07/12 12:42:16 dial tcp 37.252.172.27:443: socket: too many open files
    accept tcp [::]:7501: accept4: too many open files

    Sixth time:
    2018/07/14 11:38:04 dial udp 8.8.8.8:53: socket: too many open files
    2018/07/14 11:38:05 dial udp 8.8.8.8:53: socket: too many open files
    2018/07/14 11:38:05 dial udp 8.8.8.8:53: socket: too many open files
    accept tcp [::]:7501: accept4: too many open files

    Seventh time:
    accept tcp [::]:7501: accept4: too many open files

    Eighth time:
    2018/07/21 15:22:20 dial tcp: lookup x.bidswitch.net on 108.61.10.10:53: dial udp 108.61.10.10:53: socket: too many open files
    2018/07/21 15:22:20 dial tcp: lookup rtb.openx.net on 108.61.10.10:53: no such host
    accept tcp [::]:7497: accept4: too many open files

    Ninth time:
    2018/08/03 14:38:35 dial tcp 184.168.221.74:443: i/o timeout
    accept tcp [::]:7509: accept4: too many open files
    2018/08/04 10:49:03 dial tcp 8.7.198.45:443: i/o timeout

    Tenth time:
    2018/08/05 03:43:05 dial tcp 66.220.151.20:443: i/o timeout
    accept tcp [::]:7497: accept4: too many open files
    2018/08/05 09:27:45 dial tcp 240.0.0.12:443: i/o timeout

    Eleventh time:
    2018/08/09 11:05:50 dial tcp 122.226.84.196:80: i/o timeout
    accept tcp [::]:7497: accept4: too many open files
    2018/08/09 11:33:10 dial tcp 180.149.134.253:443: i/o timeout

silent-x commented 6 years ago

I've run into this issue too. I deployed Brook to 5 servers: the 3 on Azure all have this issue, while the 2 on a small VPS provider do not. As far as I can tell, it isn't related to the kernel version: 3 of the servers run kernel 4.9.0 with BBR, and 2 of those have this issue while 1 does not. I ran lsof to inspect the open files: the servers with this issue have a lot (500+) of TCP connections in the CLOSE_WAIT state, while the servers without it have only 50 to 60 connections.

txthinking commented 6 years ago

https://serverfault.com/questions/48717/practical-maximum-open-file-descriptors-ulimit-n-for-a-high-volume-system