influxdata / kapacitor

Open source framework for processing, monitoring, and alerting on time series data
MIT License

Kapacitor flooding alarms and too many open files #2134

Open StefanSa opened 5 years ago

StefanSa commented 5 years ago

Hi there. We saw strange behavior from Kapacitor last night and this morning. All servers were incorrectly recognized as offline by the deadman node, leading to some 300 alarms of various kinds (CPU, disk, memory, etc.).

TICK Stack = latest version, OS = CentOS 7.6, Hardware = PowerEdge 710 // 16 cores // 76 GB RAM

What is striking is the following: during this period, the number of TCP connections in CLOSE_WAIT reached 15K! (screenshot: tcp_close_wait graph)

and RAM consumption increased from 8% to 22%. (screenshot: RAM usage graph)

In addition, the kapacitor log contains this error message several times:

```
ts=2018-12-12T08:25:51.043+01:00 lvl=error msg="2018/12/12 08:25:51 http: Accept error: accept tcp [::]:9092: accept4: too many open files; retrying in 20ms\n" service=http service=httpd_server_errors
```
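The "too many open files" error means the kapacitord process exhausted its file-descriptor limit, which fits the CLOSE_WAIT buildup above. A minimal diagnostic sketch using the standard Linux /proc interface (the daemon name `kapacitord` is the usual one on CentOS 7; the commands fall back to the current shell's PID so they run anywhere):

```shell
# Pick the kapacitord PID if the daemon is running, otherwise use this
# shell's own PID so the /proc commands below still work as a demo.
PID=$(pidof kapacitord || echo $$)

# Soft/hard limit on open files for that process:
grep 'Max open files' /proc/$PID/limits

# Number of descriptors it currently holds:
ls /proc/$PID/fd | wc -l

# Count sockets stuck in CLOSE_WAIT system-wide (the 15K spike above):
ss -tan state close-wait | tail -n +2 | wc -l
```

Comparing the descriptor count against the limit shows how close the process is to hitting `accept4: too many open files` again.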

and influxdb.log contains a lot of these:

```
influxd: ts=2018-12-12T02:18:30.619828Z lvl=info msg="Post http://localhost:9092/write?consistency=&db=telegraf&precision=ns&rp=autogen: net/http: request canceled (Client.Timeout exceeded while awaiting headers)" log_id=0CH~4qg0000 service=subscriber
```
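Since Kapacitor runs as a systemd service on CentOS 7, one common mitigation (a sketch only, not a confirmed fix for this issue; the drop-in path and limit value are illustrative) is to raise the per-process open-file limit with a systemd drop-in:

```ini
# /etc/systemd/system/kapacitor.service.d/limits.conf  (illustrative path)
[Service]
LimitNOFILE=65536
```

followed by `systemctl daemon-reload && systemctl restart kapacitor`. This raises the ceiling but does not explain why the sockets piled up in CLOSE_WAIT in the first place.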

Any idea what was going on here?

huyujie commented 5 years ago

I also encountered this problem.

hemantjadon commented 5 years ago

+1

DeepanshGarg commented 4 years ago

+1