Hi there.
We saw some strange behavior with Kapacitor last night and this morning.
All servers were incorrectly flagged as offline by the deadman node, leading to around 300 alarms of various kinds (CPU, disk, memory, etc.).
TICK Stack = latest version
OS = CentOS 7.6
Hardware = PowerEdge 710 // 16 cores // 76 GB RAM
What is striking is the following:
During this period, the number of TCP connections in CLOSE_WAIT reached 15K,
and RAM consumption rose from 8% to 22%.
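For reference, the CLOSE_WAIT count can be reproduced with ss from iproute2 (this exact invocation is my sketch, not necessarily what our monitoring ran):

# Count TCP sockets stuck in CLOSE_WAIT (drop the header line)
ss -tan state close-wait | tail -n +2 | wc -l
# Break it down per process to confirm kapacitord is holding them
ss -tanp state close-wait | grep -c kapacitor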
In addition, the Kapacitor log contains several error messages like this one:
ts=2018-12-12T08:25:51.043+01:00 lvl=error msg="2018/12/12 08:25:51 http: Accept error: accept tcp [::]:9092: accept4: too many open files; retrying in 20ms\n" service=http service=httpd_server_errors
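The "too many open files" error suggests Kapacitor hit its file-descriptor limit. A minimal sketch of checking and raising that limit, assuming Kapacitor runs under systemd as kapacitor.service (the CentOS 7 default; the drop-in path and the 65536 value are my assumptions, adjust as needed):

# Show the limit the running service actually has
systemctl show kapacitor -p LimitNOFILE
# Raise it with a systemd drop-in, then restart
sudo mkdir -p /etc/systemd/system/kapacitor.service.d
printf '[Service]\nLimitNOFILE=65536\n' | sudo tee /etc/systemd/system/kapacitor.service.d/limits.conf
sudo systemctl daemon-reload
sudo systemctl restart kapacitor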
influxdb.log also contains a lot of these:
influxd: ts=2018-12-12T02:18:30.619828Z lvl=info msg="Post http://localhost:9092/write?consistency=&db=telegraf&precision=ns&rp=autogen: net/http: request canceled (Client.Timeout exceeded while awaiting headers)" log_id=0CH~4qg0000 service=subscriber
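Those failing POSTs come from the InfluxDB subscription that forwards writes to Kapacitor on port 9092, so they look like a consequence of Kapacitor no longer accepting connections rather than a separate problem. The subscriptions involved can be listed with the standard influx CLI:

influx -execute 'SHOW SUBSCRIPTIONS'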
Any idea what was going on here?