Open mattwwarren opened 5 years ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Please do not close this issue. This is still a weekly problem for us.
@mattwwarren We hit the same issue as you.
It looks like InfluxDB does not enable TCP keepalive, so Linux keeps the leaked TCP connections in the ESTABLISHED state.
In our case, Telegraf runs on a remote server, and the network link between the Telegraf server and the InfluxDB server is lossy.
When Telegraf decides to close the TCP connection, the FIN packet may be lost. InfluxDB then waits for the HTTP body forever, and the TCP connection is leaked.
I'm wondering whether InfluxDB could support some kind of timeout mechanism for this scenario.
I ran into the same problem; the following thread helped me: https://github.com/influxdata/influxdb/issues/9248
Steps to reproduce: Run the following on the InfluxDB host to see an increasing number of open connections to InfluxDB from Telegraf agents:
lsof -P | grep influx | awk -F':8086' '{print $2}' | awk -F':' '{print $1}' | sort | uniq -c | sort -nk 1
Expected behavior: When Telegraf shuts down, InfluxDB closes open connections.
Actual behavior: InfluxDB continues to hold connections open until the open file handle limit is reached.
Environment info:
Linux 4.14.109-80.92.amzn1.x86_64 x86_64
InfluxDB v1.7.7 (git: 1.7 f8fdf652f348fc9980997fe1c972e2b79ddd13b0)
Config: To my knowledge, we have no custom config settings. I am happy to provide any options if specific values are useful.
Sample lsof output:
Our non-prod hosts shut down at night, leaving connections open. Prod hosts do not shut down, and their connection counts stay at 1.