Background:
When the JMX exporter is overloaded (say, 20-30 qps), we have observed that some requests take more than 20 seconds to serve, which is longer than the client-side request timeout. As a result, the agent tries to send responses on connections that the client has already closed, which has two consequences: (a) it results in errors, and (b) it sometimes leaves the socket channel blocked indefinitely on the write syscall. Eventually, all threads of the HTTP server in the Prometheus agent get stuck and no more requests can be served. However, the thread accepting connections is still active, so new connections are created but never actually used. Since all request threads of the HTTP server are stuck, the server never closes these connections, resulting in a long backlog of sockets stuck in CLOSE_WAIT.
Proposed Solution
1. Limit connections
We want to limit the number of connections to the exporter agents. The JMX exporter has no native way to enforce such a restriction, so as a workaround we will add iptables rules for the JMX exporter port.
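As a rough illustration of the iptables workaround, the rule below uses the `connlimit` match to cap concurrent connections to the exporter port. The port number and the limit are hypothetical values chosen for the example, not part of the proposal, and the rule needs root privileges to apply.

```shell
# Hypothetical values: adjust to your deployment.
EXPORTER_PORT=9404   # assumed port the jmx exporter agent listens on
MAX_CONNS=10         # assumed cap on concurrent connections

# Reject new TCP connections to the exporter port once MAX_CONNS
# connections are already open. --connlimit-mask 0 counts all clients
# together instead of per source IP.
iptables -A INPUT -p tcp --syn --dport "$EXPORTER_PORT" \
  -m connlimit --connlimit-above "$MAX_CONNS" --connlimit-mask 0 \
  -j REJECT --reject-with tcp-reset
```

Rejecting with a TCP reset makes the scraper fail fast instead of waiting on a connection that no request thread will pick up.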
2. Add timeouts to requests
This can be achieved easily with JVM settings, but it would be nice to add them to jmx-exporter's documentation:
-Dsun.net.httpserver.maxReqTime=20 -Dsun.net.httpserver.maxRspTime=20
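For context, a sketch of how these flags could be passed when the exporter runs as a Java agent. The jar name, port, config file, and application jar are hypothetical placeholders; only the two `-D` flags come from the proposal. To my understanding both values are interpreted by the JDK's built-in HTTP server as seconds.

```shell
# Hypothetical paths and port; only the two -D flags come from the proposal.
java \
  -Dsun.net.httpserver.maxReqTime=20 \
  -Dsun.net.httpserver.maxRspTime=20 \
  -javaagent:./jmx_prometheus_javaagent.jar=9404:config.yaml \
  -jar your-app.jar
```

Choosing values at or below the scraper's own timeout should keep the server from writing to connections the client has already given up on.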
Please let me know if this idea makes sense to the community. I can work on the design.
@brian-brazil