prometheus / jmx_exporter

A process for exposing JMX Beans via HTTP for Prometheus consumption
Apache License 2.0

Rate limiting prometheus jmx-exporter #562

Closed bristy closed 3 years ago

bristy commented 3 years ago

@brian-brazil

Background: When the JMX exporter is overloaded (around 20-30 qps), we have observed that some requests take more than 20 seconds to serve, which is longer than the client-side request timeout. As a result, the agent tries to send responses on connections that the client has already closed. This has two consequences: (a) it results in errors, and (b) it sometimes leaves the socket channel blocked indefinitely in the write syscall. Eventually, all request threads of the HTTP server in the Prometheus agent get stuck and no more requests can be served. The thread accepting connections is still active, however, so new connections are created but never used. Because all request threads are stuck, the server never closes these connections, which produces a long backlog of sockets in CLOSE_WAIT.

Proposed Solution

1. Limit connections. We want to limit the number of connections to the exporter agents. JMX exporter has no native way to enforce such a limit. (To work around this, we will add iptables rules for the JMX port.)

2. Add timeouts to requests. This can be achieved easily with JVM settings, but it would be nice to mention them in jmx-exporter's documentation:

-Dsun.net.httpserver.maxReqTime=20 -Dsun.net.httpserver.maxRspTime=20
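
As a rough illustration of the two proposals above, here is a minimal Java sketch built on the JDK's `com.sun.net.httpserver` server. It is not jmx_exporter code: the class names `LimitedMetricsServer` and `ConcurrencyLimitFilter`, the `/metrics` handler, and the limit of 4 are all hypothetical. The filter answers 503 once a fixed number of requests is in flight, and the two system properties are the ones listed above, set programmatically instead of with `-D`. To my understanding the JDK reads them once, when its HTTP server classes are first loaded, and interprets the values as seconds, so they must be set before the first server is created.

```java
import com.sun.net.httpserver.Filter;
import com.sun.net.httpserver.HttpContext;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Semaphore;

class LimitedMetricsServer {

    /** Hypothetical filter (not part of jmx_exporter): caps in-flight requests. */
    static class ConcurrencyLimitFilter extends Filter {
        private final Semaphore permits;

        public ConcurrencyLimitFilter(int maxConcurrent) {
            this.permits = new Semaphore(maxConcurrent);
        }

        /** Returns true if a slot was free and is now held by the caller. */
        public boolean tryEnter() {
            return permits.tryAcquire();
        }

        /** Releases a slot previously obtained with tryEnter(). */
        public void leave() {
            permits.release();
        }

        @Override
        public void doFilter(HttpExchange exchange, Chain chain) throws IOException {
            if (!tryEnter()) {
                // Over the limit: reply 503 with no body instead of queueing the scrape.
                exchange.sendResponseHeaders(503, -1);
                exchange.close();
                return;
            }
            try {
                chain.doFilter(exchange);
            } finally {
                leave();
            }
        }

        @Override
        public String description() {
            return "Rejects requests above a fixed concurrency limit";
        }
    }

    /** Sets the two properties from the issue; same effect as passing them with -D. */
    public static void applyTimeouts(int seconds) {
        System.setProperty("sun.net.httpserver.maxReqTime", Integer.toString(seconds));
        System.setProperty("sun.net.httpserver.maxRspTime", Integer.toString(seconds));
    }

    /** Starts a throwaway server on a free local port with the limit attached. */
    public static HttpServer start(int maxConcurrent) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        HttpContext context = server.createContext("/metrics", exchange -> {
            byte[] body = "demo_metric 1\n".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        context.getFilters().add(new ConcurrencyLimitFilter(maxConcurrent));
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        applyTimeouts(20); // must run before the first HttpServer is created
        HttpServer server = start(4);
        System.out.println("listening on port " + server.getAddress().getPort());
        server.stop(0);
    }
}
```

Note that the filter caps requests being handled, not TCP connections, so it does not replace the iptables rule: connections that are accepted but never dispatched to a handler would still pile up.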

Please let me know if this idea makes sense to the community. I can work on the design.

brian-brazil commented 3 years ago

I've already responded on https://groups.google.com/g/prometheus-developers/c/1wPDXqe_Imo. This does not appear to make sense: if you're in this situation, then things have already gone very, very badly wrong.