prometheus / client_java

Prometheus instrumentation library for JVM applications
http://prometheus.github.io/client_java/
Apache License 2.0

[httpserver exporter] No option to specify a wait time for ongoing requests when closing the server #938

Open nicu-da opened 5 months ago

nicu-da commented 5 months ago

The wait time when shutting down the HTTP server exporter is hard-coded to 0. This in turn leads to errors being logged during shutdown if a request to scrape the metrics is still in progress when close is called.

The Prometheus metrics HTTPServer caught an Exception while trying to send the metrics response
java.io.IOException: stream is closed
    at jdk.httpserver/sun.net.httpserver.Request$WriteStream.write(Request.java:382)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
    at jdk.httpserver/sun.net.httpserver.ExchangeImpl.sendResponseHeaders(ExchangeImpl.java:280)
    at jdk.httpserver/sun.net.httpserver.HttpExchangeImpl.sendResponseHeaders(HttpExchangeImpl.java:85)
    at io.prometheus.metrics.exporter.httpserver.HttpExchangeAdapter$HttpResponse.sendHeadersAndGetBody(HttpExchangeAdapter.java:78)
    at io.prometheus.metrics.exporter.common.PrometheusScrapeHandler.handleRequest(PrometheusScrapeHandler.java:66)
    at io.prometheus.metrics.exporter.httpserver.MetricsHandler.handle(MetricsHandler.java:43)
    at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
    at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
    at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:98)
    at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:851)
    at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
    at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:818)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

Ideally the wait should be configurable.

EDIT: formatted stacktrace @dhoard
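
For context, a minimal sketch of the exporter usage in question, using the documented HTTPServer.builder() API; the commented-out shutdownDelaySeconds option is only what this issue is asking for, not something that exists today:

    import io.prometheus.metrics.exporter.httpserver.HTTPServer;

    public class ExporterShutdown {
        public static void main(String[] args) throws Exception {
            HTTPServer server = HTTPServer.builder()
                .port(9400)
                // .shutdownDelaySeconds(1) // hypothetical option requested in this issue
                .buildAndStart();

            // ... application runs, Prometheus scrapes http://localhost:9400/metrics ...

            // close() currently stops the underlying com.sun.net.httpserver.HttpServer
            // with a hard-coded delay of 0 seconds, which is what produces the
            // "stream is closed" error above when a scrape is still in flight.
            server.close();
        }
    }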

fstab commented 5 months ago

Thanks @nicu-da, I'm happy to make it configurable. Do you think 0 is a reasonable default?

nicu-da commented 5 months ago

Thank you for the quick response @fstab.

Checking the docs for the underlying HttpServer.stop(delay) that close delegates to:

    /**
     * Stops this server by closing the listening socket and disallowing
     * any new exchanges from being processed. The method will then block
     * until all current exchange handlers have completed or else when
     * approximately <i>delay</i> seconds have elapsed (whichever happens
     * sooner). Then, all open TCP connections are closed, the background
     * thread created by {@link #start()} exits, and the method returns.
     * Once stopped, a {@code HttpServer} cannot be re-used.
     *
     * @param delay the maximum time in seconds to wait until exchanges have finished
     * @throws IllegalArgumentException if delay is less than zero
     */

it sounds like a non-zero default would be saner, giving any in-progress scrape up to 1 second to finish before the server shuts down. That should have no impact in practice: even if a scrape is ongoing, it will usually finish much faster than that.
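
To make the quoted behavior concrete, here is a small self-contained sketch against the plain JDK com.sun.net.httpserver.HttpServer (not the exporter itself): stop(0) returns immediately, while stop(1) blocks until in-flight exchanges complete or roughly one second elapses, whichever comes first.

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class StopDelayDemo {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/metrics", exchange -> {
                byte[] body = "# simulated slow scrape\n".getBytes(StandardCharsets.UTF_8);
                try {
                    Thread.sleep(500); // pretend collecting the metrics takes a while
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();

            // stop(0) returns immediately and in-flight exchanges fail with "stream is closed";
            // stop(1) blocks until current handlers finish or ~1 second passes, whichever is sooner.
            server.stop(1);
        }
    }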

friscoMad commented 3 months ago

+1 to this request. Giving 1 second is probably a saner default, but let others set it to 0 if they need to.

Also, we would like a way to pass a custom metrics handler. In our use case we want to ensure there was at least one scrape before closing: we scrape tasks that can have a variable duration, and we want to be sure the last values are scraped.

In the old version (0.16) we could use a fake name filter to detect when the endpoint was scraped and then close the server (after some extra delay). With the newer versions there is no way to do that, so it would be nice if you could make the handler configurable (or add an option so that close waits for one last scrape before it shuts the server down).
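
A rough sketch of that idea against the new API (assuming GaugeWithCallback from prometheus-metrics-core; the metric name and the latch-based shutdown logic are illustrative, not an existing feature of the exporter): since callback metrics are evaluated on every scrape, the callback can signal that the final values have been collected before close() is called.

    import io.prometheus.metrics.core.metrics.GaugeWithCallback;
    import io.prometheus.metrics.exporter.httpserver.HTTPServer;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicReference;

    public class WaitForFinalScrape {
        public static void main(String[] args) throws Exception {
            // Set once all work is done; scrapes before that are ignored.
            AtomicReference<CountDownLatch> finalScrape = new AtomicReference<>();

            // The callback runs on every scrape, so it doubles as a "we were scraped" signal.
            GaugeWithCallback.builder()
                .name("app_scrape_marker")
                .help("Dummy metric whose callback records that a scrape happened.")
                .callback(cb -> {
                    cb.call(1.0);
                    CountDownLatch pending = finalScrape.get();
                    if (pending != null) {
                        pending.countDown();
                    }
                })
                .register();

            HTTPServer server = HTTPServer.builder().port(9400).buildAndStart();

            // ... run the variable-duration tasks and record their final metric values ...

            // Wait (bounded) for one more scrape of the final values, then shut down.
            CountDownLatch lastScrape = new CountDownLatch(1);
            finalScrape.set(lastScrape);
            lastScrape.await(60, TimeUnit.SECONDS); // roughly one scrape interval plus slack
            server.close();
        }
    }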