scionproto / scion

SCION Internet Architecture
https://www.scion-architecture.net/
Apache License 2.0
377 stars 157 forks source link

Upgrade prometheus client_golang lib to fix timeout issues #1971

Closed kormat closed 2 weeks ago

kormat commented 5 years ago

The version of https://github.com/prometheus/client_golang/ that we're currently using has no timeout in its http handlers. This means if there are connectivity issues, it's possible for it to get stuck forever trying to read from a tcp connection that the other side has already torn down. This means a socket and goroutine get leaked forever. As of https://github.com/prometheus/client_golang/pull/379, there is now a way to specify a timeout for prometheus http handlers, plus it will automatically gain http keepalives because promhttp now uses http.RoundTripper.

So, we should upgrade to a recent client_golang version, and configure it to use a sane timeout (e.g. 5mins).

lukedirtwalker commented 4 years ago

Note that by now we already use a version that would support this. But we did not yet configure a timeout, it might makes sense to make this configurable via toml.

matzf commented 3 weeks ago

@lukedirtwalker, do you understand what is still missing here?

jiceatscion commented 2 weeks ago

After taking a closer look, I think I understand what's desired here; make sure that a scraping request terminates. I am not sure when that can happen in our case, but preventing it won't hurt. I am not in favor of making the timeout configurable. Given the use case, the time is just a fail safe. A configuration item for that would just be an added burden on the user. I propose setting the timeout at 1 Minute by way of a constant.