mirage / prometheus

OCaml library for reporting metrics to a Prometheus server
Apache License 2.0

Prometheus-app: When a lot of metrics are available, requesting them over HTTP may cause application hiccups. #11

Closed stijn-devriendt closed 4 years ago

stijn-devriendt commented 6 years ago

Because the text formatter is not Lwt-aware (it contains no Lwt.yield calls or other Lwt functions that may block), an HTTP GET of the metrics may starve I/O. It probably makes sense to add a yield every now and then, so that an I/O-heavy application doesn't get starved by these CPU-heavy operations.
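A rough sketch of what periodic yielding could look like, not the library's actual formatter. It assumes the lwt and lwt.unix packages are available and uses Lwt.pause as the yield point. The names iter_with_pauses and render_lines, and the ~every parameter, are hypothetical and only serve the illustration.

```ocaml
(* Sketch only: requires the lwt package (and lwt.unix to run it with
   Lwt_main.run). Nothing here is part of the prometheus library. *)
open Lwt.Infix

(* Hypothetical helper: apply [f] to each item, and after every [every]
   items call Lwt.pause so that other Lwt threads get a chance to run. *)
let iter_with_pauses ?(every = 100) f items =
  let rec loop i = function
    | [] -> Lwt.return_unit
    | x :: rest ->
      f x;
      if (i + 1) mod every = 0 then
        (Lwt.pause () >>= fun () -> loop (i + 1) rest)
      else
        loop (i + 1) rest
  in
  loop 0 items

(* Hypothetical stand-in for a text formatter: appends each line to a
   buffer, yielding to the scheduler every [every] lines. *)
let render_lines ?(every = 100) lines =
  let buf = Buffer.create 1024 in
  iter_with_pauses ~every
    (fun line ->
       Buffer.add_string buf line;
       Buffer.add_char buf '\n')
    lines
  >|= fun () -> Buffer.contents buf
```

A smaller ~every value gives I/O-bound threads more chances to run at the cost of more scheduler round-trips. Whether that trade-off is worth it would depend on measurements.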

talex5 commented 6 years ago

I'm surprised the text formatting is taking that long. Do you have some measurements? Is this taking significantly longer than ordinary delays (e.g. due to GC)?

stijn-devriendt commented 6 years ago

We haven't done any measurements. However, in our I/O-heavy application we have seen things like parsing and generating data cause significant scheduling delays for I/O-bound Lwt tasks.

At one point I observed ~300 KiB per metrics request, and we have seen visible user impact when parsing or generating around 1-1.5 MiB.

Instead of fixing the generation immediately, I might add a couple of metrics around the generation (as I've seen the Go client do, IIRC) to get an idea of the actual CPU time it consumes.
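As a minimal sketch of that kind of measurement, the following uses only the standard library's Sys.time, which reports the processor time used by the program. The helper with_cpu_time and the stand-in render function are hypothetical. Recording the result in an actual Prometheus metric is left out here.

```ocaml
(* Hypothetical helper: run [f] and return its result together with the
   CPU time, in seconds, that the process consumed while [f] ran. *)
let with_cpu_time f =
  let t0 = Sys.time () in
  let result = f () in
  let elapsed = Sys.time () -. t0 in
  (result, elapsed)

(* Hypothetical stand-in for the metrics text generation. *)
let render lines = String.concat "\n" lines

(* Example: time one render and report the CPU seconds on stderr. *)
let () =
  let lines = List.init 10_000 (fun i -> Printf.sprintf "metric_%d 1" i) in
  let text, cpu = with_cpu_time (fun () -> render lines) in
  Printf.eprintf "rendered %d bytes in %.6f CPU seconds\n"
    (String.length text) cpu
```

This only gives a meaningful per-request figure while the timed function runs without yielding. If it yielded to the Lwt scheduler, CPU time used by other threads in the meantime would be counted as well.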

talex5 commented 4 years ago

Closing this for now. Please reopen if it turns out to be an actual problem.