tikv / rust-prometheus

Prometheus instrumentation library for Rust applications
Apache License 2.0
1.07k stars 182 forks source link

`process_cpu_seconds_total` should not be an integer #451

Open alin-at-dfinity opened 2 years ago

alin-at-dfinity commented 2 years ago

430 switched process_cpu_seconds_total from an (f64) Counter to an IntCounter, for performance reasons.

Issue is that processes that don't use a lot of CPU will now have 1 second CPU usage spikes every so often instead of a more or less flat usage with spikes occurring when they actually do, not when 1 second of CPU usage has accumulated. I.e. this has turned the metric from a smooth growing floating point value into a step function

E.g. a process using 1% CPU on average, with a 10% spike every minute will now look as if it is using 0% CPU most of the time and spike every 2 minutes out of 3.

Considering that most processes export hundreds if not thousands of metrics, I really don't think the marginal performance improvement is worth this regression. Please keep using Counter for cpu_total.

Happy to make the changes myself, if they're likely to be accepted.

alin-at-dfinity commented 2 years ago

Oh, almost forgot: #430 also switched process_start_time_seconds from a Gauge to an IntGauge. While all other changes in that PR were reasonable (file descriptors, memory usage), process_start_time_seconds is also not an integer. Linux (at least) provides the process start time as a floating point value and truncating it to an integer discards useful information (e.g. if I want to know the order in which two processes on the same machine got started; or how long it took to start one vs. the other).

This is more of a nice to have as compared to process_cpu_seconds_total above, but still useful.