prometheus / client_golang

Prometheus instrumentation library for Go applications
https://pkg.go.dev/github.com/prometheus/client_golang
Apache License 2.0
5.33k stars 1.17k forks source link

Proposal: simplify Go runtime/metrics histogram buckets #1287

Open bboreham opened 1 year ago

bboreham commented 1 year ago

I looked at go_gc_heap_allocs_by_size_bytes_bucket, and found it strange that the buckets include le="24.999999999999996" and le="64.99999999999999".

With help from @beorn7 I believe this is because the bucket is defined as "less than 25", "less than 65", etc., modified by math.NextAfter(): https://github.com/prometheus/client_golang/blob/a09a1d34cbc74daa8ed70234b99467a30b020a40/prometheus/go_collector_latest.go#L555

I propose we take advantage of the knowledge that fractional values do not show up in byte counts, and adjust the buckets as le="24", le="64", ...

This would be a breaking change for anyone relying on the exact values, but overall a benefit.

I'm using v1.15.1.

bwplotka commented 1 year ago

Hm, I don't have a strong opinion, other than making sure that when we adjust the buckets we don't account observations with value > 24 && value <= 25 to wrong bucket (:

beorn7 commented 1 year ago

The bucket coming from the runtime has an upper exclusive boundary of 25 (or, equivalently, a lower inclusive boundary of 25). An observation at value 25 will therefore end up one bucket "up" from the bucket currently labeled as le="24.999999999999996". That won't change by labeling it le="24". @bboreham's argument above is that the runtime will never observe fractional values of bytes, so observations of e.g. 24.5 will not happen, and changing the label from le="24.999999999999996" to le="24" won't change the count value in the bucket.

bwplotka commented 1 year ago

Fair 👍🏽

ArthurSens commented 10 months ago

This would be a breaking change for anyone relying on the exact values, but overall a benefit.

I'm curious how we handle such situations... With no data other than my experience using Prometheus so far, histograms are used mostly as just another counter and we often use histogram_quantile(X, sum(rate(....

I've seen alerts built on top of a particular bucket for things like HTTP request durations, but never for go metrics.

Would a change like this require client_golang v2?

ArthurSens commented 10 months ago

I've seen alerts built on top of a particular bucket for things like HTTP request durations, but never for go metrics.

Actually, with this same line of thought, does having a bucket called 24.9999 cause any other problems besides not being good to read?

bboreham commented 10 months ago

Making it v2 would mean you can link both versions of the library, so we would also need to rename the metrics so both sets can be collected.

My proposal is we don’t. Just change it and assume nobody is affected.

any other problem

It’s much bigger, which has a cost in storage, in remote-write, etc.

ArthurSens commented 10 months ago

I'm in favor, making sure we do some good communication around this change