prometheus / prometheus

The Prometheus monitoring system and time series database.
https://prometheus.io/
Apache License 2.0

On-scrape conversion of classic histograms to native histograms (opt-in flag) #13304

Open bwplotka opened 10 months ago

bwplotka commented 10 months ago

Proposal

Acceptance Criteria

Open Questions

Motivation

Using native histograms can be slightly different on the PromQL layer (e.g. new functions), but they are generally much cheaper for Prometheus and potential remote backends.

On top of that (the main rationale), native histograms are superior for remote write cases, as they naturally make the streaming more atomic/transactional at scale: the scraped information about a histogram is self-contained in one sample, instead of spread across multiple series that could be sent in different remote write streams/requests. This would be a huge improvement when adopting remote write (both 1.0 and 2.0).
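For illustration, a classic histogram in the text exposition format spans several series (every `le` bucket plus `_sum` and `_count`), which is exactly what can end up split across remote write requests; the metric name and values below are made up:

```
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 3
http_request_duration_seconds_bucket{le="0.5"} 7
http_request_duration_seconds_bucket{le="1"} 8
http_request_duration_seconds_bucket{le="+Inf"} 9
http_request_duration_seconds_sum 4.2
http_request_duration_seconds_count 9
```

A native histogram carries all of that information in a single sample.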

However, migration to native histograms will take time, mostly due to required instrumentation changes (even if it's as simple as upgrading/configuring the SDKs).

Doing an automatic migration, ideally in place, would be an epic way to have a one-off transition to the new histograms from a certain point in time. This is related to the DevSummit topic on transition strategies; I don't think we ever reached a conclusion on that.

Alternatives

cc @beorn7 @SuperQ @roidelapluie

SuperQ commented 10 months ago

While not as efficient in exposition, this would also allow clients to expose more classic histogram buckets without the downside of increased cardinality on Prometheus.

bboreham commented 10 months ago

This only works if the buckets in your classic histogram match some set of native histogram buckets.

Maybe if you added some error tolerance, like "convert to a native histogram if the maximum mismatch of any bucket boundary is <1%"?
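A minimal sketch of what such a tolerance check could look like (the function name and the 1% threshold are illustrative, not an existing Prometheus API):

```go
package main

import (
	"fmt"
	"math"
)

// withinTolerance reports whether every classic upper bound is within
// maxRelErr (relative error) of the corresponding candidate native boundary.
func withinTolerance(classic, candidate []float64, maxRelErr float64) bool {
	if len(classic) != len(candidate) {
		return false
	}
	for i := range classic {
		relErr := math.Abs(classic[i]-candidate[i]) / math.Abs(candidate[i])
		if relErr > maxRelErr {
			return false
		}
	}
	return true
}

func main() {
	classic := []float64{1, 2, 4, 8, 16}   // classic le= upper bounds (+Inf omitted)
	candidate := []float64{1, 2, 4, 8, 16} // candidate native boundaries in the same range
	fmt.Println(withinTolerance(classic, candidate, 0.01)) // true
}
```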

bwplotka commented 10 months ago

Yes, I assume there will be some error tolerance, perhaps configurable 👍🏽
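Purely as a sketch of what "configurable" might mean, e.g. per scrape config (all field names below are hypothetical; nothing like this exists yet):

```yaml
scrape_configs:
  - job_name: example
    # Hypothetical, illustrative fields only:
    convert_classic_histograms: true            # opt in to on-scrape conversion
    convert_classic_histograms_max_error: 0.01  # max relative bucket-boundary mismatch
```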

beorn7 commented 10 months ago

tl;dr: It was always the plan to do this, but we need custom bucket layouts #11277 first.

Longer version:

As @bboreham has mentioned already, converting a classic histogram into a native one only works well in the (unlikely) case that the bucket layout of the classic histogram closely matches the bucket layout of a native histogram. In practice, this will happen very rarely. The most plausible scenario is bucket boundaries like 1, 2, 4, 8, 16, …, which is schema 0 in the native histogram world. Even allowing a small-ish error tolerance will not create many more matches. We could use interpolation and a significantly higher resolution for the native histogram, filling the (many) native buckets that fall within the range of a single original classic bucket with equal shares of its count. This would create "equally bad" quantile estimations, maybe still at a somewhat lower resource cost. I'm not sure it is worth going down that path. It would also create confusion.
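For reference, a small sketch of the standard exponential boundaries referred to above: consecutive boundaries are powers of base = 2^(2^-schema), so schema 0 yields 1, 2, 4, 8, 16, … and higher schemas give finer resolution (the function name below is mine, not an existing Prometheus API):

```go
package main

import (
	"fmt"
	"math"
)

// boundary returns the i-th boundary of the exponential bucketing for the
// given schema; boundaries are consecutive powers of base = 2^(2^-schema).
func boundary(schema, i int) float64 {
	base := math.Pow(2, math.Pow(2, -float64(schema)))
	return math.Pow(base, float64(i))
}

func main() {
	// Schema 0: base = 2, so boundaries 1, 2, 4, 8, 16, …
	for i := 0; i <= 4; i++ {
		fmt.Printf("schema 0, boundary %d: %g\n", i, boundary(0, i))
	}
	// Schema 3: base = 2^(1/8) ≈ 1.0905, i.e. much finer resolution.
	fmt.Printf("schema 3, boundary 1: %.4f\n", boundary(3, 1))
}
```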

Custom bucket layouts (see #11277) would solve all of these problems: we could just directly emulate the classic histogram. And this very use case was one of the motivating factors for putting custom bucket layouts on the feature list. It is, however, quite involved, and we have many lower-hanging fruit to harvest first.

bwplotka commented 10 months ago

Good points.

I wonder if, despite the lack of custom bucket support, we could do some (opt-in) translation with some (big) error tolerance, even accepting all those "bad" consequences.

Rationales:

A) We could do this now.

~B) Even with custom bucket layouts, many downstream Prometheus users would still have exactly the same problem. That translation will be needed for systems which only support either static or exponential buckets (e.g. OTel and Google, but most likely everybody else who does not directly import the Prometheus DB) and have not implemented a mixed mode (or don't plan to). The difference is that it would not be directly a Prometheus problem.~

EDIT: I somehow assumed we want a "mixed" histogram, i.e. a sample with both exponential and custom buckets 😱 Verified with @beorn7 that's not the case; it's either one or the other 🙈

~So my question is: is there room for adding a no-custom-bucket mode for this conversion for now, and perhaps keeping it later? Once custom buckets land in native histograms we could either replace it or have two modes 🤔 @beorn7~

EDIT: Given the above mistake, it might indeed be much better to collaborate on the custom bucket work 🤔 How can I help 🙈