prometheus / client_golang

Prometheus instrumentation library for Go applications
https://pkg.go.dev/github.com/prometheus/client_golang
Apache License 2.0

feature: Add exemplar without observation/incrementing? #868

Open LouisStAmour opened 3 years ago

LouisStAmour commented 3 years ago

The use case I have in mind is trace-derived metrics in something like an OTel Collector, where you want to output near-real-time metrics and also attach traces as exemplars (as Grafana supports). Today you can't do both: sometimes you'll want to mark an interesting trace as an exemplar, but you only know it's interesting because a later-processed inner span had an exception event. Your exemplar would then carry an "exception" label, possibly to highlight it in red.

A similar idea I have is to automatically pick some exemplars based on which buckets they fall into when observing for a histogram. This would, I suppose, be a new API call, say observeWithExemplarCandidate, that takes the same exemplar data but, based on the histogram bucketing, may or may not pick it as an exemplar, and returns whether it was picked (maybe?).
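
A minimal sketch of what such a call could look like, using a toy histogram built only on the standard library. Everything here (`candidateHistogram`, `Exemplar`, `ObserveWithExemplarCandidate`) is hypothetical and not part of client_golang, and the policy "keep the first candidate seen per bucket" is just one assumed selection rule.

```go
package main

import (
	"fmt"
	"sort"
)

// Exemplar is a hypothetical stand-in for exemplar data (not the client_golang type).
type Exemplar struct {
	Value  float64
	Labels map[string]string
}

// candidateHistogram is a toy histogram used only for illustration.
type candidateHistogram struct {
	upperBounds []float64   // sorted, inclusive upper bounds; the +Inf bucket is implicit
	counts      []uint64    // per-bucket counts, last entry is the +Inf bucket
	exemplars   []*Exemplar // at most one exemplar per bucket
}

func newCandidateHistogram(bounds ...float64) *candidateHistogram {
	return &candidateHistogram{
		upperBounds: bounds,
		counts:      make([]uint64, len(bounds)+1),
		exemplars:   make([]*Exemplar, len(bounds)+1),
	}
}

// ObserveWithExemplarCandidate always records the observation. The exemplar is
// kept only if the bucket has none yet; the return value reports whether it was picked.
func (h *candidateHistogram) ObserveWithExemplarCandidate(v float64, labels map[string]string) bool {
	i := sort.SearchFloat64s(h.upperBounds, v) // index of first bound >= v, or the +Inf bucket
	h.counts[i]++
	if h.exemplars[i] != nil {
		return false
	}
	h.exemplars[i] = &Exemplar{Value: v, Labels: labels}
	return true
}

func main() {
	h := newCandidateHistogram(0.1, 0.5, 1)
	fmt.Println(h.ObserveWithExemplarCandidate(0.3, map[string]string{"trace_id": "a"}))
	fmt.Println(h.ObserveWithExemplarCandidate(0.4, map[string]string{"trace_id": "b"}))
	fmt.Println(h.ObserveWithExemplarCandidate(2.0, map[string]string{"trace_id": "c"}))
}
```

In this sketch the second call lands in the same bucket as the first, so its candidate is rejected while its observation is still counted.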

The alternative to such a sampling API would be to allow observations to histograms like normal where the choice of observing with exemplar could be compared against the current value of the histogram, such as to know whether the value is an outlier or not. The difficulty here is that you might not have many requests and you might want to sample one request per bucket. At that point, you again need to keep your bucket data for up to a minute, then decide on which exemplars to add up to a minute after you've first observed them. That's again something that requires separating backdated exemplars from real-time observations, thus proving the need for this API call.

I'm new to the Prometheus client library community, so please let me know if this idea has merit. I'm thinking of suggesting the same in the Otel Collector community as I'm having a hard time adopting trace-derived metrics for Grafana's new support of exemplars without being able to separate real-time metrics from past-observed exemplar values.

beorn7 commented 3 years ago

I'm not sure I fully understand what you are proposing. Some parts of it seem to be easily possible with the current primitives. Others (like outlier detection) seem a bit too specific and too out-of-scope to be built into the instrumentation library. But in both cases, I don't understand why you would like to add an exemplar without an observation. An exemplar is always linked to an observation, isn't it?

Perhaps you could provide code examples (imagining the method call you desire already exists)?

LouisStAmour commented 3 years ago

The primary use case I'm supporting here is for tail-based sampling decisions while also preserving real-time metrics. The endpoint collecting Prometheus exemplars should not have to check and see if a trace was kept or not.

Right now, if we want to record every trace ID or a small sample of them as exemplars, we can. But that overwhelms the server collecting exemplars because it has to make sure every trace ID logged as an exemplar still exists. This can be done by choosing in advance what traces to keep but that doesn't allow for tail-based sampling.

The alternative is to perform tail-based sampling and wait until we know a trace is a good exemplar candidate before logging both the exemplar and the observation. But this means our metrics are either not real-time (delayed by however long we have to wait for the observation to be confirmed as an exemplar) or duplicated: we observe the same value twice, which throws off the metric, skewing it in favour of observations that have exemplars attached.

All of this is made simpler if metrics and exemplars are kept separate such that an observation can be logged with an exemplar, or an exemplar can be logged well after its observation.

The advantage in doing so is that exemplars can be back-dated but observations cannot, at least not unless you consider if they've been observed in real-time or not.

Does this make sense? Or should I try to document this more formally? The use case for Prometheus library to add this would be to support uses in Otel Collector which waits for all spans to arrive before deciding what traces to keep and why.

Secondary uses would be for identifying outliers that have already been observed as candidates to keep as exemplars after a second or two had passed, without delaying the metric for the second or two required -- again, the idea is that the observation is made once, before it is identified as an exemplar and is merely a candidate, and then it is later made an exemplar but should not be observed a second time as that would throw off the results.

Workarounds in absence of this feature:

beorn7 commented 3 years ago

Hmm, I see…

So far, I thought exemplars that make it into Prometheus are seen as a source for tail sampling, not the other way around. Following the idea that relatively few traces see their IDs being persisted in Prometheus within an exemplar (at most one per scrape per bucket or counter), and if they do, they are "interesting by definition". See also https://www.youtube.com/watch?v=TzNZIEvhAdA where this idea was presented quite early on.

You could also say, if you feed back the tail sampling into the decision which exemplars to expose, you might get some buckets of a histogram never populated because the tail sampling is biased against those latency areas.

Also note that the Prometheus collection model wants you to collect the metrics directly from the producing binary. So feeding back the tail sampling decision into the exemplar selection means relaying the information back from your tracing system all the way to the individual binaries, which seems rather brittle and far-fetched.

I guess with the OTel collector, the situation might be different because it is already somehow a central component, merely forwarding metrics to Prometheus (or even doing remote write, bypassing the Prometheus server altogether?).

The primitives in this library are mostly meant for direct instrumentation of a binary. Perhaps the OTel collector case is more in the category "mirroring metrics from another system into Prometheus metrics", for which the more low-level constructors NewConstMetric/NewConstHistogram are meant to be used. (Their naming is rather unfortunate, for historical reasons.) Perhaps it would be a cleaner solution to add helpers that attach exemplars to a readily created Metric instance, similar to how NewMetricWithTimestamp adds a timestamp to a Metric. I had plans for that anyway.

With that, you would manage the histogram buckets etc. yourself, and materialize it into a Prometheus metric upon collection, with all the bucket values and exemplars directly controlled. E.g.

```go
promHis := NewMetricWithExemplars(MustNewConstHistogram(
    descFromOtelHis(otelHis),
    countFromOtelHis(otelHis),
    sumFromOtelHis(otelHis),
    bucketsFromOtelHis(otelHis),
), exemplar1, exemplar2, exemplar3)
```

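
One detail of managing the buckets yourself: NewConstHistogram/MustNewConstHistogram take bucket counts as a map from upper bound to cumulative count, with the +Inf bucket left implicit in the total count. A small sketch of that conversion step follows. `cumulativeBuckets` is a hypothetical helper name, and it assumes the source system hands you per-bucket (non-cumulative) counts with one extra trailing entry for overflow.

```go
package main

import "fmt"

// cumulativeBuckets converts per-bucket counts into the cumulative
// map[upperBound]count shape and also returns the total observation count.
// perBucket must have exactly one more entry than upperBounds: the final
// entry is the overflow (+Inf) bucket, which is not included in the map.
func cumulativeBuckets(upperBounds []float64, perBucket []uint64) (map[float64]uint64, uint64) {
	buckets := make(map[float64]uint64, len(upperBounds))
	var count uint64
	for i, ub := range upperBounds {
		count += perBucket[i]
		buckets[ub] = count // cumulative: everything <= ub
	}
	count += perBucket[len(upperBounds)] // add the overflow bucket to the total only
	return buckets, count
}

func main() {
	buckets, count := cumulativeBuckets(
		[]float64{0.1, 0.5, 1},
		[]uint64{2, 3, 0, 1},
	)
	fmt.Println(buckets[0.1], buckets[0.5], buckets[1], count)
}
```

The resulting map and count would then stand in for the `bucketsFromOtelHis` and `countFromOtelHis` placeholders above.
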
beorn7 commented 3 years ago

Having said all that, I'm not an exemplar expert. Happy to hear more opinions about this.

wperron commented 2 years ago

Hey @beorn7 👋 We're in a similar situation, our case is pretty much exactly what you mentioned in the Otel collector; we have metrics coming from another system that we want to mirror to prometheus format to be used in the prometheusexporter. We have a branch up on our fork that adds support for exemplars on constHistogram. I'm not sure it solves the initial issue described here but it does solve the Otel exporter case. I'd be happy to do more work on that patch to get it merged 😄 WDYT?

beorn7 commented 2 years ago

If it's already in your fork, no harm in creating a PR. Ultimately, this needs to be decided by the maintainers of this repository.