Open Logiraptor opened 9 months ago
This is going to sound a bit crazy, but I think I would prefer an alternative solution which would solve the same problem:
@Logiraptor, do you have an estimate of the effort needed to make this API happen? If customers want to use it for DPM debugging purposes, I would imagine they would prefer doing it on the front end.
Is your feature request related to a problem? Please describe.
There are two dimensions to ingestion in Mimir: space (i.e., unique series) and time (samples per minute). The current cardinality API is useful only for understanding the space dimension and has no support for the time dimension.
Describe the solution you'd like
We should find some efficient way to count the dpm of active time series. I think it may be possible to extend the active series tracker with a dpm measurement based on two rotating buckets.
Essentially, we track two numbers for each series (`openDpmBucket` and `closedDpmBucket`). Each time a series is updated in the tracker, we increment `openDpmBucket`. Each time the tracker is purged (`ingester.active-series-metrics-update-period`, default = 1m), we swap the values of `openDpmBucket` and `closedDpmBucket`, then reset `openDpmBucket` to `0`.
Then we could compute an estimate of the dpm of any series via `closedDpmBucket / UpdatePeriod` (with `UpdatePeriod` expressed in minutes). This works as long as the series receives at least one sample per `UpdatePeriod`. If the actual dpm is less than one sample per `UpdatePeriod`, then we may misreport the dpm as `0`.
Alternatively, we could do the same thing, but use the `IdleTimeout` as the bucket window, which would give a more useful lower bound of `0.1` dpm by default, or `0.05` dpm in Grafana Cloud.
Describe alternatives you've considered
We've seen that Grafana Cloud customers resort to expensive `count_over_time` queries to find the source of high dpm. One popular solution is to run the query `sum by (job) (scrape_samples_scraped)`. This works great assuming data is coming from a Prometheus instance, but in practice there are lots of ways time series data can find its way to Mimir, so there's still a gap for some users.