Open 3g0r opened 1 week ago
I was thinking about collecting metrics from TCP connections - we have no guarantees about the intervals between packets in general. For example, if we collect the number of bits sent, but there are no ping messages in the protocol between the client and server, we run the risk of forgetting the state of the metrics if the client and server are silent for a long time.
So, I think we really need to extend the API. For example, add a ::mark_as_outdated() method to counter/histogram/gauge, or maybe extend the recorder API to add ::remove_<metric kind>(), or give direct access to the registry.
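To make the proposal concrete, here is a minimal sketch of what caller-driven removal could look like. Everything in it (`SketchRegistry`, `Key`, `remove_counter`, `snapshot`) is a hypothetical name invented for illustration and uses only the standard library; it is not the crate's actual registry or recorder API.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical metric key: a name plus label pairs, kept sorted so that
/// label order does not affect equality or hashing.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Key {
    name: String,
    labels: Vec<(String, String)>,
}

impl Key {
    pub fn new(name: &str, labels: &[(&str, &str)]) -> Self {
        let mut labels: Vec<(String, String)> = labels
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        labels.sort();
        Key { name: name.to_string(), labels }
    }
}

/// Hypothetical registry (an assumption, not the real one) that lets the
/// caller decide when a metric goes away.
#[derive(Default)]
pub struct SketchRegistry {
    counters: Mutex<HashMap<Key, Arc<AtomicU64>>>,
}

impl SketchRegistry {
    /// Returns the counter for `key`, creating it at zero if it is missing.
    pub fn counter(&self, key: &Key) -> Arc<AtomicU64> {
        let mut map = self.counters.lock().unwrap();
        map.entry(key.clone()).or_default().clone()
    }

    /// Explicitly forgets a counter. Returns true if it was registered.
    pub fn remove_counter(&self, key: &Key) -> bool {
        self.counters.lock().unwrap().remove(key).is_some()
    }

    /// Current values of all registered counters, as an exporter would read them.
    pub fn snapshot(&self) -> Vec<(Key, u64)> {
        let map = self.counters.lock().unwrap();
        map.iter()
            .map(|(k, v)| (k.clone(), v.load(Ordering::Relaxed)))
            .collect()
    }
}
```

With something like this, a connection handler could call `remove_counter` when the TCP connection closes, so the metric disappears because the caller said so rather than because it was quiet for a while.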
Yeah, in general, there's no good ergonomic way to let callers (the parts of the code actually emitting the metrics) control when those metrics go away.
This will likely need to be solved through whatever we do to fix #314, since fixing that allows for a better separation between "this metric is no longer live at all" and "this metric hasn't been updated in a while and I want to stop showing it".
"this metric hasn't been updated in a while and I want to stop showing it".
Do we really need this feature?
I think that if we suppress some measurements, Prometheus can't collect them, and Grafana will actually render the gaps in those time slots while the measurements are suppressed for rendering in our app.
Maybe we can make this simpler by just delegating the responsibility for deleting metrics to users?
At least in other programming languages I have been happy with such an api so far.
We're not going to be changing the core Recorder API to allow for arbitrarily marking a metric as done/outdated/expired.
As far as wanting to stop showing idle metrics: it's absolutely a thing people want/request. It's very useful to avoid removing a metric as soon as it's no longer used, but instead only after a long enough period of inactivity, in order to avoid sparse reporting.
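For readers unfamiliar with the idea, idle-based expiry can be sketched roughly as below. This is an illustration under assumptions, not the exporter's actual implementation: `IdleTracked`, `increment`, and `render` are hypothetical names, and the current time is passed in explicitly only to keep the example deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical store that remembers when each counter was last updated.
pub struct IdleTracked {
    idle_timeout: Duration,
    // name -> (current value, time of last update)
    entries: HashMap<String, (u64, Instant)>,
}

impl IdleTracked {
    pub fn new(idle_timeout: Duration) -> Self {
        IdleTracked { idle_timeout, entries: HashMap::new() }
    }

    /// Adds `by` to the counter and records `now` as its last update time.
    pub fn increment(&mut self, name: &str, by: u64, now: Instant) {
        let entry = self.entries.entry(name.to_string()).or_insert((0, now));
        entry.0 += by;
        entry.1 = now;
    }

    /// Drops every counter that has been idle for longer than the timeout,
    /// then returns the surviving counters sorted by name.
    pub fn render(&mut self, now: Instant) -> Vec<(String, u64)> {
        let timeout = self.idle_timeout;
        self.entries
            .retain(|_, (_, last)| now.saturating_duration_since(*last) <= timeout);
        let mut out: Vec<(String, u64)> = self
            .entries
            .iter()
            .map(|(name, (value, _))| (name.clone(), *value))
            .collect();
        out.sort();
        out
    }
}
```

Note that a sweep like this cannot tell a finished metric from one that is merely quiet: anything not updated within the timeout is dropped, and its accumulated value is lost with it.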
Hi, in my case I have many spawned tokio tasks that need to be measured. The measurements for these spawned tasks are unique by labels, and once a task is done I have to remove its measurements from the metrics registry to prevent memory leaks. At the same time, I need to keep the metric COUNT_OF_ACTIVE_TASKS available for as long as my program runs.
Right now I can't find any way to solve this problem using the current API. builder.idle_timeout looks good, but I have no guarantees about the interval between spawning new tasks, hence COUNT_OF_ACTIVE_TASKS could be deleted at any time and its state forgotten.
Can anyone tell me how to solve this problem without writing an absolute value to COUNT_OF_ACTIVE_TASKS on a timeout in an infinite loop? 😂