micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0
4.42k stars 976 forks

Zero-valued metrics on idle steps, let's talk about that. #2659

Open MatheusArleson opened 3 years ago

MatheusArleson commented 3 years ago

The issue

Micrometer publishes zero-valued metrics when no new values were recorded during a step interval.

The Rationale - In Favor

A zero value in this case means no additional samples were seen in this interval. Not sending anything at all could suggest that the application was not able to deliver a value at all to the monitoring system. So I don't believe sending a zero is a waste.

I'm interested in your feedback on this idea, but I've often seen this conversation go a certain way:

X: I don't want to ship 0 values for counts (sums, maxes, etc.)
Me: Why?
X: It takes up space for no reason
Me: How much space would have been consumed if exactly one event happened per interval? (Answer: the same amount of space that all of these zeroes plus any non-zero values take)
X: But this counter (timer, summary, etc.) is bursty.
Me: I worry first about a capacity plan that relies on a certain periodic shape in your traffic. Failing this, you could use MeterRegistry.remove when the counter goes un-utilized.

The Rationale - Against It

The Fix - Micrometer Approach

  1. Track idle times and invoke MeterRegistry.remove to unregister the idle meter.
  2. Configure meter expiration (a proposed feature, not yet implemented). See https://github.com/micrometer-metrics/micrometer/issues/1114
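Option 1 could be sketched roughly as follows. This is a minimal, library-free illustration, not Micrometer code: `IdleMeterSweeper`, `touch`, and `sweep` are hypothetical names, and the `remover` callback stands in for whatever actually unregisters the meter (in a real application it would presumably delegate to MeterRegistry.remove).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical helper: remembers when each meter was last updated and evicts idle ones. */
final class IdleMeterSweeper {
    private final Map<String, Long> lastUpdateMillis = new ConcurrentHashMap<>();
    private final long maxIdleMillis;
    private final Consumer<String> remover; // assumption: wraps the real unregister call

    IdleMeterSweeper(long maxIdleMillis, Consumer<String> remover) {
        this.maxIdleMillis = maxIdleMillis;
        this.remover = remover;
    }

    /** Call this next to every increment/record on the meter. */
    void touch(String meterName, long nowMillis) {
        lastUpdateMillis.put(meterName, nowMillis);
    }

    /** Unregisters every meter idle for longer than maxIdleMillis; returns the names removed. */
    List<String> sweep(long nowMillis) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastUpdateMillis.entrySet()) {
            boolean idle = nowMillis - entry.getValue() > maxIdleMillis;
            // remove(key, value) only succeeds if no touch() raced in since we read the entry
            if (idle && lastUpdateMillis.remove(entry.getKey(), entry.getValue())) {
                remover.accept(entry.getKey());
                removed.add(entry.getKey());
            }
        }
        return removed;
    }
}
```

`sweep` would typically run on a scheduler at some multiple of the step interval. One thing to watch for: after a meter is removed, the application has to look it up again from the registry before the next update, otherwise it may keep writing to an instance that is no longer published.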

IMHO: until point 2 is implemented, it is not a good approach to develop our own code to work around a library behavior; this could very quickly lead to bugs.

The Fix - Suggestions

  1. Marker value: since Micrometer does not support negative values, -1 would do. This becomes a problem only if negative values are supported in the future.
  2. Marker tag: leverage the fact that tags qualify the metric, and append a health-related tag to it on submission.

Queries could then discard the datapoints holding the marker and use the rest of the data. Just don't use zero: zero is a legitimate value for the backend math, so aggregations cannot tell an idle step apart from a real measurement.
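To illustrate the query-side effect, here is a small self-contained sketch. It assumes the marker-value variant with -1 as the idle marker; `MarkerFilterDemo`, `IDLE_MARKER`, and both methods are hypothetical names, and the lists stand in for per-step values as a monitoring backend might store them.

```java
import java.util.List;
import java.util.OptionalDouble;

/** Hypothetical query-side helper comparing zero-filled data with marker-filled data. */
final class MarkerFilterDemo {
    static final double IDLE_MARKER = -1.0; // assumption: -1 flags "no samples in this step"

    /** Averages the step values after discarding idle markers. */
    static OptionalDouble averageIgnoringMarkers(List<Double> stepValues) {
        return stepValues.stream()
                .mapToDouble(Double::doubleValue)
                .filter(v -> v != IDLE_MARKER)
                .average();
    }

    /** Averages every step value as-is, the way a backend treats published zeros. */
    static OptionalDouble plainAverage(List<Double> stepValues) {
        return stepValues.stream().mapToDouble(Double::doubleValue).average();
    }

    public static void main(String[] args) {
        List<Double> zeroFilled = List.of(120.0, 0.0, 0.0, 80.0);
        List<Double> markerFilled = List.of(120.0, IDLE_MARKER, IDLE_MARKER, 80.0);
        System.out.println(plainAverage(zeroFilled).getAsDouble());             // 50.0
        System.out.println(averageIgnoringMarkers(markerFilled).getAsDouble()); // 100.0
    }
}
```

With two idle steps out of four, the zero-filled series halves the average, while the marker-filled series lets the query recover the average of the steps that actually had data.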

Question

Can we implement the marker strategy instead of submitting zero values?

Related Issues

Kurru commented 2 years ago

With AWS CloudWatch's custom metric pricing model, reporting zeros on infrequent metrics drives up the price for that metric. CloudWatch charges per metric per hour. For each hour that has no data for a metric, the price is zero. The current "publishing zeros" approach makes the cost 100% instead of 5% for a metric only used once per day, or instead of 33% for a metric only used during 8 hours of peak.

I expect this specific behavior would increase the overall cost of our metrics by 2x-20x, as we have many infrequently used dimensions (infrequent APIs, limited-time-use APIs, and error metrics). That is not insignificant.
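The percentages above can be reproduced with a quick calculation. This sketch assumes, as described in this comment, that a metric is billed only for hours containing at least one datapoint; `BilledHours` and `billedFraction` are hypothetical names, and no actual price is modeled.

```java
/** Hypothetical cost model: fraction of the full daily price billed for one metric. */
final class BilledHours {
    static final int HOURS_PER_DAY = 24;

    /**
     * @param activeHoursPerDay    hours per day in which the application records real values
     * @param publishZerosWhenIdle true if idle steps still publish a zero datapoint
     */
    static double billedFraction(int activeHoursPerDay, boolean publishZerosWhenIdle) {
        // Publishing zeros puts a datapoint in every hour, so every hour is billed.
        int billedHours = publishZerosWhenIdle ? HOURS_PER_DAY : activeHoursPerDay;
        return (double) billedHours / HOURS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.printf("once per day, no zeros:  %.1f%%%n", 100 * billedFraction(1, false));
        System.out.printf("8 peak hours, no zeros:  %.1f%%%n", 100 * billedFraction(8, false));
        System.out.printf("8 peak hours, zeros:     %.1f%%%n", 100 * billedFraction(8, true));
    }
}
```

Under that assumption, a metric used in one hour per day is billed for 1/24 of the day (about 4%, roughly the 5% quoted above), and a metric used during 8 peak hours for 8/24 (about 33%), versus 100% when zeros are published every step.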

The solution of removing counters after n minutes of inactivity is perfectly acceptable to me, though I would prefer a solution provided by Micrometer to avoid the maintenance burden and bugs mentioned above.

"develop own code to workaround a lib behavior, this could very quickly lead to bugs"

melorymonie commented 4 months ago

Was anyone able to find a workaround for this scenario? One other problem: each 0 datapoint that the application did not intend to send consumes license capacity unnecessarily and skews the application's statistics. Zeros should not be sent on idle steps.