Closed ultimoamore closed 6 years ago
Why do you need percentile approximations in the Spring Boot actuator endpoint?
I'm using Zabbix to get the data from the http endpoint. I ended up re-writing my own endpoint (using the original Spring endpoint as a source).
More in general, I don't get why a "Measurement" needs to have a specific "Statistic", which is an enum: it seems very limiting.
If I look at some "registry" implementation (example: https://github.com/micrometer-metrics/micrometer/blob/master/implementations/micrometer-registry-new-relic/src/main/java/io/micrometer/newrelic/NewRelicMeterRegistry.java) the "measure()" method is not used; in fact every registry has to go through "instanceof" of every Meter to get the proper information. That is: instead of using the "measure()" interface the system relies on "instanceof". If another Metric is added all the registries would have to be changed. I expected the interface ("measure") would return a list of "String,double" representing the name of the measurement and its value, which could be used in the registries as it is. Instead, registries don't really care about the "measure" defined in the Meter; in fact, every registry decides which are the "measures" it is interested in; what is actually exposed by the Meter (which is what is returned by "measure") is not used, making the "Meter" interface useless. Example:
I sort of regret including measure
at all, since it is just a quick diagnostic view and not meant to be complete really. For Zabbix, the plan is to provide a real registry implementation (see #65). The actuator endpoint is not meant to be used as a source of data for a particular monitoring system.
We left measure
for completely custom Meter
types. You'll see that the New Relic implementation does in fact use measure
as a fall back when all instanceof checks have failed. A custom meter should fully express what statistics it is exposing so each monitoring system knows how to write them.
My 2 cents:
I would say that a code like this:
if (meter instanceof Timer) {
return writeTimer((Timer) meter);
} else if (meter instanceof FunctionTimer) {
return writeTimer((FunctionTimer) meter);
} else if (meter instanceof DistributionSummary) {
return writeSummary((DistributionSummary) meter);
} else if (meter instanceof TimeGauge) {
return writeGauge((TimeGauge) meter);
[...]
says there's a need for an interface... I expect most registries (all of them???) extract the same info for TimeGauge:
new Attribute("activeTasks", ltt.activeTasks()),
new Attribute("duration", ltt.duration(getBaseTimeUnit())))
and for FunctionCounter:
new Attribute("throughput", counter.count())
(example: same code to retrieve "names" and "values" for timer and distribution: dynatrace vs newrelic )
Which is why I expected the "measure" method to return a list of <String,double> "metrics". In this way every "write*" method could just use the meter.id and list of measurements description+values
But this is just a minor issue in a great solution. I can add the "extended" endpoint class and "extended" SimpleMeterRegistry class (just a few lines of code) if anyone is interested.
EDIT: BTW I think http monitoring is so easy and useful that it is a shame is not a "first class citizen". A REST API can always be used by any system (see how I can use Zabbix even if it's not integrated in the solution)
says there's a need for an interface...
That's of course how we started. As it turns out, there are just too many structural differences between the way metrics are exposed to different monitoring systems to capture in an interface for a given meter type. For example, histogram buckets are cumulative (over time and across the range) count statistics with a particular tag in prometheus. The value is completely different for Atlas, being a non-cumulative rate (also non-cumulative across the range). Other monitoring systems don't even support the notion of histograms as a queryable structure. It's just too much to overcome by iterating over a set of measures without knowing what the specific statistics mean.
I think http monitoring is so easy and useful that it is a shame is not a "first class citizen"
I think Zabbix's support for an arbitrary exposition format is peculiar to Zabbix. I don't know of any other monitoring system that doesn't have a specification for this. Even in this case, providing a Zabbix registry implementation is probably the best way to present an opinionated exposition format that doesn't lose fidelity on the more nuanced features of timers, etc.
Using https://github.com/micrometer-metrics/micrometer/issues/488 as a guide, I added a percentile to the "measure" method to get the percentiles in my Spring Boot metrics endpoint. A "Statistic" is required for each metric; I don't understand what that means and why I have to specify it:
How is that supposed to be used, especially if I have more than one percentile defined?