micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0
4.47k stars 990 forks source link

core.instrument.Statistic for percentiles #700

Closed ultimoamore closed 6 years ago

ultimoamore commented 6 years ago

Using https://github.com/micrometer-metrics/micrometer/issues/488 as a guide, I added a percentile to the "measure" method to get the percentiles in my Spring Boot metrics endpoint. A "Statistic" is required for each metric; I don't understand what that means and why I have to specify it:

   @Override
    public Iterable<Measurement> measure() {
        HistogramSnapshot snap = takeSnapshot();
        return Arrays.asList(
                new Measurement(() -> (double) count(), Statistic.COUNT),
                new Measurement(this::totalAmount, Statistic.TOTAL),
                new Measurement(() -> snap.percentileValues()[0].value(),
                       Statistic.UNKNOWN), // <- doesn't make much sense
                new Measurement(this::max, Statistic.MAX)
        );
    }

How is that supposed to be used, especially if I have more than one percentile defined?

jkschneider commented 6 years ago

Why do you need percentile approximations in the Spring Boot actuator endpoint?

ultimoamore commented 6 years ago

I'm using Zabbix to get the data from the http endpoint. I ended up re-writing my own endpoint (using the original Spring endpoint as a source).

More in general, I don't get why a "Measurement" needs to have a specific "Statistic", which is an enum: it seems very limiting.

If I look at some "registry" implementation (example: https://github.com/micrometer-metrics/micrometer/blob/master/implementations/micrometer-registry-new-relic/src/main/java/io/micrometer/newrelic/NewRelicMeterRegistry.java) the "measure()" method is not used; in fact every registry has to go through "instanceof" of every Meter to get the proper information. That is: instead of using the "measure()" interface the system relies on "instanceof". If another Metric is added all the registries would have to be changed. I expected the interface ("measure") would return a list of "String,double" representing the name of the measurement and its value, which could be used in the registries as it is. Instead, registries don't really care about the "measure" defined in the Meter; in fact, every registry decides which are the "measures" it is interested in; what is actually exposed by the Meter (which is what is returned by "measure") is not used, making the "Meter" interface useless. Example:

https://github.com/micrometer-metrics/micrometer/blob/master/implementations/micrometer-registry-new-relic/src/main/java/io/micrometer/newrelic/NewRelicMeterRegistry.java#L147

jkschneider commented 6 years ago

I sort of regret including measure at all, since it is just a quick diagnostic view and not meant to be complete really. For Zabbix, the plan is to provide a real registry implementation (see #65). The actuator endpoint is not meant to be used as a source of data for a particular monitoring system.

jkschneider commented 6 years ago

We left measure for completely custom Meter types. You'll see that the New Relic implementation does in fact use measure as a fall back when all instanceof checks have failed. A custom meter should fully express what statistics it is exposing so each monitoring system knows how to write them.

ultimoamore commented 6 years ago

My 2 cents:

I would say that a code like this:

                if (meter instanceof Timer) {
                    return writeTimer((Timer) meter);
                } else if (meter instanceof FunctionTimer) {
                    return writeTimer((FunctionTimer) meter);
                } else if (meter instanceof DistributionSummary) {
                    return writeSummary((DistributionSummary) meter);
                } else if (meter instanceof TimeGauge) {
                    return writeGauge((TimeGauge) meter);
               [...]

says there's a need for an interface... I expect most registries (all of them???) extract the same info for TimeGauge:

new Attribute("activeTasks", ltt.activeTasks()),
new Attribute("duration", ltt.duration(getBaseTimeUnit())))

and for FunctionCounter:

new Attribute("throughput", counter.count())

(example: same code to retrieve "names" and "values" for timer and distribution: dynatrace vs newrelic )

Which is why I expected the "measure" method to return a list of <String,double> "metrics". In this way every "write*" method could just use the meter.id and list of measurements description+values

But this is just a minor issue in a great solution. I can add the "extended" endpoint class and "extended" SimpleMeterRegistry class (just a few lines of code) if anyone is interested.

EDIT: BTW I think http monitoring is so easy and useful that it is a shame is not a "first class citizen". A REST API can always be used by any system (see how I can use Zabbix even if it's not integrated in the solution)

jkschneider commented 6 years ago

says there's a need for an interface...

That's of course how we started. As it turns out, there are just too many structural differences between the way metrics are exposed to different monitoring systems to capture in an interface for a given meter type. For example, histogram buckets are cumulative (over time and across the range) count statistics with a particular tag in prometheus. The value is completely different for Atlas, being a non-cumulative rate (also non-cumulative across the range). Other monitoring systems don't even support the notion of histograms as a queryable structure. It's just too much to overcome by iterating over a set of measures without knowing what the specific statistics mean.

I think http monitoring is so easy and useful that it is a shame is not a "first class citizen"

I think Zabbix's support for an arbitrary exposition format is peculiar to Zabbix. I don't know of any other monitoring system that doesn't have a specification for this. Even in this case, providing a Zabbix registry implementation is probably the best way to present an opinionated exposition format that doesn't lose fidelity on the more nuanced features of timers, etc.