Support multiple histogram output formats

jmartisk commented 4 years ago

Allow switching histogram output from showing percentiles to showing item counts within defined buckets. For example, a timer will export something like this

starlette_requests_processing_time_seconds_bucket{le="5.0",method="GET",path_template="/metrics"} 12789.0
starlette_requests_processing_time_seconds_bucket{le="7.5",method="GET",path_template="/metrics"} 12789.0
starlette_requests_processing_time_seconds_bucket{le="10.0",method="GET",path_template="/metrics"} 12789.0
starlette_requests_processing_time_seconds_bucket{le="+Inf",method="GET",path_template="/metrics"} 12789.0

Originally reported in MP Metrics as https://github.com/eclipse/microprofile-metrics/issues/587

jmartisk commented 4 years ago

Hey @theute + @pilhuhn , so I'd start with timers only for now and implement this in a way that SmallRye will have a io.smallrye.metrics.annotation.Timed annotation that does the same thing as org.eclipse.microprofile.metrics.annotation.Timed, but offers two new parameters:

histogramType which is an enum with values PERCENTILES and THRESHOLD_VALUES (default will be PERCENTILES)
buckets which will be a float[] and it will define the buckets when histogramType==THRESHOLD_VALUES. They will be considered to be in the same unit as the main unit of the timer, and if this is undefined, there will be some default.
- Potentially we could apply the buckets field as well when histogramType==PERCENTILES, in which case the array would be interpreted as desired percentiles ([0.01, 0.05, 0.1, 0.5, 0.9, 0.99] etc)

Our SmallRye-specific ExtendedMetadata will also add the same two fields with the same semantics. These fields will only be taken into account for timers, ignored for other metric types.

If this proves useful, it will be backported to MP Metrics spec later.

Do you have any other ideas?

theute commented 4 years ago

Do you have a code example ? That would speak to me better.

jmartisk commented 4 years ago

It would look like this - for annotated metrics you will use

@io.smallrye.metrics.annotation.Timed(histogramType=HistogramType.THRESHOLD_VALUES, 
     buckets=[0.05, 0.1, 0.5])
// instead of @org.eclipse.microprofile.annotation.Timed
public void timedMethod() {
    // your logic
}

for programmatic approach without annotations, you will use

@Inject
MetricRegistry registry;

ExtendedMetadata metadata = new io.smallrye.metrics.ExtendedMetadataBuilder()
    .withName("my-timer")
    .withHistogramType(HistogramType.THRESHOLD_VALUES)
    .withBuckets(0.05, 0.1, 0.5)
    .build();
registry.timer(metadata);

the timer registered this way would be exported the way you requested, with the defined buckets. There will also be some sort of default for the buckets, so you don't have to specify them.

jmartisk commented 4 years ago

Perhaps COUNTS_WITHIN_BUCKETS would be more understandable than THRESHOLD_VALUES, I don't know.

theute commented 4 years ago

Maybe BY_BUCKETS or PER_BUCKET ? How about those who would want both buckets and percentile ? Can you use twice the annotation (different names ?) ?

Also for buckets it should count all request lower or equals to the threshold, and I suggest to use the name "le" like in other solutions ( https://prometheus.io/docs/practices/histograms/ )

So for buckets of 1, 2, 3 and for a time of 2 this should have {le='1'} =1 {le='2'} =1 {le='3'} =0 {le='+Inf'} =0

(Also seen here: https://books.google.ch/books?id=EpxVDwAAQBAJ&pg=PA232&lpg=PA232&dq=%22le%22+metrics+latency&source=bl&ots=31_7p39pbT&sig=ACfU3U1fnScK3hkBd17Q6VKe7eeM9VWe1Q&hl=en&sa=X&ved=2ahUKEwjL-7qUrobqAhWBoVwKHY9tAgEQ6AEwAHoECAcQAQ#v=onepage&q=%22le%22%20metrics%20latency&f=false )

theute commented 4 years ago

I may have asked the wrong thing actually... Maybe we should have a different approach and not a task for SmallRye metrics...

In the end it would be handy to also have the HTTP service path and HTTP method (and buckets) like in the upper example used in Python... Maybe we should just implement a JAX-RS filter... @pilhuhn thoughts ?

abkieling commented 4 years ago

This is a very important feature. Having the histogram data available in Prometheus gives a lot of flexibility when querying and aggregating the data. The same doesn't happen when the percentiles are calculated in the client. When in doubt about design decisions, have a look at the Micrometer library and the Prometheus Java client.

abkieling commented 4 years ago

@jmartisk Is there any plan to include this feature in the next release of SmallRye? I need to decide between temporarily use the existing percentiles calculated on the client or wait until it gets released. Thanks.

jmartisk commented 4 years ago

If there's really demand for something like what I described in comments https://github.com/smallrye/smallrye-metrics/issues/325#issuecomment-644575785 and https://github.com/smallrye/smallrye-metrics/issues/325#issuecomment-644733317 then I am willing to do that, I think it is realistic that it could get into SR Metrics 3.0 (but I can't promise 100 %). However, PRs with contributions are welcome too ;) Note that it would be marked as an experimental vendor-specific feature and therefore not guaranteed stable.

abkieling commented 4 years ago

I don't think there are types of histogram. There are output formats: precalculated percentiles (Prometheus calls it Summary) or the raw histogram data (the values in each bucket). Just an idea: a new attribute called output with possible values: summary or histogram.

smallrye / smallrye-metrics

Support multiple histogram output formats #325