
CumulativeHistogramLongTaskTimer histogram counts double for active tasks on each takeSnapshot call #2744

Open · shakuzen opened this issue 2 years ago

shakuzen commented 2 years ago

I'm noticing behavior related to this that doesn't look right. If I scrape the Prometheus endpoint while there is an active task tracked by a LongTaskTimer with a histogram enabled, the bucket counts double with each scrape. All other stats related to the LongTaskTimer (the +Inf bucket, active count, duration sum, max) do not change across scrapes, and they go to zero when there are no active tasks. The histogram bucket counts (except for +Inf) never go back to zero.

https://github.com/micrometer-metrics/micrometer/blob/f04bb3218c713b555d2bd3737d940c88fd12808a/micrometer-core/src/main/java/io/micrometer/core/instrument/internal/CumulativeHistogramLongTaskTimer.java#L59

Originally posted by @shakuzen in https://github.com/micrometer-metrics/micrometer/pull/2075#r687514745
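
In case the link goes stale, here is the gist as a deliberately minimal, self-contained sketch (illustrative names, not Micrometer's actual source): each snapshot records every still-active task's current duration on top of the counts already accumulated, so an unchanged set of active tasks inflates the finite buckets on every call.

import java.util.Arrays;

// Minimal sketch of the double-counting pattern; NOT Micrometer's source.
// Per the report above, the +Inf bucket stays stable across scrapes, so only
// the finite bucket bounds are modeled here.
class DoubleCountingSketch {
    static final double[] BUCKET_BOUNDS_NANOS = { 5 * 60e9, 20 * 60e9 }; // 5 and 20 minutes
    final double[] cumulativeCounts = new double[BUCKET_BOUNDS_NANOS.length];
    final double[] activeDurationsNanos = { 20 * 60e9, 5 * 60e9 }; // two active tasks

    double[] takeSnapshot() {
        // BUG: the active tasks were already folded into cumulativeCounts by the
        // previous snapshot; recording them again double-counts them.
        for (double duration : activeDurationsNanos) {
            for (int i = 0; i < BUCKET_BOUNDS_NANOS.length; i++) {
                if (duration <= BUCKET_BOUNDS_NANOS[i]) {
                    cumulativeCounts[i]++;
                }
            }
        }
        return cumulativeCounts.clone();
    }

    public static void main(String[] args) {
        DoubleCountingSketch sketch = new DoubleCountingSketch();
        System.out.println(Arrays.toString(sketch.takeSnapshot())); // [1.0, 2.0]
        System.out.println(Arrays.toString(sketch.takeSnapshot())); // [2.0, 4.0] -- doubled
    }
}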

shakuzen commented 2 years ago

The code was originally added in https://github.com/micrometer-metrics/micrometer/commit/9fa83e3d3638edf725aa5d803ae0b52b9017456f. I am not sure, but I think the intention is to make the histogram buckets monotonic across Prometheus scrapes, so that buckets do not decrease in value when a task completes. @jkschneider can you confirm?

Unfortunately, this doesn't work: multiple Prometheus instances can scrape in the same interval, which causes the bucket counts to double on each scrape even when the number of active tasks is unchanged. Nor will the difference be consistent between scrapes from the same Prometheus instance. Here are two tests: the first demonstrates the behavior we want and passes; the second fails, showing the problematic behavior with multiple scrapes (calls to takeSnapshot).

import java.time.Duration;
import java.util.concurrent.TimeUnit;

import org.junit.jupiter.api.Test;

import io.micrometer.core.Issue; // annotation from micrometer-core's test sources
import io.micrometer.core.instrument.LongTaskTimer;
import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.MockClock;
import io.micrometer.core.instrument.distribution.CountAtBucket;
import io.micrometer.core.instrument.distribution.DistributionStatisticConfig;
import io.micrometer.core.instrument.internal.CumulativeHistogramLongTaskTimer;
import io.micrometer.core.instrument.simple.SimpleConfig;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import static org.assertj.core.api.Assertions.assertThat;

class CumulativeHistogramLongTaskTimerTest {

    MockClock clock = new MockClock();
    MeterRegistry registry = new SimpleMeterRegistry(SimpleConfig.DEFAULT, clock) {
        @Override
        protected LongTaskTimer newLongTaskTimer(Meter.Id id, DistributionStatisticConfig distributionStatisticConfig) {
            return new CumulativeHistogramLongTaskTimer(id, clock, getBaseTimeUnit(), distributionStatisticConfig);
        }
    };

    @Test
    void bucketCountsAreMonotonic_whenTasksComplete() {
        LongTaskTimer ltt = LongTaskTimer.builder("my.ltt").publishPercentileHistogram().register(registry);
        LongTaskTimer.Sample sample = ltt.start();
        clock.add(1, TimeUnit.MINUTES);
        CountAtBucket[] countAtBuckets = ltt.takeSnapshot().histogramCounts();

        assertThat(countAtBuckets[0].bucket(TimeUnit.MINUTES)).isEqualTo(2);
        assertThat(countAtBuckets[0].count()).isOne();

        sample.stop();
        // bucket counts after task completed should not decrease
        countAtBuckets = ltt.takeSnapshot().histogramCounts();
        assertThat(countAtBuckets[0].count()).isOne();
    }

    @Test
    @Issue("#2744")
    void bucketCountsStableWhenMultipleSnapshotsTaken() {
        LongTaskTimer ltt = LongTaskTimer.builder("my.ltt").publishPercentileHistogram().register(registry);
        // one task at 20 minutes, one task at 5 minutes
        ltt.start();
        clock.add(15, TimeUnit.MINUTES);
        ltt.start();
        clock.add(5, TimeUnit.MINUTES);
        // the snapshot should be the same since the tasks and clock do not change
        for (int i = 0; i < 3; i++) {
            CountAtBucket[] countAtBuckets = ltt.takeSnapshot().histogramCounts();
            int index = 0;
            while (countAtBuckets[index].bucket(TimeUnit.NANOSECONDS) < Duration.ofMinutes(5).toNanos()) {
                assertThat(countAtBuckets[index++].count()).isZero();
            }
            while (countAtBuckets[index].bucket(TimeUnit.NANOSECONDS) < Duration.ofMinutes(20).toNanos()) {
                assertThat(countAtBuckets[index++].count()).isOne();
            }
            while (index < countAtBuckets.length) {
                assertThat(countAtBuckets[index++].count()).isEqualTo(2);
            }
        }
    }
}
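
When run, the second test passes its first loop iteration but fails on the second: the first assertion expecting a count of 1 sees 2, since every finite bucket count has doubled.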

It seems we would need some time windowing so that the previous interval's active tasks are not added to the cumulative counts until one full interval has passed, but even that is prone to inconsistency: different scrapes may happen at different times within the interval and see a different number of active tasks, yet only one scrape's snapshot can serve as the basis for what is added in the next interval.
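
For concreteness, one possible shape of that windowing, with entirely hypothetical names (nothing here is existing Micrometer API): fold an active-task snapshot into the cumulative totals at most once per interval, and report without mutating state, so repeated scrapes within one interval agree.

import java.util.concurrent.TimeUnit;

// Hypothetical windowing sketch, not Micrometer API: cumulative totals advance
// only once per interval, using the active-task snapshot remembered at the
// previous rotation, so repeated scrapes inside one interval see identical counts.
class WindowedCumulativeSketch {
    static final long INTERVAL_NANOS = TimeUnit.MINUTES.toNanos(1);
    static final double[] BUCKET_BOUNDS_NANOS = { TimeUnit.MINUTES.toNanos(5), TimeUnit.MINUTES.toNanos(20) };

    private final double[] cumulativeCounts = new double[BUCKET_BOUNDS_NANOS.length];
    private double[] activeAtLastRotation = new double[0];
    private long lastRotationNanos;

    synchronized double[] takeSnapshot(long nowNanos, double[] activeDurationsNanos) {
        if (nowNanos - lastRotationNanos >= INTERVAL_NANOS) {
            // Fold the snapshot taken at the previous rotation into the totals, once.
            record(cumulativeCounts, activeAtLastRotation);
            activeAtLastRotation = activeDurationsNanos.clone();
            lastRotationNanos = nowNanos;
        }
        // Report totals plus the current active tasks WITHOUT mutating state.
        double[] reported = cumulativeCounts.clone();
        record(reported, activeDurationsNanos);
        return reported;
    }

    private static void record(double[] counts, double[] durationsNanos) {
        for (double duration : durationsNanos) {
            for (int i = 0; i < BUCKET_BOUNDS_NANOS.length; i++) {
                if (duration <= BUCKET_BOUNDS_NANOS[i]) {
                    counts[i]++;
                }
            }
        }
    }
}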

Alternatively, GaugeHistogram is exactly what we want and doesn't require the logic of adding the previous snapshot to maintain monotonicity. Unfortunately, GaugeHistogram is not available in Prometheus' text format; it's only defined in OpenMetrics.
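
For reference, a gauge histogram in the OpenMetrics exposition format would look roughly like this (metric name and values are illustrative, matching the two-task scenario in the test above; note the _gcount/_gsum sample suffixes the spec uses instead of _count/_sum):

# TYPE my_ltt_duration_seconds gaugehistogram
my_ltt_duration_seconds_bucket{le="300.0"} 1
my_ltt_duration_seconds_bucket{le="1200.0"} 2
my_ltt_duration_seconds_bucket{le="+Inf"} 2
my_ltt_duration_seconds_gcount 2
my_ltt_duration_seconds_gsum 1500.0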