
CumulativeHistogramLongTaskTimer histogram counts double for active tasks on each takeSnapshot call #2744

Open · shakuzen opened this issue 2 years ago

shakuzen commented 2 years ago

I'm noticing behavior related to this that doesn't look right. If I scrape the Prometheus endpoint while there is an active task tracked by a LongTaskTimer with a histogram enabled, the bucket counts double with each scrape. All other stats related to the LongTaskTimer (the +Inf bucket, active count, duration sum, max) do not change across scrapes, and they go to zero when there are no active tasks. The histogram bucket counts (except for +Inf) never go back to zero.

https://github.com/micrometer-metrics/micrometer/blob/f04bb3218c713b555d2bd3737d940c88fd12808a/micrometer-core/src/main/java/io/micrometer/core/instrument/internal/CumulativeHistogramLongTaskTimer.java#L59

Originally posted by @shakuzen in https://github.com/micrometer-metrics/micrometer/pull/2075#r687514745
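
In case the link goes stale, here is the gist as a deliberately minimal, self-contained sketch (illustrative names, not Micrometer's actual source): each snapshot records every still-active task's current duration on top of the counts already accumulated, so an unchanged set of active tasks inflates the finite buckets on every call.

import java.util.Arrays;

// Minimal sketch of the double-counting pattern; NOT Micrometer's source.
// Per the report above, the +Inf bucket stays stable across scrapes, so only
// the finite bucket bounds are modeled here.
class DoubleCountingSketch {
    static final double[] BUCKET_BOUNDS_NANOS = { 5 * 60e9, 20 * 60e9 }; // 5 and 20 minutes
    final double[] cumulativeCounts = new double[BUCKET_BOUNDS_NANOS.length];
    final double[] activeDurationsNanos = { 20 * 60e9, 5 * 60e9 }; // two active tasks

    double[] takeSnapshot() {
        // BUG: the active tasks were already folded into cumulativeCounts by the
        // previous snapshot; recording them again double-counts them.
        for (double duration : activeDurationsNanos) {
            for (int i = 0; i < BUCKET_BOUNDS_NANOS.length; i++) {
                if (duration <= BUCKET_BOUNDS_NANOS[i]) {
                    cumulativeCounts[i]++;
                }
            }
        }
        return cumulativeCounts.clone();
    }

    public static void main(String[] args) {
        DoubleCountingSketch sketch = new DoubleCountingSketch();
        System.out.println(Arrays.toString(sketch.takeSnapshot())); // [1.0, 2.0]
        System.out.println(Arrays.toString(sketch.takeSnapshot())); // [2.0, 4.0] -- doubled
    }
}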

shakuzen commented 2 years ago

The code was originally added in https://github.com/micrometer-metrics/micrometer/commit/9fa83e3d3638edf725aa5d803ae0b52b9017456f. I am not sure, but I think the intention is to make the histogram buckets monotonic across Prometheus scrapes, so that buckets do not decrease in value when a task completes. @jkschneider can you confirm?

Unfortunately, this doesn't work: multiple Prometheus instances can scrape in the same interval, which causes the bucket counts to double on each scrape even when the number of active tasks is unchanged. Nor will the difference be consistent between scrapes from the same Prometheus instance. Here are two tests: the first demonstrates the behavior we want and passes; the second fails, showing the problematic behavior with multiple scrapes (calls to takeSnapshot).

import java.time.Duration;
import java.util.concurrent.TimeUnit;

import org.junit.jupiter.api.Test;

import io.micrometer.core.Issue; // annotation from micrometer-core's test sources
import io.micrometer.core.instrument.LongTaskTimer;
import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.MockClock;
import io.micrometer.core.instrument.distribution.CountAtBucket;
import io.micrometer.core.instrument.distribution.DistributionStatisticConfig;
import io.micrometer.core.instrument.internal.CumulativeHistogramLongTaskTimer;
import io.micrometer.core.instrument.simple.SimpleConfig;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import static org.assertj.core.api.Assertions.assertThat;

class CumulativeHistogramLongTaskTimerTest {

    MockClock clock = new MockClock();
    MeterRegistry registry = new SimpleMeterRegistry(SimpleConfig.DEFAULT, clock) {
        @Override
        protected LongTaskTimer newLongTaskTimer(Meter.Id id, DistributionStatisticConfig distributionStatisticConfig) {
            return new CumulativeHistogramLongTaskTimer(id, clock, getBaseTimeUnit(), distributionStatisticConfig);
        }
    };

    @Test
    void bucketCountsAreMonotonic_whenTasksComplete() {
        LongTaskTimer ltt = LongTaskTimer.builder("my.ltt").publishPercentileHistogram().register(registry);
        LongTaskTimer.Sample sample = ltt.start();
        clock.add(1, TimeUnit.MINUTES);
        CountAtBucket[] countAtBuckets = ltt.takeSnapshot().histogramCounts();

        assertThat(countAtBuckets[0].bucket(TimeUnit.MINUTES)).isEqualTo(2);
        assertThat(countAtBuckets[0].count()).isOne();

        sample.stop();
        // bucket counts after task completed should not decrease
        countAtBuckets = ltt.takeSnapshot().histogramCounts();
        assertThat(countAtBuckets[0].count()).isOne();
    }

    @Test
    @Issue("#2744")
    void bucketCountsStableWhenMultipleSnapshotsTaken() {
        LongTaskTimer ltt = LongTaskTimer.builder("my.ltt").publishPercentileHistogram().register(registry);
        // one task at 20 minutes, one task at 5 minutes
        ltt.start();
        clock.add(15, TimeUnit.MINUTES);
        ltt.start();
        clock.add(5, TimeUnit.MINUTES);
        // the snapshot should be the same since the tasks and clock do not change
        for (int i = 0; i < 3; i++) {
            CountAtBucket[] countAtBuckets = ltt.takeSnapshot().histogramCounts();
            int index = 0;
            while (countAtBuckets[index].bucket(TimeUnit.NANOSECONDS) < Duration.ofMinutes(5).toNanos()) {
                assertThat(countAtBuckets[index++].count()).isZero();
            }
            while (countAtBuckets[index].bucket(TimeUnit.NANOSECONDS) < Duration.ofMinutes(20).toNanos()) {
                assertThat(countAtBuckets[index++].count()).isOne();
            }
            while (index < countAtBuckets.length) {
                assertThat(countAtBuckets[index++].count()).isEqualTo(2);
            }
        }
    }
}
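
When run, the second test passes its first loop iteration but fails on the second: the first assertion expecting a count of 1 sees 2, since every finite bucket count has doubled.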

It seems we would need some time windowing so that the previous interval's active tasks are not added to the cumulative counts until one full interval has passed, but even that is prone to inconsistency: different scrapes may happen at different times within the interval and see a different number of active tasks, yet only one scrape's snapshot can serve as the basis for what is added in the next interval.
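
For concreteness, one possible shape of that windowing, with entirely hypothetical names (nothing here is existing Micrometer API): fold an active-task snapshot into the cumulative totals at most once per interval, and report without mutating state, so repeated scrapes within one interval agree.

import java.util.concurrent.TimeUnit;

// Hypothetical windowing sketch, not Micrometer API: cumulative totals advance
// only once per interval, using the active-task snapshot remembered at the
// previous rotation, so repeated scrapes inside one interval see identical counts.
class WindowedCumulativeSketch {
    static final long INTERVAL_NANOS = TimeUnit.MINUTES.toNanos(1);
    static final double[] BUCKET_BOUNDS_NANOS = { TimeUnit.MINUTES.toNanos(5), TimeUnit.MINUTES.toNanos(20) };

    private final double[] cumulativeCounts = new double[BUCKET_BOUNDS_NANOS.length];
    private double[] activeAtLastRotation = new double[0];
    private long lastRotationNanos;

    synchronized double[] takeSnapshot(long nowNanos, double[] activeDurationsNanos) {
        if (nowNanos - lastRotationNanos >= INTERVAL_NANOS) {
            // Fold the snapshot taken at the previous rotation into the totals, once.
            record(cumulativeCounts, activeAtLastRotation);
            activeAtLastRotation = activeDurationsNanos.clone();
            lastRotationNanos = nowNanos;
        }
        // Report totals plus the current active tasks WITHOUT mutating state.
        double[] reported = cumulativeCounts.clone();
        record(reported, activeDurationsNanos);
        return reported;
    }

    private static void record(double[] counts, double[] durationsNanos) {
        for (double duration : durationsNanos) {
            for (int i = 0; i < BUCKET_BOUNDS_NANOS.length; i++) {
                if (duration <= BUCKET_BOUNDS_NANOS[i]) {
                    counts[i]++;
                }
            }
        }
    }
}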

Alternatively, GaugeHistogram is exactly what we want and doesn't require the logic of adding the previous snapshot to maintain monotonicity. Unfortunately, GaugeHistogram is not available in Prometheus' text format; it's only defined in OpenMetrics.
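
For reference, a gauge histogram in the OpenMetrics exposition format would look roughly like this (metric name and values are illustrative, matching the two-task scenario in the test above; note the _gcount/_gsum sample suffixes the spec uses instead of _count/_sum):

# TYPE my_ltt_duration_seconds gaugehistogram
my_ltt_duration_seconds_bucket{le="300.0"} 1
my_ltt_duration_seconds_bucket{le="1200.0"} 2
my_ltt_duration_seconds_bucket{le="+Inf"} 2
my_ltt_duration_seconds_gcount 2
my_ltt_duration_seconds_gsum 1500.0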