This looks like it will also fix a separate bug that I came here to report:
We've observed that the exported histogram `_count` was wrong when delta aggregation is turned on. The `_count` should be equal to the `_bucket{le="+Inf"}` value (which was itself correct), but instead the `_count` value was equal to the sum of all bucket counts.
I think that bug is due to this code:
This PR replaces that code entirely, and the new code looks correct:
```go
// Increment totals based on incoming totals
h.Sum += other.Sum
h.Count += other.Count
```
The current implementation does not merge histograms correctly. The sum is calculated as follows:

sum = count * (existingMean + currentMean) / 2
Now let's say that the existing histogram has a count of 1000 and an existing mean of 10000, and the incoming histogram has a count of 100 and a mean of 1000.
newCount = 1000 + 100 = 1100
newMean = (10000 + 1000) / 2 = 5500
newSum = 1100 * newMean = 6_050_000
Now the next incoming histogram has a count of 1000 and a mean of 100:
newCount = 1100 + 1000 = 2100
newMean = (5500 + 100) / 2 = 2800
newSum = 2100 * 2800 = 5_880_000
By definition, the sum of a histogram is supposed to behave like a counter ("A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart", per the Prometheus documentation).
In the example above, the sum decreased (5_880_000 < 6_050_000) which is not something that should happen.
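To make the regression easy to reproduce, here is a small sketch that replays the example above using the old averaged-mean formula; the `mergeOld` name and signature are made up for this illustration:

```go
package main

import "fmt"

// mergeOld applies the pre-PR formula described above:
// newSum = newCount * (existingMean + currentMean) / 2
// The function name and signature are made up for this example.
func mergeOld(existingCount, existingMean, currentCount, currentMean float64) (newCount, newSum float64) {
	newCount = existingCount + currentCount
	newMean := (existingMean + currentMean) / 2
	return newCount, newCount * newMean
}

func main() {
	// Existing histogram: count 1000, mean 10000 (sum 10_000_000).
	// First incoming histogram: count 100, mean 1000.
	count, sum := mergeOld(1000, 10000, 100, 1000)
	fmt.Printf("%.0f %.0f\n", count, sum) // 1100 6050000

	// Second incoming histogram: count 1000, mean 100.
	// The existing mean is now sum/count = 5500.
	count, sum = mergeOld(count, sum/count, 1000, 100)
	fmt.Printf("%.0f %.0f\n", count, sum) // 2100 5880000 -> the sum decreased
}
```

Running this prints 1100 6050000 followed by 2100 5880000, matching the numbers above.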
You can see it happening with real values:
With this PR, the computation of the sum changes to the following:
sum = existingSum + (currentMean * currentCount)

Since currentMean * currentCount is exactly the incoming histogram's sum, this is the same as h.Sum += other.Sum in the new code.
Taking the previous example, we would have:
previousSum = 10_000_000
newSum = previousSum + (1000 * 100) = 10_100_000

previousSum = 10_100_000
newSum = previousSum + (100 * 1000) = 10_200_000
The sum is increasing and the computation is correct: the sum of 100 observations with a mean value of 1000 should be equal to the sum of 1000 observations with a mean value of 100 (both are 100_000).
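And the same example replayed with the fixed computation, again only as an illustrative sketch (the `mergeNew` name is made up):

```go
package main

import "fmt"

// mergeNew applies the formula used by this PR:
// newSum = existingSum + currentMean*currentCount
// which is equivalent to h.Sum += other.Sum, since the incoming sum
// is currentMean*currentCount. The name is made up for this example.
func mergeNew(existingCount, existingSum, currentCount, currentMean float64) (newCount, newSum float64) {
	return existingCount + currentCount, existingSum + currentMean*currentCount
}

func main() {
	// Existing histogram: count 1000, mean 10000 -> sum 10_000_000.
	count, sum := 1000.0, 10_000_000.0

	// First incoming histogram: count 100, mean 1000.
	count, sum = mergeNew(count, sum, 100, 1000)
	fmt.Printf("%.0f %.0f\n", count, sum) // 1100 10100000

	// Second incoming histogram: count 1000, mean 100.
	count, sum = mergeNew(count, sum, 1000, 100)
	fmt.Printf("%.0f %.0f\n", count, sum) // 2100 10200000 -> monotonically increasing
}
```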