spotify / heroic

The Heroic Time Series Database
https://spotify.github.io/heroic/
Apache License 2.0
848 stars 109 forks source link

Add DistributionPoint aggregation core components. #728

Closed ao2017 closed 3 years ago

ao2017 commented 3 years ago

Heroic histogram data is currently computed locally. It is practically impossible to aggregate percentile. We are adding distribution to heroic to address that issue. Applications downstream will create data sketches that can be merged to compute percentile on the entire data distribution.

This PR add distribution data points aggregation core components. The implementation follows the same paradigm as existing aggregation instance such as sum and average. These core components include:

TdigestStat : It is an extension of SamplingAggregation. It has metadata for query serialization and aggregation type registration.

TdigestStatBucket This is an extension of AbstractBucket and TdigestBucket Bucket.
This bucket deserialized and merge every distributionPoint added to the bucket.

TdigestStatInstance This is an implementation of BucketAggregationInstance. It builds tdigest buckets and compute stat. This implementation compute P50, P75 and P99 by default.

Tdigest can be merged in any direction. So users should be able to compute stat at every level, that is, per region, cluster or host.

Query involving distribution point will output data point in double. We made this decision to minimized or avoid additional development at the datasource level.

codecov[bot] commented 3 years ago

Codecov Report

Merging #728 (f8289a1) into master (af0ec37) will increase coverage by 0.10%. The diff coverage is 73.01%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #728      +/-   ##
============================================
+ Coverage     54.83%   54.94%   +0.10%     
- Complexity     3084     3103      +19     
============================================
  Files           736      739       +3     
  Lines         20081    20143      +62     
  Branches       1309     1312       +3     
============================================
+ Hits          11012    11067      +55     
- Misses         8600     8608       +8     
+ Partials        469      468       -1     
Impacted Files Coverage Δ Complexity Δ
...spotify/heroic/aggregation/simple/DeltaInstance.kt 81.81% <ø> (ø) 6.00 <0.00> (ø)
...eroic/aggregation/simple/DeltaPerSecondInstance.kt 82.35% <ø> (ø) 6.00 <0.00> (ø)
...ify/heroic/aggregation/simple/FilterAggregation.kt 50.00% <0.00%> (-2.39%) 5.00 <0.00> (ø)
...oic/aggregation/simple/MetricMappingAggregation.kt 57.14% <0.00%> (-2.86%) 5.00 <0.00> (ø)
...y/heroic/aggregation/simple/NotNegativeInstance.kt 77.41% <ø> (ø) 6.00 <0.00> (ø)
...m/spotify/heroic/aggregation/simple/TdigestStat.kt 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...com/spotify/heroic/aggregation/AbstractBucket.java 16.66% <0.00%> (-3.34%) 1.00 <0.00> (ø)
...java/com/spotify/heroic/aggregation/AnyBucket.java 88.88% <0.00%> (-11.12%) 4.00 <0.00> (ø)
...y/heroic/aggregation/simple/TdigestStatInstance.kt 80.00% <80.00%> (ø) 5.00 <5.00> (?)
...c/aggregation/simple/TdigestStatInstanceUtils.java 92.85% <92.85%> (ø) 6.00 <6.00> (?)
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update af0ec37...f8289a1. Read the comment docs.

lmuhlha commented 3 years ago

Looks good to me :)

ao2017 commented 3 years ago

It looks good. I'm assuming more unit tests will need to be added for TDigest specific calculations or I'm mistaken and current suit of tests covers new type?

Sure, but we assume tDigest is doing the right thing :-). I will add integration test to validate our aggregation logic. That will be in the next PR.

ao2017 commented 3 years ago

Looks good. In regards of unit tests - it is not clear/unknown to me what is covered by the existing unit tests and what's not. It would be useful if you can add a note describing your strategy in regards of testing. Ex: these tests are covering new logic and these existing tests are covering existing aggregations including new TDigest types.

So my strategy is to make tdigest an aggregation module similar to existing modules such as sum and average.
Sum currently takes Point as input then output GroupPoint. Tdigest will take DistributionPoint as input and output GroupPoint or Point.

I will add integration tests that reflect all possible scenario similar to these tests

I will also do a demo and documentation when it is all done.

Be patient, I am trying to understand this beast( to use your own word :)) as I go.