ao2017 closed this pull request 3 years ago.
Merging #728 (f8289a1) into master (af0ec37) will increase coverage by 0.10%. The diff coverage is 73.01%.
```diff
@@             Coverage Diff              @@
##             master      #728      +/-   ##
=============================================
+ Coverage     54.83%    54.94%    +0.10%
- Complexity     3084      3103       +19
=============================================
  Files           736       739        +3
  Lines         20081     20143       +62
  Branches       1309      1312        +3
=============================================
+ Hits          11012     11067       +55
- Misses         8600      8608        +8
+ Partials        469       468        -1
```
Impacted Files | Coverage Δ | Complexity Δ |
---|---|---|
...spotify/heroic/aggregation/simple/DeltaInstance.kt | 81.81% <ø> (ø) | 6.00 <0.00> (ø) |
...eroic/aggregation/simple/DeltaPerSecondInstance.kt | 82.35% <ø> (ø) | 6.00 <0.00> (ø) |
...ify/heroic/aggregation/simple/FilterAggregation.kt | 50.00% <0.00%> (-2.39%) | 5.00 <0.00> (ø) |
...oic/aggregation/simple/MetricMappingAggregation.kt | 57.14% <0.00%> (-2.86%) | 5.00 <0.00> (ø) |
...y/heroic/aggregation/simple/NotNegativeInstance.kt | 77.41% <ø> (ø) | 6.00 <0.00> (ø) |
...m/spotify/heroic/aggregation/simple/TdigestStat.kt | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
...com/spotify/heroic/aggregation/AbstractBucket.java | 16.66% <0.00%> (-3.34%) | 1.00 <0.00> (ø) |
...java/com/spotify/heroic/aggregation/AnyBucket.java | 88.88% <0.00%> (-11.12%) | 4.00 <0.00> (ø) |
...y/heroic/aggregation/simple/TdigestStatInstance.kt | 80.00% <80.00%> (ø) | 5.00 <5.00> (?) |
...c/aggregation/simple/TdigestStatInstanceUtils.java | 92.85% <92.85%> (ø) | 6.00 <6.00> (?) |
... and 15 more | | |
Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update af0ec37...f8289a1.
Looks good to me :)
It looks good. I'm assuming more unit tests will need to be added for TDigest-specific calculations, or am I mistaken and the current suite of tests covers the new type?
Sure, but we assume tDigest is doing the right thing :-). I will add integration tests to validate our aggregation logic. That will be in the next PR.
Looks good. Regarding unit tests: it is not clear to me what is covered by the existing unit tests and what is not. It would be useful if you could add a note describing your testing strategy, e.g., these tests cover the new logic, and these existing tests cover existing aggregations, including the new TDigest types.
So my strategy is to make tdigest an aggregation module similar to existing modules such as sum and average.
Sum currently takes Point as input and outputs GroupPoint.
Tdigest will take DistributionPoint as input and output GroupPoint or Point; the sketch below illustrates that contract.
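A rough sketch of the contract, with illustrative types only (AggregationInstance, update, and result here are hypothetical simplifications; Heroic's real interfaces carry more machinery):

```kotlin
// Illustrative types only; Heroic's real interfaces differ in detail.
data class Point(val timestamp: Long, val value: Double)
data class GroupPoint(val timestamp: Long, val value: Double)
data class DistributionPoint(val timestamp: Long, val serializedSketch: ByteArray)

interface AggregationInstance<IN, OUT> {
    fun update(sample: IN)   // feed one input sample into the aggregation
    fun result(): List<OUT>  // emit the aggregated output samples
}

// Sum: Point in, GroupPoint out:
//     class SumInstance : AggregationInstance<Point, GroupPoint>
// TdigestStat: DistributionPoint in, GroupPoint or Point out:
//     class TdigestStatInstance : AggregationInstance<DistributionPoint, Point>
```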
I will add integration tests that reflect all possible scenarios, similar to these tests.
I will also do a demo and write documentation when it is all done.
Be patient, I am trying to understand this beast (to use your own word :)) as I go.
Heroic histogram data is currently computed locally, which makes it practically impossible to aggregate percentiles. We are adding distributions to Heroic to address that issue. Downstream applications will create data sketches that can be merged to compute percentiles over the entire data distribution.
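A minimal sketch of why merging sketches works where pre-computed percentiles do not, using the open-source t-digest library (com.tdunning.math.stats.TDigest; the values and compression setting are made up for illustration):

```kotlin
import com.tdunning.math.stats.TDigest

// Merge one digest into another by replaying its centroids.
fun mergeInto(target: TDigest, source: TDigest) {
    source.centroids().forEach { target.add(it.mean(), it.count()) }
}

fun main() {
    // Each host sketches its own raw latencies locally.
    val hostA = TDigest.createMergingDigest(100.0)
    val hostB = TDigest.createMergingDigest(100.0)
    (1..1000).forEach { hostA.add(it.toDouble()) }  // e.g. 1..1000 ms
    (1..1000).forEach { hostB.add(it * 10.0) }      // e.g. 10..10000 ms

    // Averaging hostA's P99 and hostB's P99 would NOT give the overall P99,
    // but the sketches themselves can be merged and then queried.
    val merged = TDigest.createMergingDigest(100.0)
    mergeInto(merged, hostA)
    mergeInto(merged, hostB)
    println("combined P99: ${merged.quantile(0.99)}")
}
```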
This PR adds the core components for distribution data point aggregation. The implementation follows the same paradigm as existing aggregation instances such as sum and average. These core components include:
- TdigestStat: an extension of SamplingAggregation. It carries the metadata for query serialization and aggregation type registration.
- TdigestStatBucket: an extension of AbstractBucket and the TdigestBucket interface. This bucket deserializes and merges every DistributionPoint added to it (see the sketch after this list).
- TdigestStatInstance: an implementation of BucketAggregationInstance. It builds t-digest buckets and computes statistics; this implementation computes P50, P75, and P99 by default.
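A minimal sketch of the bucket's merge step, assuming the wire format is the t-digest library's own MergingDigest byte encoding (the class and method names here are illustrative, not the PR's actual code):

```kotlin
import com.tdunning.math.stats.MergingDigest
import com.tdunning.math.stats.TDigest
import java.nio.ByteBuffer

// Illustrative bucket: accumulates every serialized sketch routed to it.
class TdigestBucketSketch {
    private val digest: TDigest = TDigest.createMergingDigest(100.0)

    // Called once per DistributionPoint that falls into this bucket:
    // deserialize the incoming sketch, then merge it into the bucket's digest.
    @Synchronized
    fun update(serializedSketch: ByteArray) {
        val incoming = MergingDigest.fromBytes(ByteBuffer.wrap(serializedSketch))
        incoming.centroids().forEach { digest.add(it.mean(), it.count()) }
    }

    @Synchronized
    fun value(): TDigest = digest
}
```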
T-digests can be merged in any direction, so users should be able to compute statistics at every level, that is, per region, cluster, or host.
Queries involving distribution points will output data points as doubles. We made this decision to minimize or avoid additional development at the datasource level.
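For instance, a bucket's merged digest reduces to plain double-valued points, one per percentile (toPoints and the Point type are illustrative; the percentiles match the defaults listed above):

```kotlin
import com.tdunning.math.stats.TDigest

data class Point(val timestamp: Long, val value: Double) // illustrative type

// Reduce a merged digest to ordinary double-valued points.
fun toPoints(timestamp: Long, digest: TDigest): List<Point> =
    listOf(0.50, 0.75, 0.99).map { q -> Point(timestamp, digest.quantile(q)) }
```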