Closed cberkhoff closed 1 month ago
Attention: Patch coverage is 99.56897%
with 1 line
in your changes missing coverage. Please review.
Project coverage is 92.84%. Comparing base (
23202a2
) to head (47569e4
). Report is 7 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
...c/protocol/ipa_prf/aggregation/breakdown_reveal.rs | 99.23% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This change is still a WIP. Just want to get some baseline numbers from Draft.
I ran a queries and got the following stats:
Old | New | ||
---|---|---|---|
Runtime [m] | 9.583333333 | 30.0 | 313.48% |
Aggregation / Total Runtime | 63.9% | 93.59% | |
Shuffle | 0.41% | ||
Reveal | 4.92% | ||
Additions | 83.05% |
Having this in mind I focus on Additions and then Reveal
With the seq_join
in add_tvs_by_bk
performance improved by ~6x and now the total runtime is the same between old and new:
Old | New | |
---|---|---|
Runtime [m] | 9.583333333 | 9.5 |
Aggregation time / Total Runtime | 63.9% | 79.16% |
Attribution | 23.02% | |
Shuffle | 1.56% | |
Reveal | 18.44% | |
Additions | 56.98% |
Using seq_join
in reveal_breakdowns
made it 14x faster. The protocol as a whole is 18% faster than the old one.
Added a new aggregation logic for Trigger Values into the Histogram. With it, aggregation as a whole is about 3x faster than the current one.
Results running IPA for 1M events | |||
---|---|---|---|
Old [duration] | New [duration] | Improvement | |
Total Runtime | 09:35 | 03:57 | 242.62% |
Aggregation time | 06:08 | 02:06 | 292.06% |
Duration | % of aggregation | ||
Attribution | - | 01:30 | 71.43% |
Shuffle | - | 00:07 | 5.56% |
Reveal | - | 00:05 | 3.97% |
Additions | - | 00:23 | 18.25% |
Digging trough the logs I think vectorizing can make aggregation take 20seconds off aggregation, which is taking around 126 seconds. There might be more gains from more efficient code.
This PR is a work in progress to enable Breakdown Reveal Aggregation (aka Improved Aggregation). Aggregation steps happen after attribution. The input to aggregation is a stream of tuples of (attributed breakdown key, attributed trigger value). The output of aggregation is a Histogram. Breakdown Keys and Trigger Values are assigned by the advertiser and sent in the input of IPA. Breakdown Keys values are expected to be dense. How breakdown keys and trigger values are defined is out-of-scope.
High level explanation of the protocol:
DP noise to histogram buckets will be added by the caller of this function. This because adding noise is really unrelated to how things are added up.
Following are the improvements to be done before merging in no particular order: