private-attribution / ipa

A raw implementation of Interoperable Private Attribution
MIT License
40 stars 23 forks source link

Add circular buffer implementation to be later used inside `OrderingSender` #1118

Closed akoshelev closed 2 months ago

akoshelev commented 3 months ago

I've been running IPA at scale and one thing #1085 revealed is that network stack (hyper and kernel) does not really like the large buffers we've been sending after vectorization was enabled. @andyleiserson showed that keeping active work at 1024 results in 1.6Mb chunks sent down from the send buffers. Reducing active work to 32 eliminated the stall, but, according to my perf tests, it made our network utilization worse.

Fixing the size of take, allowed to increase active work parameter to extreme values (100k) and below are the results of those experiments (1M rows, real hardware, no match key encryption, HTTPS).

Active work (items) Max send chunk size (bytes) Runtime (seconds) Delta (from baseline) Notes
1024 4096 438 0 Baseline
32 1024 617 -30% #1073 change with old vectorization factor of 64
1024 2048 427 +2%  
65,536 4096 422 +4%  
65,536 8192 457 -5%  
65,536 2048 390 +11%  
65,536 1024 421 +4%  
32,768 2048 392 +10%  
16,384 2048 412 +6%  
100,000 2048 414 +5%  

This change aims to decouple the vectorization/active work parameters from network send buffers by using circular buffer implementation inside OrderingSender. It only brings the implementation of CircularBuf but I have the change to make OrderingSender use this buf sitting locally in my branch.

The ultimate goal is obviously to allow more configuration for steps, but having constant size chunks dropped down to hyper send buffers seems useful anyway.

Circular buffer

Nothing fancy here, just a vector with read and write pointers. I've been contemplating whether I should make it thread-safe, but I am not convinced we need it in the long term if we pick shard-per-core model. It is not worse than our current implementation of OrderingSender because it also operates on a single allocation, but managing two pointers made it too complicated, so I decided to take the buffering business away from OrderingSender.

Ordering sender

With this change we lose two features that OrderingSender currently supports

Both of these features were never used, but it is important to note that they won't be available if we migrate to CircularBuf.

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 83.72642% with 69 lines in your changes missing coverage. Please review.

Project coverage is 91.14%. Comparing base (7bb1325) to head (bf06b8c). Report is 59 commits behind head on main.

:exclamation: Current head bf06b8c differs from pull request most recent head 134b659

Please upload reports for the commit 134b659 to get more accurate results.

Files Patch % Lines
ipa-core/src/helpers/buffers/circular.rs 83.72% 69 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1118 +/- ## ========================================== - Coverage 91.23% 91.14% -0.09% ========================================== Files 187 188 +1 Lines 26776 27200 +424 ========================================== + Hits 24428 24792 +364 - Misses 2348 2408 +60 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.