Add circular buffer implementation to be later used inside `OrderingSender`

I've been running IPA at scale and one thing #1085 revealed is that network stack (hyper and kernel) does not really like the large buffers we've been sending after vectorization was enabled. @andyleiserson showed that keeping active work at 1024 results in 1.6Mb chunks sent down from the send buffers. Reducing active work to 32 eliminated the stall, but, according to my perf tests, it made our network utilization worse.

Fixing the size of take, allowed to increase active work parameter to extreme values (100k) and below are the results of those experiments (1M rows, real hardware, no match key encryption, HTTPS).

Active work (items)	Max send chunk size (bytes)	Runtime (seconds)	Delta (from baseline)	Notes
1024	4096	438	0	Baseline
32	1024	617	-30%	#1073 change with old vectorization factor of 64
1024	2048	427	+2%
65,536	4096	422	+4%
65,536	8192	457	-5%
65,536	2048	390	+11%
65,536	1024	421	+4%
32,768	2048	392	+10%
16,384	2048	412	+6%
100,000	2048	414	+5%

This change aims to decouple the vectorization/active work parameters from network send buffers by using circular buffer implementation inside OrderingSender. It only brings the implementation of CircularBuf but I have the change to make OrderingSender use this buf sitting locally in my branch.

The ultimate goal is obviously to allow more configuration for steps, but having constant size chunks dropped down to hyper send buffers seems useful anyway.

Circular buffer

Nothing fancy here, just a vector with read and write pointers. I've been contemplating whether I should make it thread-safe, but I am not convinced we need it in the long term if we pick shard-per-core model. It is not worse than our current implementation of OrderingSender because it also operates on a single allocation, but managing two pointers made it too complicated, so I decided to take the buffering business away from OrderingSender.

Ordering sender

With this change we lose two features that OrderingSender currently supports

spare. We always set spare to 0 before
non-uniform write chunks. OrderingSender allowed randomly sized writes as long as they can fit into the buffer. It will no longer be supported if we go with CircularBuf - all writes must be of the same size. This is also how we use OrderingSender today.

Both of these features were never used, but it is important to note that they won't be available if we migrate to CircularBuf.

Codecov Report

Attention: Patch coverage is 83.72642% with 69 lines in your changes missing coverage. Please review.

Project coverage is 91.14%. Comparing base (7bb1325) to head (bf06b8c). Report is 59 commits behind head on main.

:exclamation: Current head bf06b8c differs from pull request most recent head 134b659

Please upload reports for the commit 134b659 to get more accurate results.

Files	Patch %	Lines
ipa-core/src/helpers/buffers/circular.rs	83.72%	69 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #1118 +/- ## ========================================== - Coverage 91.23% 91.14% -0.09% ========================================== Files 187 188 +1 Lines 26776 27200 +424 ========================================== + Hits 24428 24792 +364 - Misses 2348 2408 +60 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

private-attribution / ipa