Closed hozan23 closed 2 months ago
Hello @zeenix, Please let me know what do you think about this PR.
Regarding performance, it's worth noting that on most processor architectures, there is no much difference between an atomic store and a regular non-atomic store (source).
Thanks for the citing a source on this. I'll read it later..
Additionally, Atomic operations are generally not as expensive as mutexes, as mutexes themselves are implemented using atomic operations but with additional overhead for locking.
It would be nice to have some numbers on this but I guess we know for sure that at least atomics are not more expensive. So if atomics are not more expensive than their non-atomic primitive types for most machines out there, there is no downside to using atomics and that's good enough for me.
After conducting benchmarks for the new changes, it turns out using atomic operations results in a performance regression of around 10-11%. This regression is likely to increase with more receivers.
Here are the benchmark results on an Intel x86_64 architecture CPU by running cargo bench
on the main repo:
Without atomic operations:
1 -> 1 time: [118.89 ns 119.26 ns 119.71 ns]
Found 21 outliers among 100 measurements (21.00%)
3 (3.00%) high mild
18 (18.00%) high severe
1 -> 2 time: [148.34 ns 148.65 ns 149.01 ns]
Found 18 outliers among 100 measurements (18.00%)
1 (1.00%) high mild
17 (17.00%) high severe
1 -> 4 time: [213.03 ns 213.59 ns 214.25 ns]
Found 19 outliers among 100 measurements (19.00%)
7 (7.00%) high mild
12 (12.00%) high severe
1 -> 8 time: [337.38 ns 337.90 ns 338.52 ns]
Found 19 outliers among 100 measurements (19.00%)
2 (2.00%) high mild
17 (17.00%) high severe
1 -> 16 time: [585.31 ns 586.83 ns 588.65 ns]
Found 19 outliers among 100 measurements (19.00%)
3 (3.00%) high mild
16 (16.00%) high severe
With atomic operations:
1 -> 1 time: [121.59 ns 121.96 ns 122.42 ns]
change: [+1.3652% +1.9991% +2.5771%] (p = 0.00 < 0.05)
Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
20 (20.00%) high severe
1 -> 2 time: [158.20 ns 158.62 ns 159.08 ns]
change: [+6.2135% +6.5239% +6.8636%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
7 (7.00%) high mild
11 (11.00%) high severe
1 -> 4 time: [232.68 ns 233.26 ns 233.93 ns]
change: [+9.0008% +9.3453% +9.7053%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
1 -> 8 time: [370.89 ns 371.66 ns 372.54 ns]
change: [+10.014% +10.440% +10.926%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
1 -> 16 time: [653.93 ns 655.80 ns 657.86 ns]
change: [+11.406% +11.762% +12.118%] (p = 0.00 < 0.05)
Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
1 (1.00%) high mild
22 (22.00%) high severe
I am closing this PR due to this regression.
@hozan23 Thanks so much for your hard work still. I certainly learnt things in the course of this PR and now we've a very good answer to "why not allow Receiver to be non-mut?".
The
recv()
,recv_direct()
,recv_blocking()
, andtry_recv()
methods currently require&mut self
to modify the value of theposition
field. By using AtomicU64 for theposition
field eliminates the need for mutability.Regarding performance, it's worth noting that on most processor architectures, there is no much difference between an atomic store and a regular non-atomic store (source: 1, 2).
Additionally, Atomic operations are generally not as expensive as mutexes, as mutexes themselves are implemented using atomic operations but with additional overhead for locking.
This PR has breaking API changes.
Fixes issue #66