smol-rs / async-broadcast

Async broadcast channels
Apache License 2.0
163 stars 26 forks source link

Use AtomicU64 instead of u64 for the position field in Receiver #70

Closed hozan23 closed 2 months ago

hozan23 commented 3 months ago

The recv(), recv_direct(), recv_blocking(), and try_recv() methods currently require &mut self to modify the value of the position field. By using AtomicU64 for the position field eliminates the need for mutability.

Regarding performance, it's worth noting that on most processor architectures, there is no much difference between an atomic store and a regular non-atomic store (source: 1, 2).

Additionally, Atomic operations are generally not as expensive as mutexes, as mutexes themselves are implemented using atomic operations but with additional overhead for locking.

This PR has breaking API changes.

Fixes issue #66

hozan23 commented 3 months ago

Hello @zeenix, Please let me know what do you think about this PR.

zeenix commented 3 months ago

Regarding performance, it's worth noting that on most processor architectures, there is no much difference between an atomic store and a regular non-atomic store (source).

Thanks for the citing a source on this. I'll read it later..

Additionally, Atomic operations are generally not as expensive as mutexes, as mutexes themselves are implemented using atomic operations but with additional overhead for locking.

It would be nice to have some numbers on this but I guess we know for sure that at least atomics are not more expensive. So if atomics are not more expensive than their non-atomic primitive types for most machines out there, there is no downside to using atomics and that's good enough for me.

hozan23 commented 2 months ago

After conducting benchmarks for the new changes, it turns out using atomic operations results in a performance regression of around 10-11%. This regression is likely to increase with more receivers.

Here are the benchmark results on an Intel x86_64 architecture CPU by running cargo bench on the main repo:

Without atomic operations:

1 -> 1                  time:   [118.89 ns 119.26 ns 119.71 ns]
Found 21 outliers among 100 measurements (21.00%)
  3 (3.00%) high mild
  18 (18.00%) high severe

1 -> 2                  time:   [148.34 ns 148.65 ns 149.01 ns]
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) high mild
  17 (17.00%) high severe

1 -> 4                  time:   [213.03 ns 213.59 ns 214.25 ns]
Found 19 outliers among 100 measurements (19.00%)
  7 (7.00%) high mild
  12 (12.00%) high severe

1 -> 8                  time:   [337.38 ns 337.90 ns 338.52 ns]
Found 19 outliers among 100 measurements (19.00%)
  2 (2.00%) high mild
  17 (17.00%) high severe

1 -> 16                 time:   [585.31 ns 586.83 ns 588.65 ns]
Found 19 outliers among 100 measurements (19.00%)
  3 (3.00%) high mild
  16 (16.00%) high severe

With atomic operations:

1 -> 1                  time:   [121.59 ns 121.96 ns 122.42 ns]
                        change: [+1.3652% +1.9991% +2.5771%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
  20 (20.00%) high severe

1 -> 2                  time:   [158.20 ns 158.62 ns 159.08 ns]
                        change: [+6.2135% +6.5239% +6.8636%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  7 (7.00%) high mild
  11 (11.00%) high severe

1 -> 4                  time:   [232.68 ns 233.26 ns 233.93 ns]
                        change: [+9.0008% +9.3453% +9.7053%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

1 -> 8                  time:   [370.89 ns 371.66 ns 372.54 ns]
                        change: [+10.014% +10.440% +10.926%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

1 -> 16                 time:   [653.93 ns 655.80 ns 657.86 ns]
                        change: [+11.406% +11.762% +12.118%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  1 (1.00%) high mild
  22 (22.00%) high severe

I am closing this PR due to this regression.

zeenix commented 2 months ago

@hozan23 Thanks so much for your hard work still. I certainly learnt things in the course of this PR and now we've a very good answer to "why not allow Receiver to be non-mut?".