sync and unsync variants (starting with semaphore)

tedsta commented 4 months ago

Hi there! First off, thanks for this crate!

I found that the Semaphore in this crate runs ~35% faster (about 85.3 µs vs. 55.9 µs per 1000 round trips, see the results below) on my x86 laptop in a ping-pong benchmark when I swap its Mutex for an UnsafeCell, turning it into a thread-local semaphore.

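    use criterion::{criterion_group, criterion_main, Criterion};
    // Assumed import path. For the local-semaphore run, this is the crate's
    // Semaphore with its Mutex swapped for an UnsafeCell.
    use no_std_async::Semaphore;

    fn ping_pong(c: &mut Criterion) {
        let semaphore_a = Semaphore::new(0);
        let semaphore_b = Semaphore::new(0);

        let iter_count = 1000;
        c.bench_function("local-semaphore x1000", |b| {
            b.iter(|| {
                // Two tasks on one thread hand a permit back and forth:
                // each loop iteration is one round trip.
                pollster::block_on(futures::future::join(async {
                    for _ in 0..iter_count {
                        semaphore_a.release(1);
                        semaphore_b.acquire(1).await;
                    }
                }, async {
                    for _ in 0..iter_count {
                        semaphore_a.acquire(1).await;
                        semaphore_b.release(1);
                    }
                }));
            })
        });
    }

    criterion_group!(benches, ping_pong);
    criterion_main!(benches);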
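To illustrate why dropping the Mutex helps, here is a minimal sketch of a thread-local semaphore built only from `Cell` and `RefCell`, so no atomic operations or locking happen on `acquire`/`release`. It is not the crate's code: `LocalSemaphore`, `Acquire`, and `NoopWake` are hypothetical names, and the crate keeps its waiters in a pin-list, whereas this sketch uses a plain `VecDeque<Waker>` and wakes every waiter on each release.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical single-threaded semaphore. `Cell`/`RefCell` replace the
/// `Mutex`, so there are no atomics, and the type is automatically `!Sync`.
pub struct LocalSemaphore {
    permits: Cell<usize>,
    waiters: RefCell<VecDeque<Waker>>,
}

impl LocalSemaphore {
    pub fn new(permits: usize) -> Self {
        Self { permits: Cell::new(permits), waiters: RefCell::new(VecDeque::new()) }
    }

    pub fn release(&self, n: usize) {
        self.permits.set(self.permits.get() + n);
        // Take the wakers out before waking, so an executor that polls
        // synchronously inside `wake` cannot hit an outstanding RefCell borrow.
        let waiters: Vec<Waker> = self.waiters.borrow_mut().drain(..).collect();
        // Wake everyone; each waiter re-checks the count when polled. Simple and
        // tolerant of dropped futures, but not FIFO-fair.
        for waker in waiters {
            waker.wake();
        }
    }

    pub fn acquire(&self, n: usize) -> Acquire<'_> {
        Acquire { sem: self, n }
    }
}

/// Future returned by `acquire`; resolves once `n` permits have been taken.
pub struct Acquire<'a> {
    sem: &'a LocalSemaphore,
    n: usize,
}

impl Future for Acquire<'_> {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let available = self.sem.permits.get();
        if available >= self.n {
            self.sem.permits.set(available - self.n);
            Poll::Ready(())
        } else {
            // Register interest; `release` wakes all registered tasks.
            self.sem.waiters.borrow_mut().push_back(cx.waker().clone());
            Poll::Pending
        }
    }
}

/// Waker that does nothing; enough to drive the future by hand below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn main() {
    let sem = LocalSemaphore::new(0);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    let mut fut = sem.acquire(2);
    assert!(Pin::new(&mut fut).poll(&mut cx).is_pending());
    sem.release(2);
    assert!(Pin::new(&mut fut).poll(&mut cx).is_ready());
    assert_eq!(sem.permits.get(), 0);
}
```

Waking every waiter on each release trades fairness for simplicity; an intrusive list such as pin-list can instead wake waiters one at a time and unlink cancelled ones.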

Would you be open to a pull request adding a LocalSemaphore and some benchmarks comparing different async semaphores?

To avoid code duplication, I could try using lock_api with an internal-only NoopMutex, like futures-intrusive does. Let me know what you think.
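The shared-code idea can be sketched without pulling in lock_api. This sketch defines a pared-down, hypothetical `RawLock` trait that mirrors the shape of `lock_api::RawMutex` (an `INIT` constant plus `lock`/`unlock`), a no-op lock for the local variant, and a spin lock standing in for whatever real mutex the `Sync` variant would use. `GenericPermits`, `NoopLock`, `SpinLock`, `LocalPermits`, and `SyncPermits` are illustrative names, not this crate's or futures-intrusive's API, and only the permit count is shown (the waiter list is omitted).

```rust
use std::cell::{Cell, UnsafeCell};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for `lock_api::RawMutex`.
/// Safety contract: `lock` grants exclusive access until `unlock`.
unsafe trait RawLock {
    const INIT: Self;
    fn lock(&self);
    /// # Safety
    /// Must only be called by the current holder of the lock.
    unsafe fn unlock(&self);
}

/// No-op lock for the local variant: a single thread cannot contend with
/// itself, so it never waits; the flag only catches re-entrant locking.
/// `Cell` makes it `!Sync`.
struct NoopLock {
    locked: Cell<bool>,
}

unsafe impl RawLock for NoopLock {
    const INIT: Self = NoopLock { locked: Cell::new(false) };
    fn lock(&self) {
        assert!(!self.locked.replace(true), "re-entrant lock on NoopLock");
    }
    unsafe fn unlock(&self) {
        self.locked.set(false);
    }
}

/// Spin lock standing in for the thread-safe variant's mutex.
struct SpinLock {
    locked: AtomicBool,
}

unsafe impl RawLock for SpinLock {
    const INIT: Self = SpinLock { locked: AtomicBool::new(false) };
    fn lock(&self) {
        while self.locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }
    unsafe fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

/// Permit counter written once and shared by both variants.
struct GenericPermits<L: RawLock> {
    lock: L,
    permits: UnsafeCell<usize>,
}

// Shareable across threads only when the lock itself is `Sync`.
unsafe impl<L: RawLock + Sync> Sync for GenericPermits<L> {}

impl<L: RawLock> GenericPermits<L> {
    fn new(permits: usize) -> Self {
        Self { lock: L::INIT, permits: UnsafeCell::new(permits) }
    }

    fn with<R>(&self, f: impl FnOnce(&mut usize) -> R) -> R {
        self.lock.lock();
        // SAFETY: the lock is held, so this is the only live `&mut`.
        let out = f(unsafe { &mut *self.permits.get() });
        // SAFETY: we acquired the lock just above.
        unsafe { self.lock.unlock() };
        out
    }

    fn release(&self, n: usize) {
        self.with(|p| *p += n)
    }

    fn try_acquire(&self, n: usize) -> bool {
        self.with(|p| if *p >= n { *p -= n; true } else { false })
    }
}

type LocalPermits = GenericPermits<NoopLock>;
type SyncPermits = GenericPermits<SpinLock>;

fn main() {
    let local = LocalPermits::new(0);
    local.release(2);
    assert!(local.try_acquire(2));
    assert!(!local.try_acquire(1));

    let shared = Arc::new(SyncPermits::new(0));
    let handle = Arc::clone(&shared);
    std::thread::spawn(move || handle.release(1)).join().unwrap();
    assert!(shared.try_acquire(1));
}
```

Because `NoopLock` contains a `Cell`, `GenericPermits<NoopLock>` is automatically `!Sync`, so the compiler keeps the local variant on one thread while both variants share a single implementation.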

futures-intrusive unfair x1000                        
                        time:   [80.350 µs 83.518 µs 87.210 µs]
                        change: [+3.3268% +5.7442% +8.6618%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

local-semaphore x1000   time:   [55.697 µs 55.934 µs 56.212 µs]                                  
                        change: [-1.8923% +0.0188% +1.9563%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  7 (7.00%) high severe
^^^^^^^ this is the local version of the no-std-async Semaphore

no-std-async x1000      time:   [84.970 µs 85.345 µs 85.782 µs]                               
                        change: [-1.9186% +0.0209% +1.9904%] (p = 0.98 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  8 (8.00%) high severe

async-unsync x1000      time:   [73.366 µs 73.817 µs 74.376 µs]                               
                        change: [-4.2150% -2.8097% -1.4084%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe

tokio x1000             time:   [175.73 µs 176.67 µs 177.81 µs]                        
                        change: [-11.026% -8.4840% -6.0201%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  7 (7.00%) high severe

lylythechosenone commented 4 months ago

What would be the purpose of a thread-local async semaphore?

tedsta commented 4 months ago

One example use case: a building block for a thread-local multi-producer, multi-consumer async channel.

Edit: here is a more concrete use case. Imagine you are using a non-work-stealing executor and spawn a task to handle each incoming request, and you want to limit the number of concurrently active tasks while still allowing a backlog of requests before you start dropping them.
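The admission policy in that use case can be modelled synchronously. The following is a sketch of the bookkeeping only, not an async implementation: `Limiter` and `Admission` are hypothetical names, `max_active` plays the role of the semaphore's permit count, and a `Queued` request corresponds to a task parked in `acquire(1).await`, with `finish` playing the role of `release(1)`.

```rust
use std::cell::Cell;

#[derive(Debug, PartialEq)]
enum Admission {
    Run,     // a permit was free: start handling now
    Queued,  // no permit, but room in the backlog: wait for one
    Dropped, // backlog full: reject the request
}

/// Hypothetical single-threaded admission policy. `Cell` suffices because
/// all tasks run on one thread of a non-work-stealing executor.
struct Limiter {
    max_active: usize,
    max_backlog: usize,
    active: Cell<usize>,
    queued: Cell<usize>,
}

impl Limiter {
    fn new(max_active: usize, max_backlog: usize) -> Self {
        Self { max_active, max_backlog, active: Cell::new(0), queued: Cell::new(0) }
    }

    /// Called when a request arrives.
    fn admit(&self) -> Admission {
        if self.active.get() < self.max_active {
            self.active.set(self.active.get() + 1);
            Admission::Run
        } else if self.queued.get() < self.max_backlog {
            self.queued.set(self.queued.get() + 1);
            Admission::Queued
        } else {
            Admission::Dropped
        }
    }

    /// Called when an active task finishes: its slot passes to a queued
    /// request if there is one, otherwise the active count drops.
    fn finish(&self) {
        if self.queued.get() > 0 {
            self.queued.set(self.queued.get() - 1); // active count unchanged
        } else {
            self.active.set(self.active.get() - 1);
        }
    }
}

fn main() {
    let limiter = Limiter::new(2, 1);
    assert_eq!(limiter.admit(), Admission::Run);
    assert_eq!(limiter.admit(), Admission::Run);
    assert_eq!(limiter.admit(), Admission::Queued);
    assert_eq!(limiter.admit(), Admission::Dropped);
    limiter.finish(); // the queued request takes the freed slot
    assert_eq!(limiter.admit(), Admission::Queued);
}
```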

That said, no hard feelings if it doesn't fit in this crate. I can toss it in a new local-semaphore crate and credit this crate.

lylythechosenone commented 4 months ago

Alright, I can see how that could be useful, and I could probably get it in. I'm not sure I have much time at the moment, though, so you might have to wait a bit. If you implement it yourself and submit a PR, I can review that sooner.

P.S. Thanks for your interest in this crate! I'm surprised it has any users at all; I mostly made it for my own use.

tedsta commented 4 months ago

Sweet, I'll try to get you a PR in the next couple days.

I was going to write a local semaphore using the pin-list crate, then thought, "surely someone has done this already." I looked at the dependents of pin-list, and here we are 😄

lylythechosenone / no-std-async

sync and unsync variants (starting with semaphore) #2