smol-rs / async-lock

Async synchronization primitives
Apache License 2.0
255 stars 28 forks source link

RwLock.write() and Mutex.lock() hang forever #91

Open nazar-pc opened 1 month ago

nazar-pc commented 1 month ago

I have a piece of code where rwlock.write() hangs in one of the threads forever.

There are other threads that are calling both rwlock.write() and rwlock.read() on the same instance (behind Arc) and they work fine, it is just this single thread that is stuck.

I do not have a clean or minimal reproduction, but so far I am able to reproduce it in a larger app, even though it sometimes takes hours to do so.

Might be related to https://github.com/smol-rs/async-lock/pull/80#issuecomment-1975145930

Using async-lock 3.4.0 and tokio 1.39.2.

Open for debugging suggestions, really not sure where to go next from here.

nazar-pc commented 1 month ago

Reduced reproduction based on application behavior:

use async_lock::RwLock;
use std::future::Future;
use std::pin::pin;
use std::task::Poll;
use tokio::macros::support::poll_fn;

#[tokio::main]
async fn main() {
    let rw = RwLock::new(());

    let read_guard = rw.read().await;

    let mut write_a_fut = pin!(rw.write());
    let mut write_b_fut = pin!(rw.write());

    poll_fn(|ctx| {
        if !matches!(write_b_fut.as_mut().poll(ctx), Poll::Pending) {
            unreachable!("");
        }
        if !matches!(write_a_fut.as_mut().poll(ctx), Poll::Pending) {
            unreachable!("");
        }

        Poll::Ready(())
    })
    .await;

    drop(read_guard);

    write_a_fut.await;
}

Essentially I have two write lock attempts which I try to poll once and then even after read lock being dropped, write can't be acquired.

Note the order of write_a_fut and write_b_fut polling, when reversed it succeeds.

nazar-pc commented 1 month ago

cc @taiki-e and @notgull since you were active on https://github.com/smol-rs/async-lock/pull/80 and https://github.com/smol-rs/event-listener/issues/123

nazar-pc commented 1 month ago

Checked versions down to 2.3.0 and they are all also problematic in the same way

nazar-pc commented 1 month ago

The same happens with Mutex actually

notgull commented 1 month ago

Interesting! Let me investigate this.

nazar-pc commented 1 month ago

Probably related to https://github.com/smol-rs/event-listener/issues/143 as well, I have event-listener in the dependencies and even without async-lock I still have issues, so the root cause is likely in event-listener itself.

nazar-pc commented 1 month ago

@notgull I can imagine you're probably busy as is, but in case you have an ETA when you might have time to look into it, it'd help me plan things as well. If not I'll understand that too.

notgull commented 1 month ago

@nazar-pc Unfortunately no, smol is a free-time project for me and I haven't had a lot of free time as of late.

nazar-pc commented 1 month ago

I can do more debugging myself as well given some pointers. I tried RwLock first with step by step debugger and it had quite a few layers, I might give it another try with Mutex that should be more straightforward implementation-wise.

nazar-pc commented 1 month ago

Here is a further reduced example with just event-listener:

use event_listener::Event;
use std::future::Future;
use std::pin::pin;
use std::task::Poll;
use tokio::macros::support::poll_fn;

#[tokio::main]
async fn main() {
    let event = Event::new();

    let mut listen_a = pin!(event.listen());
    let mut listen_b = pin!(event.listen());

    poll_fn(|ctx| {
        assert_eq!(listen_b.as_mut().poll(ctx), Poll::Pending);
        assert_eq!(listen_a.as_mut().poll(ctx), Poll::Pending);

        Poll::Ready(())
    })
    .await;

    drop(listen_a);

    event.notify(1);
    listen_b.await;
}

And clippy is happy to tell why:

warning: call to `std::mem::drop` with a value that does not implement `Drop`. Dropping such a type only extends its contained lifetimes
  --> main.rs:22:5
   |
22 |     drop(listen_a);
   |     ^^^^^^^^^^^^^^
   |
note: argument has type `std::pin::Pin<&mut event_listener::EventListener>`
  --> main.rs:22:10
   |
22 |     drop(listen_a);
   |          ^^^^^^^^
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#drop_non_drop
   = note: `#[warn(clippy::drop_non_drop)]` on by default

There should be impl Drop for EventListener that will unregister listener for things to get un-stuck.

nazar-pc commented 1 month ago

Even shorter:

use event_listener::Event;
use std::pin::pin;

#[tokio::main]
async fn main() {
    let event = Event::new();

    let listen_a = pin!(event.listen());
    let listen_b = pin!(event.listen());

    drop(listen_a);

    event.notify(1);
    listen_b.await;
}

Interestingly, removing pin!() allows it to make progress (might be compiler optimization or something).

nazar-pc commented 1 month ago

Closing in favor of https://github.com/smol-rs/event-listener/issues/143

nazar-pc commented 1 month ago

Here is fixed example without problematic pin!() that prevents Drop from working:

use async_lock::RwLock;
use std::future::Future;
use std::task::Poll;
use tokio::macros::support::poll_fn;

#[tokio::main]
async fn main() {
    let rw = RwLock::new(());

    let read_guard = rw.read().await;

    let mut write_a_fut = Box::pin(rw.write());
    let mut write_b_fut = Box::pin(rw.write());

    poll_fn(|ctx| {
        assert!(matches!(write_b_fut.as_mut().poll(ctx), Poll::Pending));
        assert!(matches!(write_a_fut.as_mut().poll(ctx), Poll::Pending));

        Poll::Ready(())
    })
    .await;

    drop(read_guard);

    write_a_fut.await;
}

And similar with Mutex:

use async_lock::Mutex;
use std::future::Future;
use std::task::Poll;
use tokio::macros::support::poll_fn;

#[tokio::main]
async fn main() {
    let m = Mutex::new(());

    let guard_a = m.lock().await;

    let mut guard_b = Box::pin(m.lock());
    let mut guard_c = Box::pin(m.lock());

    poll_fn(|ctx| {
        assert!(matches!(guard_c.as_mut().poll(ctx), Poll::Pending));
        assert!(matches!(guard_b.as_mut().poll(ctx), Poll::Pending));

        Poll::Ready(())
    })
    .await;

    drop(guard_a);

    guard_b.await;
}

This basically means there is an internal queue that defines an order in which things are supposed to be unlocked determined by the first .poll() call. If after that order changes, it may deadlock.

I'm not 100% sure whether it is expected or not. The solution would be to notify all instead of one, but that will likely have a performance impact, which may or may not be acceptable.

UPD: Though I suspect there is still another issue somewhere that I can't reproduce just yet.

nazar-pc commented 1 month ago

Feel free to close this issue if above behavior is expected