matklad / once_cell

Rust library for single assignment cells and lazy statics without macros
Apache License 2.0
1.87k stars 109 forks source link

Unexpected deadlock on Windows 32bits when replacing Mutex insde OnceCell #93

Closed Jonathas-Conceicao closed 4 years ago

Jonathas-Conceicao commented 4 years ago

I have in an application I work on a static shared reference to some data, at some points given a condition I have to replace the shared data content. Access to the data is coordinated between threads using a Mutex. I have separated a small example that does result in the weird behavior, please check the following example:

// [dependencies]
// once_cell = "1"

use once_cell::sync::OnceCell;
use std::{
    mem,
    sync::{Mutex, MutexGuard},
};

static mut SHARED: OnceCell<Mutex<usize>> = OnceCell::new();

fn get() -> MutexGuard<'static, usize> {
    println!("Getting data");
    let init_data = || Mutex::new(0);

    let data = unsafe { SHARED.get_or_init(init_data).lock().unwrap() };
    if *data < 9 {
        return data;
    }
    println!("Replacing data");

    let data = unsafe { SHARED.get_mut().unwrap() };
    mem::replace(data, init_data());
    data.lock().unwrap()
}

fn main() {
    for _ in 1..10 {
        let mut data = get();
        println!("Got data: {}", *data);
        *data += 5;
    }
}

Building and running this on Linux (target _stable-x8664-unknown-linux-gnu) gives no error. However, building on a Windows 32bits (target stable-i686-pc-windows-msvc) this code results in a dead lock when the shared reference is replaced and we try to acquire the lock on the next call to get so the application gives this output on Windows alone:

Getting data
Got data: 0
Getting data
Got data: 5
Getting data
Replacing data
Got data: 0
Getting data
<<application gets blocked here indefinitely>>

Somethings I have tested so far:

fn get() -> MutexGuard<'static, usize> { println!("Getting data"); let init_data = || Mutex::new(0);

let data = unsafe {
    SHARED.replace(init_data());
SHARED.as_mut().unwrap().lock().unwrap()
};
if *data < 9 {
    return data;
}
println!("Replacing data");

let data = unsafe { SHARED.as_mut().unwrap() };
mem::replace(data, init_data());
data.lock().unwrap()

}



All this last item leads me to believe that the problem might be in the `OnceCell` or in the way I'm using it. Suggestions on how to further find the source of the problem is also welcome.
matklad commented 4 years ago

I am pretty sure that the code contains several instances of UB.

Specifically, on line let data = unsafe { SHARED.get_mut().unwrap() }; we drop the previous mutex, while there's an outstanding guard, pointing to the mutex (the previous value of data variable).

The Mutex-based code that seems to work only does that because everything after if *data < 9 if is a dead code: the *data is always 0, as we alwaos rewrite the mutex. If make the condition to trigger by changing to et init_data = || Mutex::new(10);, I get the following miri error:

Getting data
Replacing data
error: Miri evaluation error: trying to reborrow for SharedReadWrite, but parent tag <3669> does not have an appropriate item in the borrow stack
   --> /home/matklad/.rustup/toolchains/nightly-2020-02-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/sync/mutex.rs:444:13
    |
444 |             self.lock.poison.done(&self.poison);
    |             ^^^^^^^^^^^^^^^^ trying to reborrow for SharedReadWrite, but parent tag <3669> does not have an appropriate item in the borrow stack
    |
    = note: inside call to `<std::sync::MutexGuard<usize> as std::ops::Drop>::drop` at /home/matklad/.rustup/toolchains/nightly-2020-02-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:174:1
note: inside call to `std::intrinsics::drop_in_place::<std::sync::MutexGuard<usize>> - shim(Some(std::sync::MutexGuard<usize>))` at src/main.rs:22:1
   --> src/main.rs:22:1
    |
22  | }
    | ^
note: inside call to `get` at src/main.rs:26:24
   --> src/main.rs:26:24
    |
26  |         let mut data = get();
    |                        ^^^^^
    = note: inside call to `main` at /home/matklad/.rustup/toolchains/nightly-2020-02-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:67:34
    = note: inside call to closure at /home/matklad/.rustup/toolchains/nightly-2020-02-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:73
    = note: inside call to closure at /home/matklad/.rustup/toolchains/nightly-2020-02-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/sys_common/backtrace.rs:129:5
    = note: inside call to `std::sys_common::backtrace::__rust_begin_short_backtrace::<[closure@DefId(1:6020 ~ std[d6a9]::rt[0]::lang_start_internal[0]::{{closure}}[0]::{{closure}}[0]) 0:&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/matklad/.rustup/toolchains/nightly-2020-02-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:13
    = note: inside call to closure at /home/matklad/.rustup/toolchains/nightly-2020-02-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:303:40
    = note: inside call to `std::panicking::r#try::do_call::<[closure@DefId(1:6019 ~ std[d6a9]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/matklad/.rustup/toolchains/nightly-2020-02-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:281:13
    = note: inside call to `std::panicking::r#try::<i32, [closure@DefId(1:6019 ~ std[d6a9]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe]>` at /home/matklad/.rustup/toolchains/nightly-2020-02-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394:14
    = note: inside call to `std::panic::catch_unwind::<[closure@DefId(1:6019 ~ std[d6a9]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/matklad/.rustup/toolchains/nightly-2020-02-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:51:25
    = note: inside call to `std::rt::lang_start_internal` at /home/matklad/.rustup/toolchains/nightly-2020-02-08-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:67:5
    = note: inside call to `std::rt::lang_start::<()>`

error: aborting due to previous error

error: could not compile `oc`.

To learn more, run the command again with --verbose.

Additionally,

    let data = unsafe {
        SHARED.replace(init_data());
    SHARED.as_mut().unwrap().lock().unwrap()
    };

is also unsound: if get is called concurrently from two threads (and nothing prevents that), this will be a data race on the SHARED location.

matklad commented 4 years ago

would smth like this work for you?

use once_cell::sync::Lazy;

use std::sync::{Mutex, MutexGuard};

static SHARED: Lazy<Mutex<usize>> = Lazy::new(|| Mutex::new(0));

fn get() -> MutexGuard<'static, usize> {
    let mut res = SHARED.lock().unwrap();
    if *res >= 9 {
        *res = 0;
    }
    res
}

fn main() {
    for _ in 1..10 {
        let mut data = get();
        println!("Got data: {}", *data);
        *data += 5;
    }
}
Jonathas-Conceicao commented 4 years ago

So the problem was the drop of the replaced Mutex alongside it's MutexGuard that caused the undefined behavior, that makes sense to me, thanks for the input on this.

The example you provided with Lazy works for me, and it does seems way better to update the content of the Mutexthan trying to replace the hole thing.

Thank you for the help and the quick response.