Closed joboet closed 5 months ago
Hm, I've never heard that the allocator would be required to establish happens-before relationships when doing address reuse.
The compiler already can't move things that access this memory down below the free or up across the malloc, so it may be sufficient for the allocator to use `Relaxed` accesses to check that the address has been freed, and not actually incur happens-before? Not sure.
Would it help to make these `owned` accesses in the `ReentrantLock` check be release/acquire?
@Amanieu @m-ou-se I'd be curious what your take on this is.
> Would it help to make these `owned` accesses in the `ReentrantLock` check be release/acquire?
No, because those could still load an old value.
> The compiler already can't move things that access this memory down below the free or up across the malloc, so it may be sufficient for the allocator to use `Relaxed` access to check that the address has been freed, and not actually incur happens-before? Not sure.
If there is no happens-before, then writes from before the `free` could be ordered after the `malloc` and cause data races.
> As `-Zmiri-track-weak-memory-loads` shows, the second thread reads an outdated `owner` value and as with this specific seed the TLS allocation happens to be on the same address, it thinks it already owns the lock; thus triggering UB inside miri.
I am not sure I fully understand what happens. So we're reading an outdated `owner` which happens to have the right value but is actually a different thread? And the proper, latest value of `owner` would be 0? So if the `owner == this_thread` arm had a `debug_assert` to ensure that the lock count is currently non-zero, that would catch this particular execution (though likely it would not fix the issue)?
> I am not sure I fully understand what happens. So we're reading an outdated `owner` which happens to have the right value but is actually a different thread?
Yes, exactly.
> So if the `owner == this_thread` arm had a `debug_assert` to ensure that the lock count is currently non-zero, that would catch this particular execution (though likely it would not fix the issue)?
No, because that read would still race with the write of the count in the other thread (or at least miri would think so).
Is it possible to reproduce this without involving `ReentrantLock`? I thought something like this would do it, but there doesn't seem to be a race here.
```rust
use std::thread;

// Just a type with a rare size to make it more likely that we will
// reuse the right allocation.
type T = [u8; 7];

fn thread() {
    for _ in 0..100 {
        let b = Box::new(T::default());
        drop(b);
        thread::yield_now();
    }
}

fn main() {
    let j1 = thread::spawn(thread);
    let j2 = thread::spawn(thread);
    j1.join().unwrap();
    j2.join().unwrap();
}
```
Ah I guess it's not enough to just reuse the memory, as the data race detector will consider the old and new allocation to have nothing to do with each other. It's about using (abusing?) the address as a signal for synchronization on another shared location.
I am honestly not convinced that your reasoning is sound here, it sounds extremely sketchy to me.
Cc @rust-lang/opsem any opinions on this?
I feel like if we're changing Miri to accommodate this, we at least need to also document this as a requirement for our allocator traits. I don't know enough about allocators to judge whether there's a risk of a real-world allocator violating this property.
In this case it's a thread-local, that wouldn't even go through a regular allocator right? I have no idea how the memory backing thread-locals is managed, some sort of page table magic as part of thread creation/destruction or so? Where would the happens-before come from there?
musl libc either puts TLS on the (mmapped) stack or uses malloc to allocate it.
The kernel would have a happens-before between the munmap and mmap, right? It will only reuse a mmap region after it is certain all threads have flushed the TLB entries for the old mapping (which requires synchronization). The same goes for a memory allocator, which won't reuse a memory location until after it has synchronized with the free call.
Yes, this is invariant over all correct memory allocators. In terms of hardware, if the `free` call and its internal writes do not present a barrier, they may get ordered before reads to the allocation, which therefore could read data written after the `malloc`.
Separately, doesn't address reuse mean that ReentrantLock ownership is transferred to a different thread when a thread exits without releasing the lock? That doesn't seem right either.
It's the same thing: as long as there is a happens-before relationship between the deallocation and allocation, that is totally sound.
Sure, it might be sound but is it safe? It seems wrong that you can have multiple threads locking the same lock without unlocking it first (EDIT: i.e., it is inconsistent with guarantees documented in `ReentrantLock::lock`).
- Since `free` makes the freed memory available to other threads, it must use a `Release` ordering. Otherwise the CPU could reorder writes to the freed memory after it has been made available to other threads.
- Since `alloc` uses memory that comes from other threads, it must use an `Acquire` ordering, for the same reason.
- For `ReentrantLock`, checking the address of a TLS variable is sufficient for safety: even if the address is reused from another thread, this would mean that this other thread has already exited. The `Sync` invariants are maintained since there is no concurrent shared access from multiple threads.

@joboet is there a way to extract this into a self-contained testcase that does not rely on `ReentrantLock`?
Yes! This one works for any seed and will only succeed if the happens-before relationship is formed:
```rust
#![feature(strict_provenance)]
#![feature(sync_unsafe_cell)]

use std::cell::SyncUnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::thread;

static ADDR: AtomicUsize = AtomicUsize::new(0);
static VAL: SyncUnsafeCell<i32> = SyncUnsafeCell::new(0);

fn addr() -> usize {
    let alloc = Box::new(42);
    <*const i32>::addr(&*alloc)
}

fn thread1() {
    unsafe {
        VAL.get().write(42);
    }

    let alloc = addr();
    ADDR.store(alloc, Relaxed);
}

fn thread2() -> bool {
    // We try to get an allocation at the same address. If we fail too often,
    // try again with a different allocation to ensure it is still in miri's
    // reuse cache.
    for _ in 0..16 {
        let alloc = addr();
        let addr = ADDR.load(Relaxed);
        if alloc == addr {
            // If the new allocation is at the same address as the old one,
            // there must be a happens-before relationship between them.
            // Therefore, we can read VAL without racing and must observe
            // the write above.
            let val = unsafe { VAL.get().read() };
            assert_eq!(val, 42);
            return true;
        }
    }

    false
}

fn main() {
    let mut success = false;
    while !success {
        let t1 = thread::spawn(thread1);
        let t2 = thread::spawn(thread2);
        t1.join().unwrap();
        success = t2.join().unwrap();

        // Reset everything.
        ADDR.store(0, Relaxed);
        unsafe {
            VAL.get().write(0);
        }
    }
}
```
Since C11, per cppreference:
> A previous call to `free` or `realloc` that deallocates a region of memory synchronizes-with a call to `malloc` that allocates the same or a part of the same region of memory. This synchronization occurs after any access to the memory by the deallocating function and before any access to the memory by `malloc`. There is a single total order of all allocation and deallocation functions operating on each particular region of memory.
synchronizes-with is the relation from a release store to an acquire load; the `malloc` family is mandated to provide (at least) AcqRel synchronization. $A$ synchronizes-with $B$ implies $A$ happens-before $B$.
This doesn't mean that `__rust_alloc` is required to provide this guarantee, but it does imply LLVM could assume it does (although I don't think it'd ever make a derivation that isn't already justified by disjoint provenance) and that code authors are likely to assume it holds.
However, this also ties into provenance and allocation planes — this reasoning relies on it being impossible for two disjoint allocated objects that are simultaneously live to have the same address. It's not necessarily guaranteed that this is true (see discussions around resource exhaustion and optimizing out allocations).
> Since C11, per cppreference:
Thanks! That's useful.
So yeah Miri should implement that then. But also IMO our allocator traits should document this.
> However, this also ties into provenance and allocation planes — this reasoning relies on it being impossible for two disjoint allocated objects that are simultaneously live to have the same address. It's not necessarily guaranteed that this is true (see discussions around resource exhaustion and optimizing out allocations).
Ah indeed, this entire approach that `ReentrantLock` takes is kind of incompatible with LLVM optimizations that reduce memory usage. That's a separate problem, but still a challenging one. Would people rather be able to optimize away `malloc` (and other allocations) that turn out to not be needed, or use uniqueness of addresses as a correctness argument like in `ReentrantLock`? We can't have both... well I guess we can if we annotate the allocations whose uniqueness this uses as "must not be optimized away", and then use that as a clue to make sure two such allocations never overlap. (They can still arbitrarily overlap with other allocations.)
> > However, this also ties into provenance and allocation planes — this reasoning relies on it being impossible for two disjoint allocated objects that are simultaneously live to have the same address. It's not necessarily guaranteed that this is true (see discussions around resource exhaustion and optimizing out allocations).
>
> Ah indeed this entire approach that `ReentrantLock` takes is kind of incompatible with LLVM optimizations that reduce memory usage. That's a separate problem, but still a challenging one. Would people rather be able to optimize away `malloc` (and other allocations) that turn out to not be needed or use uniqueness of addresses as correctness argument like in `ReentrantLock`? We can't have both... well I guess we can if we annotate the allocations whose uniqueness this uses as "must not be optimized away", and then use that as a clue to make sure two such allocations never overlap.
I believe we were already considering something like this in the model, some kind of "plane 0" used for external allocations and such, and which we force the usage of once the user starts doing difficult-to-analyze things with pointers. (We called it "exposed" but this meaning is overloaded, I think... I don't think it's necessarily the same as the strict-provenance "expose" but there are some relationships between them.) My guess is that ReentrantLock is either already doing something that would imply the use of this allocation plane, or could be modified to do so once we have enough details hammered out to suggest actual code changes.
FWIW my interpretation was that it is exactly the same "exposed".
But anyway, let's continue in https://github.com/rust-lang/unsafe-code-guidelines/issues/328.
The following code fails under miri:

Commit: e38a828bb8c1faa95be23d1fa902913d218de848
Command: `./miri run -Zmiri-seed=706 -Zmiri-track-weak-memory-loads -Zmiri-preemption-rate=0 race.rs`
Result:

`ReentrantLock` uses the address of a TLS variable to check whether the current thread already owns the lock. As `-Zmiri-track-weak-memory-loads` shows, the second thread reads an outdated `owner` value and, as with this specific seed the TLS allocation happens to be on the same address, it thinks it already owns the lock; thus triggering UB inside miri.

However, I do not believe this to be an issue with `ReentrantLock`. In any allocator, reallocating the same address requires observing that the address has been deallocated. Thus, there must be a happens-before relationship between deallocation and allocation. miri currently does not form such a relationship, leading to the issue above. With happens-before, the outdated read cannot happen, because the second thread will observe that the owner has been reset and released after allocating the TLS variable.