Closed joboet closed 5 months ago
Hm, I've never heard that the allocator would be required to establish happens-before relationships when doing address reuse.
The compiler already can't move things that access this memory down below the free or up across the malloc, so it may be sufficient for the allocator to use `Relaxed` accesses to check that the address has been freed, and not actually incur happens-before? Not sure.
Would it help to make these `owned` accesses in the `ReentrantLock` check be release/acquire?
@Amanieu @m-ou-se I'd be curious what your take on this is.
> Would it help to make these `owned` accesses in the `ReentrantLock` check be release/acquire?
No, because those could still load an old value.
> The compiler already can't move things that access this memory down below the free or up across the malloc, so it may be sufficient for the allocator to use `Relaxed` access to check that the address has been freed, and not actually incur happens-before? Not sure.
If there is no happens-before, then writes from before the `free` could be ordered after the `malloc` and cause data races.
> As `-Zmiri-track-weak-memory-loads` shows, the second thread reads an outdated `owner` value and as with this specific seed the TLS allocation happens to be on the same address, it thinks it already owns the lock; thus triggering UB inside miri.
I am not sure I fully understand what happens. So we're reading an outdated `owner` which happens to have the right value but is actually a different thread? And the proper, latest value of `owner` would be 0? So if the `owner == this_thread` arm had a `debug_assert` to ensure that the lock count is currently non-zero, that would catch this particular execution (though likely it would not fix the issue)?
> I am not sure I fully understand what happens. So we're reading an outdated `owner` which happens to have the right value but is actually a different thread?
Yes, exactly.
> So if the `owner == this_thread` arm had a `debug_assert` to ensure that the lock count is currently non-zero, that would catch this particular execution (though likely it would not fix the issue)?
No, because that read would still race with the write of the count in the other thread (or at least miri would think so).
Is it possible to reproduce this without involving `ReentrantLock`? I thought something like this would do it, but there doesn't seem to be a race here.
```rust
use std::thread;

// Just a type with a rare size to make it more likely that we will
// reuse the right allocation.
type T = [u8; 7];

fn thread() {
    for _ in 0..100 {
        let b = Box::new(T::default());
        drop(b);
        thread::yield_now();
    }
}

fn main() {
    let j1 = thread::spawn(thread);
    let j2 = thread::spawn(thread);
    j1.join().unwrap();
    j2.join().unwrap();
}
```
Ah I guess it's not enough to just reuse the memory, as the data race detector will consider the old and new allocation to have nothing to do with each other. It's about using (abusing?) the address as a signal for synchronization on another shared location.
I am honestly not convinced that your reasoning is sound here, it sounds extremely sketchy to me.
Cc @rust-lang/opsem any opinions on this?
I feel like if we're changing Miri to accommodate this, we at least need to also document this as a requirement for our allocator traits. I don't know enough about allocators to judge whether there's a risk of a real-world allocator violating this property.
In this case it's a thread-local, that wouldn't even go through a regular allocator right? I have no idea how the memory backing thread-locals is managed, some sort of page table magic as part of thread creation/destruction or so? Where would the happens-before come from there?
musl libc either puts TLS on the (mmapped) stack or uses malloc to allocate it.
The kernel would have a happens-before between the munmap and mmap, right? It will only reuse a mmap region after it is certain all threads have flushed the TLB entries for the old mapping (which requires synchronization). The same goes for a memory allocator, which won't reuse a memory location until after it has synchronized with the free call.
Yes, this is invariant over all correct memory allocators. In terms of hardware, if the `free` call and its internal writes do not present a barrier, they may get ordered before reads to the allocation, which therefore could read data written after the `malloc`.
Separately, doesn't address reuse mean that ReentrantLock ownership is transferred to a different thread when a thread exits without releasing the lock? That doesn't seem right either.
It's the same thing: as long as there is a happens-before relationship between the deallocation and allocation, that is totally sound.
Sure, it might be sound but is it safe? It seems wrong that you can have multiple threads locking the same lock without unlocking it first (EDIT: i.e., it is inconsistent with guarantees documented in `ReentrantLock::lock`).
- Since `free` makes the freed memory available to other threads, it must use a `Release` ordering. Otherwise the CPU could reorder writes to the freed memory after it has been made available to other threads.
- Since `alloc` uses memory that comes from other threads, it must use an `Acquire` ordering, for the same reason.
- For `ReentrantLock`, checking the address of a TLS variable is sufficient for safety: even if the address is reused from another thread, this would mean that this other thread has already exited. The `Sync` invariants are maintained since there is no concurrent shared access from multiple threads.

@joboet is there a way to extract this into a self-contained testcase that does not rely on `ReentrantLock`?
Yes! This one works for any seed and will only succeed if the happens-before relationship is formed:
```rust
#![feature(strict_provenance)]
#![feature(sync_unsafe_cell)]

use std::cell::SyncUnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::thread;

static ADDR: AtomicUsize = AtomicUsize::new(0);
static VAL: SyncUnsafeCell<i32> = SyncUnsafeCell::new(0);

fn addr() -> usize {
    let alloc = Box::new(42);
    <*const i32>::addr(&*alloc)
}

fn thread1() {
    unsafe {
        VAL.get().write(42);
    }

    let alloc = addr();
    ADDR.store(alloc, Relaxed);
}

fn thread2() -> bool {
    // We try to get an allocation at the same address. If we fail too often,
    // try again with a different allocation to ensure it is still in miri's
    // reuse cache.
    for _ in 0..16 {
        let alloc = addr();
        let addr = ADDR.load(Relaxed);
        if alloc == addr {
            // If the new allocation is at the same address as the old one,
            // there must be a happens-before relationship between them.
            // Therefore, we can read VAL without racing and must observe
            // the write above.
            let val = unsafe { VAL.get().read() };
            assert_eq!(val, 42);
            return true;
        }
    }

    false
}

fn main() {
    let mut success = false;
    while !success {
        let t1 = thread::spawn(thread1);
        let t2 = thread::spawn(thread2);
        t1.join().unwrap();
        success = t2.join().unwrap();

        // Reset everything.
        ADDR.store(0, Relaxed);
        unsafe {
            VAL.get().write(0);
        }
    }
}
```
Since C11, per cppreference:
> A previous call to `free` or `realloc` that deallocates a region of memory synchronizes-with a call to `malloc` that allocates the same or a part of the same region of memory. This synchronization occurs after any access to the memory by the deallocating function and before any access to the memory by `malloc`. There is a single total order of all allocation and deallocation functions operating on each particular region of memory.
synchronizes-with is the relation from a release store to an acquire load; the `malloc` family is mandated to provide (at least) AcqRel synchronization. $A$ synchronizes-with $B$ implies $A$ happens-before $B$.
This doesn't mean that `__rust_alloc` is required to provide this guarantee, but it does imply LLVM could assume it does (although I don't think it'd ever make a derivation that isn't already justified by disjoint provenance) and that code authors are likely to assume it holds.
However, this also ties into provenance and allocation planes — this reasoning relies on it being impossible for two disjoint allocated objects that are simultaneously live to have the same address. It's not necessarily guaranteed that this is true (see discussions around resource exhaustion and optimizing out allocations).
> Since C11, per cppreference:
Thanks! That's useful.
So yeah Miri should implement that then. But also IMO our allocator traits should document this.
> However, this also ties into provenance and allocation planes — this reasoning relies on it being impossible for two disjoint allocated objects that are simultaneously live to have the same address. It's not necessarily guaranteed that this is true (see discussions around resource exhaustion and optimizing out allocations).
Ah indeed, this entire approach that `ReentrantLock` takes is kind of incompatible with LLVM optimizations that reduce memory usage. That's a separate problem, but still a challenging one. Would people rather be able to optimize away `malloc` (and other allocations) that turn out to not be needed, or use uniqueness of addresses as a correctness argument like in `ReentrantLock`? We can't have both... well I guess we can if we annotate the allocations whose uniqueness this uses as "must not be optimized away", and then use that as a clue to make sure two such allocations never overlap. (They can still arbitrarily overlap with other allocations.)
> > However, this also ties into provenance and allocation planes — this reasoning relies on it being impossible for two disjoint allocated objects that are simultaneously live to have the same address. It's not necessarily guaranteed that this is true (see discussions around resource exhaustion and optimizing out allocations).
>
> Ah indeed this entire approach that `ReentrantLock` takes is kind of incompatible with LLVM optimizations that reduce memory usage. That's a separate problem, but still a challenging one. Would people rather be able to optimize away `malloc` (and other allocations) that turn out to not be needed or use uniqueness of addresses as correctness argument like in `ReentrantLock`? We can't have both... well I guess we can if we annotate the allocations whose uniqueness this uses as "must not be optimized away", and then use that as a clue to make sure two such allocations never overlap.
I believe we were already considering something like this in the model, some kind of "plane 0" used for external allocations and such, and which we force the usage of once the user starts doing difficult-to-analyze things with pointers. (We called it "exposed" but this meaning is overloaded, I think... I don't think it's necessarily the same as the strict-provenance "expose" but there are some relationships between them.) My guess is that ReentrantLock is either already doing something that would imply the use of this allocation plane, or could be modified to do so once we have enough details hammered out to suggest actual code changes.
FWIW my interpretation was that it is exactly the same "exposed".
But anyway, let's continue in https://github.com/rust-lang/unsafe-code-guidelines/issues/328.
The following code fails under miri:

Commit: e38a828bb8c1faa95be23d1fa902913d218de848
Command: `./miri run -Zmiri-seed=706 -Zmiri-track-weak-memory-loads -Zmiri-preemption-rate=0 race.rs`
Result:

`ReentrantLock` uses the address of a TLS variable to check whether the current thread already owns the lock. As `-Zmiri-track-weak-memory-loads` shows, the second thread reads an outdated `owner` value and, as with this specific seed the TLS allocation happens to be on the same address, it thinks it already owns the lock; thus triggering UB inside miri.

However, I do not believe this to be an issue with `ReentrantLock`. In any allocator, reallocating the same address requires observing that the address has been deallocated. Thus, there must be a happens-before relationship between deallocation and allocation. miri currently does not form such a relationship, leading to the issue above. With happens-before, the outdated read cannot happen, because the second thread will observe that the owner has been reset and released after allocating the TLS variable.