Closed: RalfJung closed this issue 2 years ago.
Potential fixes:

- Provide a version of `fetch_sub` that takes a raw pointer, and use that.
- Some form of `UnsafeCell` that opts out of the `dereferenceable` attribute, maybe even make that the default behavior for `UnsafeCell`. Also see https://github.com/rust-rfcs/unsafe-code-guidelines/issues/33 for another use-case for a non-`dereferenceable` shared ref.

Turns out Clang has a similar issue: it uses `dereferenceable` for C++ references, and that's not correct because the pointee might get deallocated during the course of the function. Hence another possible fix might be to use `dereferenceable_on_entry` instead of `dereferenceable` for `&UnsafeCell<T>`, once that attribute exists.
It's a little spooky to see a situation where a region of code could do something undefined without an `unsafe` block. However, as the Nomicon notes, this is to be expected when working around unsafe code. A soundly designed library will encapsulate this potential unsafety, but it's not possible, or even desirable, to protect closely neighboring safe code from the potential of UB.

Anyway, in this twilight zone dangling pointers are not prohibited. Dereferencing dangling pointers is. Exposing somebody else's unsuspecting code to a dangling pointer is also very much forbidden. But it's fine for dangling pointers to exist as long as they are dead.

Remember: "dead" means "will not be accessed by anything below this point." Other languages don't even have a concept of reference lifetimes.
> Once the current thread (Thread A) has decremented the reference count, Thread B could come in and free the `ArcInner`.

That can happen. But if it does, Thread A is guaranteed to fetch a reference count greater than 1, which then causes `Arc::drop` to return early. `self` dangles, but it isn't dereferenced while dangling, so no problem.

It would be undefined behavior for any method to be called after `drop`. But we also know that this won't happen, because "drop is the last method call" is one of the invariants.
Honestly, I think you're a little too attached to the Stacked Borrows model: this is a situation where it doesn't work, and rather than trying to understand why the model should be changed, you're arguing that the source code of the stdlib should be declared unsound. If Rust continues in the tradition of LLVM, the uniqueness of `&mut` really means something like "I solemnly swear that the addresses derived from this reference will not be derived by any other means until after the last use of this reference."
The immediate problem is not about Rust, safe or unsafe, or any proposed formal semantics for it. rustc emits IR that has technically undefined behavior, because there is a pointer that is claimed to be `dereferenceable` for the duration of a function but points to memory that is (potentially) deallocated during the execution of the function. That this is not a Rust-specific problem can also be seen from the fact that Clang (in C++ mode) is affected by it too, as linked earlier. This specific instance is not terribly likely to cause miscompilations, but others are, and in any case "UB that's unlikely to cause problems in practice" is not a good way to build a reliable language implementation.

Aside from that, there is the task of justifying the LLVM IR that rustc emits with Rust-level rules -- at minimum, in every case where the emitted LLVM IR has UB, the input Rust code also needs to have UB for us to be able to claim rustc is correct. This issue points out a case where previous attempts at justifying the `dereferenceable` attribute that rustc emits are not sufficient. There are multiple possible remedies for that, none of which are about Stacked Borrows, and most of which modify how rustc emits IR (emitting the attribute in fewer cases, or emitting a weaker attribute instead in some cases) to make the existing code well-defined without modification.
Option one:

> Provide a version of `fetch_sub` that takes a raw pointer, and use that.

for resolving this bug was tried here: https://github.com/rust-lang/rust/pull/58611
I'm nominating this issue -- and tagging it for @rust-lang/lang -- in response to some comments by @jamesmunns in today's Project Leadership Sync meeting. In particular, @jamesmunns pointed out that this issue around the `dereferenceable` attribute has potentially large repercussions in the embedded space, and cited @japaric's RFC as well as a number of issues around the embedded space (see the minutes).

I've got to dig a bit deeper here, but it seems like it'd be good to check in on the current thinking. I will try to catch up on the RFC and some of the proposed changes and see if I can post any notes.

Without having done so, I would say that, as far as I know, we still want to keep the `dereferenceable` attribute, so I expect some form of these changes will be required.

(I suppose something like the [dereferenceable-on-entry proposal](https://lists.llvm.org/pipermail/llvm-dev/2018-July/124555.html) might also be good; does anyone know if that's made progress? I'm going to assume not.)
@nikomatsakis I have done some work around this, but it extended the visible API surface by one method. It resolves the problem at that cost.
I could be wrong, but it seems like LLVM now supports `nofree` and `nosync` function attributes, which would solve the issue. https://lists.llvm.org/pipermail/llvm-dev/2018-July/124555.html

`nofree` and `nosync` exist now, but as far as I can tell the switch from "`dereferenceable` applies through the entire function" to "`dereferenceable` applies on function entry (use `nofree` and `nosync` information to recover analysis power)" has not yet happened.
Ah, so the tentative plan in LLVM now is to weaken `dereferenceable` semantics instead of adding another attribute? That seems bad though. Quoting:

> I'd like to propose adding a nofree function attribute to indicate that a function does not, directly or indirectly, call a memory-deallocation function (e.g., free, C++'s operator delete).

So, e.g., a function that takes a reference and also drops an unrelated `Box` could not express, with its attributes, that the reference will not get deallocated.
Maybe someone from the Rust side should chime in and raise this as a concern?
I forgot about this before, but there's a (not yet committed) patch introducing a new attribute for "dereferenceable forever" at https://reviews.llvm.org/D61652
See https://github.com/rust-lang/rust/issues/65796 for another instance: there's a call to `AtomicBool::store` there signalling another thread that it can remove this memory; that deallocation might happen while `AtomicBool::store` still runs with its `&self` argument.
> I forgot about this before, but there's a (not yet committed) patch introducing a new attribute for "dereferenceable forever" at https://reviews.llvm.org/D61652

Uh-oh, I just read over that, and if I didn't miss anything, that new attribute is useless for us. It means "this pointer value is dereferenceable for the remaining execution of the program", which is not the case for references.
@RalfJung my reading of the dialogue on https://reviews.llvm.org/D61652 is that LLVM still plans to eventually add no-free/no-sync attributes that will (hopefully?) allow us to fix the normal `dereferenceable` attribute, and semi-orthogonally to that they are also adding the `dereferenceable_globally` attribute.

That is, the `dereferenceable_globally` attribute is not replacing `dereferenceable` (perhaps the latter will even be alpha-renamed to `dereferenceable_locally`?), so we should not need to raise a stink about it regressing performance for us, no?

Update: Ah, I missed the detail @RalfJung pointed out, that a `nofree` attribute on a function is not necessarily expressive enough. Still, I do not think that contradicts my overall point that we can perhaps just ignore what they're doing with `dereferenceable_globally`? Unless our general belief is that it is the wrong model for them to adopt even for their own purposes.

Update 2: Discussions with @rkruppe on Zulip #wg-llvm led me to conclude that @RalfJung was not wrong to raise a concern on the thread about `dereferenceable_globally`, since that seems to be the place where LLVM developers are currently discussing the different usage scenarios for the various `dereferenceable` semantics.
We can still emit `dereferenceable` with its new semantics, true. But it is much weaker than what we do currently, and there is no replacement. We can't use no-free/no-sync, as such global statements ("this function will not deallocate anything") are not things our type system provides to us.
On a related note, @cuviper recently tracked down some seg-faults in rayon, and it came down to this same basic pattern. In particular, as @cuviper explains:

> I think we have the same issue here:
>
> ```rust
> impl<'a, L: Latch> Latch for TickleLatch<'a, L> {
>     #[inline]
>     fn set(&self) {
>         self.inner.set();
>         self.sleep.tickle(usize::MAX);
>     }
> }
> ```
>
> The problem is that as soon as we `set()`, the waiting thread may actually proceed and destroy the latch. It may have already been awake (so it didn't need the tickle) or just happened to race awake for other reasons. The point is, we're not really representing lifetimes well in that `&self` reference, in that it can actually die before the end of this function, and then reading `self.sleep` is a use after "free" (on the stack).

This is definitely further evidence that this sort of pattern is pretty common "in the wild" and the kind of thing that folks are likely to write naturally. Note that all these cases embed the latch directly, although it is quite possible that the callers to `latch.set()` are holding an `Arc<Latch>` or `&Latch` and might themselves get freed after the call to `set()` as well.
@nikomatsakis just to be clear, the UB here did not come from `dereferenceable`, but from there being actual further accesses after deallocation (namely, `self.sleep`)?

@RalfJung correct, the segfault was a direct result of reading `self.sleep` after `set()` cleared the way for dropping this latch. But even if we move that read earlier, there may still be UB in `&self` remaining at all, like this issue. I think maybe the latch method should take `*const Self` instead, but even then there's ultimately an atomic `&self` call in there that will invalidate itself.
> I think maybe the latch method should take `*const Self` instead, but even then there's ultimately an atomic `&self` call in there that will invalidate itself.

Interesting. We could add a `*const Self`-taking alternative for every method on atomics, but that would be a lot of duplication, and it would be a footgun (i.e. unsafe code authors would accidentally use the `&self` version when they need the `*const Self` version).
Instead, how about adding an attribute to opt out of `dereferenceable` -- or, from Stacked Borrows' perspective, 'protectors' -- on specific function arguments? e.g.

```rust
fn fetch_add(#[unprotected] &self, val: i16, order: Ordering) -> i16 { ... }
```

It's not as if `dereferenceable` provides any optimization benefit on a trivial method that wraps a single memory-access intrinsic. LLVM knows the memory will be dereferenced when the method, well, explicitly dereferences it, and there are no operations beforehand: nowhere to put a speculative read that's separated from the actual read. I suppose, assuming the atomic method is inlined (which of course it should be), the compiler could hypothetically (very hypothetically) take advantage of `dereferenceable` to insert a speculative read immediately after the atomic operation, not for that operation's own sake but for the sake of some future access to the same memory location, later in the caller function. But the atomic instruction(s) would normally provide the value themselves without needing to add another load. Even if it somehow didn't, the compiler would usually be able to conclude that the object will stay alive based on the caller function's behavior.

Using an attribute would solve the duplication and the wrong-atomic-variant footgun. And the attribute could also be used by other code, like `set()` in the above example. On the other hand, it would be easy for unsafe code to forget to use the attribute.
> #[unprotected]

That's `#[may_dangle]`, right? Or at the very least, it's closely related. The problem is that `&self` may be dangling after the atomic operation but before the function returns (because the function itself is not atomic).
Are there any other primitives in the Rust standard library, other than `Atomic*`, that could be used for synchronization leading to a free? I worry that this might end up a recurring bug for synchronized types, and a frighteningly scary one.

At the very least, I'd like to see an argument for whether the same issue applies to the atomic op in `Arc::drop_slow`/`Weak::drop`:
```rust
// Arc::drop
fn drop(&mut self) {
    if self.inner().strong.fetch_sub(1, Release) != 1 {
        return;
    }
    atomic::fence(Acquire);
    unsafe {
        self.drop_slow();
    }
}

// Arc::drop_slow
unsafe fn drop_slow(&mut self) {
    ptr::drop_in_place(&mut self.ptr.as_mut().data);
    if self.inner().weak.fetch_sub(1, Release) == 1 {
        atomic::fence(Acquire);
        Global.dealloc(self.ptr.cast(), Layout::for_value(self.ptr.as_ref()))
    }
}

// Weak::drop
fn drop(&mut self) {
    // If we find out that we were the last weak pointer, then it's time to
    // deallocate the data entirely. See the discussion in Arc::drop() about
    // the memory orderings.
    //
    // It's not necessary to check for the locked state here, because the
    // weak count can only be locked if there was precisely one weak ref,
    // meaning that drop could only subsequently run ON that remaining weak
    // ref, which can only happen after the lock is released.
    let inner = if let Some(inner) = self.inner() { inner } else { return };
    if inner.weak.fetch_sub(1, Release) == 1 {
        atomic::fence(Acquire);
        unsafe { Global.dealloc(self.ptr.cast(), Layout::for_value(self.ptr.as_ref())) }
    }
}
```
I think both of these suffer from the same defect. Specifically, the danger zones are (all MIR generated on this playground, which copies the relevant implementation of `Arc` and `Weak`):

- inside `Arc::drop`: after the `intrinsics::atomic_xsub_rel` in `AtomicUsize::fetch_sub`, `self: &AtomicUsize`
- inside `Arc::drop`: the temporary `_: &ArcInner` used in the `if`
- inside `Arc::drop_slow`: after the `intrinsics::atomic_xsub_rel` in `AtomicUsize::fetch_sub`, `self: &AtomicUsize`
- inside `Arc::drop_slow`: the temporary `_: &ArcInner` used in the `if`
  - NOTE: `_21` is the `Layout::for_ref` reference to `ArcInner`, and based solely on the `StorageDead` MIR instruction, it lives until after the `dealloc`, so it is a reference to freed memory. #70686
- inside `Weak::drop`: after the `intrinsics::atomic_xsub_rel` in `AtomicUsize::fetch_sub`, `self: &AtomicUsize`
- inside `Weak::drop`: `inner: &ArcInner`
  - NOTE: `Weak::drop` will always have a dangling reference when dropping the `ArcInner`!

Semantically, it seems like (barring a `#[may_dangle]`-like solution) all of `Arc::drop`, `Arc::drop_slow`, and `Weak::drop` need to have some extra synchronization signal that they're holding a `&ArcInner` and not to deallocate it yet. The problem, of course, is that said synchronization signal is inside the `ArcInner`.
> > #[unprotected]
>
> That's `#[may_dangle]`, right? Or at the very least, it's closely related.

I hadn't thought of that. In a way, the two are opposites. `#[may_dangle]` (an attribute on type or lifetime parameters of `Drop` impl blocks) is a way for the programmer to tell the compiler: "I promise my code won't assume this lifetime is valid, so you can allow it to become dangling". My proposed `#[unprotected]`, on the other hand, would tell the compiler: "You must promise not to assume this reference is valid, because my code might cause it to become dangling".
> - inside `Arc::drop_slow`: the temporary `_: &ArcInner`

So, if I understand Stacked Borrows correctly, it actually allows references in general to become dangling before they go out of scope. Only references that happen to be function arguments are special: they're given a "protector", which makes it insta-UB to invalidate the reference for the duration of the function execution. That makes the situation somewhat more tractable.

> So, if I understand Stacked Borrows correctly, it actually allows references in general to become dangling before they go out of scope. Only references that happen to be function arguments are special: they're given a "protector", which makes it insta-UB to invalidate the reference for the duration of the function execution. That makes the situation somewhat more tractable.

Exactly. References generally only have to be "valid when used" -- though "used" is interpreted very coarsely and includes reborrows and (typed) copies, not just dereferences. The only exception is references passed as function arguments.
> [unprotected]

Interesting, I would have made this a property of the type, not the individual functions. While it is true that the semantics for this work at the granularity of functions, I feel like annotating types is more tractable.

@RalfJung Could you elaborate on the tractability of annotations on types vs. functions? (Presumably by "on types" you mean where the type is defined, not where it is used.)

Well, I just meant that you define the type just once, and so won't forget to annotate every single function that needs this. We could even make `UnsafeCell` have this effect to provide some backwards compatibility.
However, in the lang team meeting about this we also identified that this is not sufficient to solve `dereferenceable` issues for other cases that do not involve interior mutability, such as `Drain`.
I was debating about proposing the "opt-out" that @comex describes, but I'm not sure how I feel about it yet. It would definitely be easy to forget, we'd need some lints and extra checks I think to help in propagating those annotations.
You can certainly start to see the appeal of having a distinct type that is basically a "reference that may dangle on exit" -- but then, in some sense we have that type, it's called a raw pointer (ok, it's a bit more extreme), and the whole problem is that we have stable APIs that don't use raw pointers.
But also, there is one more point that I think is important: it's not great to have the only two options be "full safety, with all that implies" and "fully raw, with all that implies". It's really useful in practice to have intermediate levels of safety, even if they're more like "lints" that guide you into correct code than hard guarantees. I think this is why it feels a bit wrong to me to say that we should have raw pointers all throughout these APIs.
More and more, I do feel like we are going to want to special-case `UnsafeCell`, in terms of saying "a reference to an UnsafeCell may dangle on exit", even if it's not a full solution, because it helps to address both backwards compatibility and what is clearly a pretty common case.
I was thinking about this some more. The rayon bug that we encountered was a typical data-race sort of bug -- it took us quite some time to track down the crash, and the code "appears" quite innocuous. This makes me wonder: if we had had the stricter rules, in which `&self` cannot be invalidated in the middle of the function's lifetime, and had had some way to dynamically test those rules, we would have gotten an error much earlier. This would then have required us to alter the type to `*const Self` -- or perhaps even some other newtype'd wrapper like `LiveOnEntry<Self>` -- which might then have raised awareness of the underlying problem. Food for thought.
So it seems like the most actionable thing we can do at this juncture is to create functions for the atomic operations that take raw pointers, and to encourage folks to use these if performing the atomic operation may have the side-effect of freeing the memory in which the atomic itself resides.

This implies, in turn, that the methods on `Arc` or whatever ought to be using raw pointers as well, instead of `&self` and friends.

This suggests to me that a good enhancement would be to pursue some way to have raw-pointer methods available. This would take a bit of design work and experimentation, but at least within e.g. rayon it would be an important usability enhancement, I believe.

(I guess all of this interacts with the question of whether we try to create a new type, like `&unsafe`, that is the "blessed raw pointer" type.)

I'm curious what folks think of the immediately actionable step of creating raw pointer-based methods for atomics -- should we at least take this step and then use them to fix this particular instance of this more general issue?
That step is irreversible, though, at least once we stabilize those methods. An alternative that also fixes the UB here would be to stop adding `dereferenceable` when `UnsafeCell` is present. Of course, that risks people relying on this, so it might be hard to take back even if we say it's not part of the lang spec (and `Arc` gets to rely on it solely because it is in libstd and we can adjust it along with the lang spec).

@RalfJung the methods will exist, yes, but of course nobody is forced to use them (they could be deprecated). I agree that not adding `dereferenceable` is also a reasonable and theoretically "withdraw-able" step, although I guess that in practice that reliance already exists anyhow (e.g., as I mentioned, I'm sure rayon contains this same bug).

I think if we had to add new methods and can't somehow alter the existing ones, either through UnsafeCell detection or an attribute, then we should hold off until we have a final design -- there are a lot of methods on atomics, and duplicating them forever seems like a big deal, especially because it's plausible we'll end up with 3 versions in the long run.
Turns out crossbeam has a similar bug to `Arc` (https://github.com/crossbeam-rs/crossbeam/issues/545), and it can even be triggered deterministically. Adding an exception for `UnsafeCell` would have helped here as well; I am not sure how hard it would be for them to port everything to raw pointers (or a hypothetical `&unsafe`).
Is this special-casing of function arguments something we're happy with / committed to? It feels like a footgun to me, and I think that's demonstrated by how much code already exists which has this issue.
Even if an attribute is added to relax this requirement, it will be such an obscure attribute that it will remain a footgun.
Are we even sure that there would be a measurable perf regression if we had to use the more relaxed form of `dereferenceable`? Furthermore, given that we'll be emitting `dereferenceable` on all reference parameters, LLVM should still have most of the information it needs, even with the more relaxed definition:

```rust
fn foo(arg: &T) {
    bar(arg);
    // We are worried that LLVM will no longer understand that `arg` is dereferenceable here?
    ...
    // However, `baz` will also have its argument marked as dereferenceable on entry to this function,
    // so can't LLVM propagate that information backwards to figure out that `arg` was always dereferenceable?
    baz(arg);
}
```
I don't have any data on perf regressions, but I do have data on affected optimizations: without this special treatment for function arguments, there are quite a few powerful optimizations that we cannot do. When considering moving a memory access through a reference around an unknown function call, there are 6 possible cases: reading/writing mutable references and reading shared references; and each of those can be either moved up or down across an unknown function call. 4 out of these 6 cases need "protectors" to be valid optimization in Stacked Borrows, which means they need this special treatment: all 3 cases of moving an access down need this, and likewise moving a write up also needs it.
So, for example, this special treatment allows us to turn

```rust
fn foo(x: &mut i32) {
    *x = 13;
    bar(); // some function we know nothing about, except that it is nounwind
    *x = 27;
}
```

into

```rust
fn foo(x: &mut i32) {
    bar();
    *x = 13; // and now this first write can easily be removed
    *x = 27;
}
```
This optimization is simply not correct if we use something like "dereferenceable only on entry". The counterexample is a version of `bar` that colludes with the environment to get a pointer that aliases `x`, uses it, and then goes into an infinite loop. Essentially, what happens here is that `x`'s lifetime ends when that alias gets used, and since `x` is never used again due to the infinite loop, it actually is okay for its lifetime to end.

This is an extremely powerful optimization, unlike anything I have seen in the literature: it calls an unknown function, `bar`, with globally known memory (`x`) being in a different state than in the source program! But we can show that `bar` is unable to observe the contents stored in `x`. (We have actually established this in Coq.)

(We could do this optimization even when `bar` unwinds, but then we have to move `*x = 13` down into both blocks that follow the call: the regular return block and the unwind block. This can still be worth it if it allows simplifications like the above in the return block and we don't care much about how expensive the unwind block is. But the resulting program, while easy to represent in MIR, is hard to write in surface Rust syntax.)
@RalfJung that makes sense: if I'm understanding correctly, this matters for function arguments and not local variables because for locals the compiler can determine whether they escape the function (and if not, the collusion you mentioned cannot happen), whereas for arguments that cannot be determined, and so dereferenceable must be part of the contract of the function?
Couldn't you have special treatment of a reference between its "last use" and when it is removed from the call stack? After its last use it becomes a "zombie reference" which, by still existing in the Stacked Borrows "stack", disallows the collusion of `bar` with the caller, but does not actually require that the reference still be dereferenceable.
edit: I opened an issue on the UCG repo to avoid derailing this thread further.
I managed to create an `Arc::drop` test that Miri will complain about:

```rust
use crossbeam_channel::bounded;
use std::sync::Arc;

fn main() {
    let (sender, receiver) = bounded::<Arc<i32>>(1);
    std::thread::spawn(move || while let Ok(_) = receiver.recv() {});
    for i in 0.. {
        dbg!(i);
        let arc = Arc::new(42);
        sender.send(Arc::clone(&arc)).unwrap();
    }
}
```

I had to use `crossbeam_channel` because Miri kept telling me that `mpsc` was deadlocked in `futex_wait`. But the error here is not under crossbeam, so I hope that detail doesn't really matter.
```text
$ MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri run
[...]
[src/main.rs:10] i = 480
error: Undefined Behavior: deallocating while item is protected: [SharedReadWrite for <13233224> (call 4080208)]
   --> /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:107:14
    |
107 |     unsafe { __rust_dealloc(ptr, layout.size(), layout.align()) }
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ deallocating while item is protected: [SharedReadWrite for <13233224> (call 4080208)]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
    = note: inside `std::alloc::dealloc` at /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:107:14
    = note: inside `<std::alloc::Global as std::alloc::Allocator>::deallocate` at /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:244:22
    = note: inside `<std::sync::Weak<i32> as std::ops::Drop>::drop` at /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:2175:22
    = note: inside `std::ptr::drop_in_place::<std::sync::Weak<i32>> - shim(Some(std::sync::Weak<i32>))` at /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:487:1
    = note: inside `std::mem::drop::<std::sync::Weak<i32>>` at /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/mem/mod.rs:974:24
    = note: inside `std::sync::Arc::<i32>::drop_slow` at /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:1109:9
    = note: inside `<std::sync::Arc<i32> as std::ops::Drop>::drop` at /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/sync.rs:1702:13
    = note: inside `std::ptr::drop_in_place::<std::sync::Arc<i32>> - shim(Some(std::sync::Arc<i32>))` at /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:487:1
note: inside `main` at src/main.rs:13:5
  --> src/main.rs:13:5
   |
13 | }
   | ^
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error
```
I'm not sure why disable-isolation matters here -- I had just copied that from invocations I used for rayon. I haven't reached the race without that (or haven't waited long enough), but with the flag I always get the error on `i = 480`. :exploding_head:
```text
$ rustc +nightly -Vv
rustc 1.63.0-nightly (420c970cb 2022-06-09)
binary: rustc
commit-hash: 420c970cb1edccbf60ff2aeb51ca01e2300b09ef
commit-date: 2022-06-09
host: x86_64-unknown-linux-gnu
release: 1.63.0-nightly
LLVM version: 14.0.5
```
My instinct is that the 480 is just you getting "lucky" as a result of the impact that turning off isolation has on Miri's initial state, and the right `-Zmiri-seed` value might produce a much earlier protector error. I tried a few and didn't find anything that crashes faster.

I also ran this a few times without disabling isolation, and it eventually deadlocks at iteration 3999. Now, that doesn't make a whole lot of sense to me, but as of recently Miri emulates weak memory effects, so I'm sure there are plenty of bugs out there that the weak memory emulation will find. (It could also be a Miri bug, but betting against the checker is usually a bad idea.)
The deadlocks are strange, but the error @cuviper found is exactly what I would expect for this bug -- deallocation of an active protected shared reference.
If we weakened the requirements on deallocation so that they are equal to those on writing (which IMO makes a lot of sense), then this error would no longer arise. But that would of course mean we cannot tell LLVM that such references are dereferenceable for the entire duration of the function call.
I ran into this bug when trying to run the tests of the `desync` crate under Miri. The crate uses several threads with `mpsc` channels for communication, and some of the tests are repeated 1000 times to account for variation. This bug reliably occurs within a few hundred iterations, making it difficult to find actual data races in the test cases.
Here's an MRE I came up with while debugging this:

```rust
/*
Run with:
    MIRIFLAGS='-Zmiri-seed=597' cargo +nightly-2022-06-10 miri run
or:
    MIRIFLAGS='-Zmiri-seed=106 -Zmiri-disable-weak-memory-emulation' cargo +nightly-2022-06-10 miri run
*/
use std::{sync::Arc, thread};

fn main() {
    let arc_1 = Arc::new(());
    let arc_2 = arc_1.clone();
    let thread = thread::spawn(|| drop(arc_2));
    let mut i = 0;
    while i < 256 {
        i += 1;
    }
    drop(arc_1);
    thread.join().unwrap();
}
```
Perhaps `fetch_sub` should be changed to the underlying `core::intrinsics::atomic_xsub_rel` until there is a consensus on how to permanently fix this? I would be happy to open a PR. It is a bit scary that such widely-used code could possibly be unsound; even if it seems to work today, I worry that someday an upstream change to LLVM could make this trigger real UB.

> Perhaps fetch_sub should be changed to the underlying core::intrinsics::atomic_xsub_rel until there is a consensus on how to permanently fix this?

That won't help, since `fetch_sub` itself still takes a shared reference.
I think we should weaken the rules so that deallocation is only as strict as writing, which means losing `dereferenceable` for shared references that might have an `UnsafeCell` (it looks like LLVM still has no way to represent "dereferenceable on function entry").

Ah, actually this would also lose the `dereferenceable` attribute on mutable references -- they can be written to, and hence they can be deallocated during the function call. So we should talk to some of our LLVM folks to figure out if removing `dereferenceable` for all mutable pointers is an option.

We could of course also say we do the change in Miri but leave LLVM codegen as-is, and argue that this is no worse than clang, which also incorrectly emits `dereferenceable` when it means "dereferenceable on entry"...
If you are affected by this in Miri, my current recommendation would be to use `-Zmiri-preemption-rate=0`. That will get you back the previous behavior, where Miri did not detect this problem.
I am doing a perf experiment at https://github.com/rust-lang/rust/pull/98017, and the LLVM implications are being discussed on Zulip.
Sorry if I was unclear, I meant replacing `fetch_sub` with `core::intrinsics::atomic_xsub_rel` in the body of `Arc::drop` :)
We'd like to attempt a fix for this by adding an `AtomicPtr` to the `ArcInner`. It'd make `Drop` significantly slower in the heavy-contention case, but microbenchmarks aren't real. The main benefit is that it should work.
The LLVM folks were concerned that losing `dereferenceable` on `&mut` might be too high a cost. So here is a potential alternative that avoids this side-effect: we say that `&UnsafeCell` no longer gets protectors. That means we allow any way of invalidating an `&UnsafeCell` argument while a function runs, either by deallocating it or by using other pointers further down the stack.

In terms of what we tell LLVM, I think this does exactly what we want: we have to remove `dereferenceable` from `&UnsafeCell`, but that's it. There are some other effects -- some other code that used to have UB but is now allowed -- but it seems extremely unlikely that that is a problem, since we already basically accept that one cannot do any special optimizations around `&UnsafeCell` (i.e., they optimize basically as badly as regular C pointers).
I have adjusted https://github.com/rust-lang/rust/pull/98017 to the proposed work-around above, and am proposing to land that PR.
To expand on our suggestion above: we don't know if it would work, but is it possible to do something akin to `Once`, where you coordinate `Drop` using stack-allocated state and an `AtomicPtr`? That way, you can guarantee no thread is inside the `AtomicUsize` or `AtomicPtr` when the allocation goes away; instead, all threads are talking to each other across their call stacks. (Stack allocations are "free", at least compared to heap allocations.) As such there's no need to modify any of these things or add any workarounds.
Discovered by @Amanieu on IRLO. Quoting their report:

> `Arc::drop` contains this code:
>
> Once the current thread (Thread A) has decremented the reference count, Thread B could come in and free the `ArcInner`.
>
> The problem becomes apparent when you look at the implementation of `fetch_sub`:
>
> Note the point marked HERE: at this point we have released our claim to the `Arc` (as in, decremented the count), which means that Thread B might have freed the `ArcInner`. However, the `&self` still points to the strong reference count in the `ArcInner` -- so `&self` dangles.
>
> Other instances of this: