On the legality of introducing spurious loads of `&UnsafeCell` (aka: dereferenceable and noalias don't interact well)

RalfJung commented 1 year ago

We require that &UnsafeCell are "dereferenceable on function entry" in the sense of pointing to allocated memory. This is believed to allow the introduction of spurious loads, if the compiler can prove that memory has not been deallocated yet. But that is far from obvious...

Consider:

use std::cell::UnsafeCell;

fn internal(x: &UnsafeCell<i32>, y: &mut i32, choice: impl FnOnce() -> bool) {
  if choice() {
    *y = 0;
  } else {
    let v = unsafe { x.get().read() };
    println!("{}", v);
  }
}

pub fn public(choice: impl FnOnce() -> bool) {
    let x = &UnsafeCell::new(0);
    let y = unsafe { &mut *x.get() };
    internal(x, y, choice);
}

Under LLVM noalias and under Tree Borrows, public is a sound function: if choice() is true, we write to y and nothing is even strange; if choice() is false then the read from x means y can no longer be written, but y is still valid for reads so the protector does not kick in. (In LLVM terms, noalias is completely fine with arbitrary aliasing as long as all accesses are reads, and if choice() is false then there are no writes.) SB is unhappy with this example, but SB is often too strict.

Let's focus on internal, and let's imagine we introduce a spurious load from x at the beginning of the function. If any kind of spurious load is allowed, this one definitely is. Then we observe that in the else case, x is being read twice, and let's say we know that the closure will not mutate x (it cannot, since x is privately allocated in public and its provenance is never leaked to outside code) -- let's say the choice function is marked "readonly". We arrive at:

fn internal(x: &UnsafeCell<i32>, y: &mut i32, choice: impl FnOnce() -> bool) {
  let v = unsafe { x.get().read() };
  if choice() {
    *y = 0;
  } else {
    println!("{}", v);
  }
}

Interpreted as Rust code this has obvious UB if choice() returns true, but okay, maybe our IR has a different semantics. But what could those semantics be?

If choice() is true, the read must somehow be considered "invalid", maybe we make it return poison or so (similar to what would happen according to LLVM semantics if this read introduced a data race). Certainly the read must not observe the actual memory contents, because if it did it would not be reorderable around the aliasing write.
If choice() is false, the read must not return poison, as printing a poison value is UB!
choice() might well read from stdin and use that information to determine what it should return

In other words, the semantics of the read must predict the future to be able to decide whether it should return poison and keep the aliasing state as-is, or return the actual data and mark the memory as "must not be mutated through other pointers". Equivalently, the read is making an angelic choice between these two options.

So... any semantics that wants to introduce spurious loads for &UnsafeCell must at least use angelic choice, and deal with the consequences of that (e.g. angelic choice cannot in general be freely reordered down across demonic choice). We should hold off in having MIR transformations introduce such spurious loads until that is clarified.

The status quo is that we do not emit dereferenceable for these references, so LLVM shouldn't be introducing any spurious loads, so we are not at risk of being affected by this. Even if LLVM gains a "dereferenceable on entry" attribute, I think we shouldn't use it on !Freeze (and !Unpin) types.

Lokathor commented 1 year ago

I know that folks have wanted to use UnsafeCell with volatile stuff, and in some rare cases volatile memory isn't allowed to have spurious loads.

I don't think this should sway the decision, because I don't think people should be using UnsafeCell for volatile in the first place, but it's worth noting that the decision here could have an effect on volatile use (if the spurious read is allowed the using UnsafeCell wouldn't be a sound basis for abstraction when a volatile memory has side effects on read).

RalfJung commented 1 year ago

In fact even with angelic choice for the spurious read, the last version of internal has UB if choice is demonic choice -- at least under the usual interpretation of angelic and demonic: for each choice of "let's do a real read or a poison read", there exists a choice for the daemon to cause UB. (This follows exactly from a bad reordering of angelic and demonic choice.)

rust-lang / unsafe-code-guidelines

On the legality of introducing spurious loads of `&UnsafeCell` (aka: dereferenceable and noalias don't interact well) #435