Open RalfJung opened 1 year ago
I know that folks have wanted to use UnsafeCell with volatile stuff, and in some rare cases volatile memory isn't allowed to have spurious loads.
I don't think this should sway the decision, because I don't think people should be using UnsafeCell for volatile in the first place, but it's worth noting that the decision here could have an effect on volatile use (if the spurious read is allowed the using UnsafeCell wouldn't be a sound basis for abstraction when a volatile memory has side effects on read).
In fact even with angelic choice for the spurious read, the last version of internal
has UB if choice
is demonic choice -- at least under the usual interpretation of angelic and demonic: for each choice of "let's do a real read or a poison read", there exists a choice for the daemon to cause UB. (This follows exactly from a bad reordering of angelic and demonic choice.)
We require that
&UnsafeCell
are "dereferenceable on function entry" in the sense of pointing to allocated memory. This is believed to allow the introduction of spurious loads, if the compiler can prove that memory has not been deallocated yet. But that is far from obvious...Consider:
Under LLVM
noalias
and under Tree Borrows,public
is a sound function: ifchoice()
is true, we write toy
and nothing is even strange; ifchoice()
is false then the read fromx
meansy
can no longer be written, buty
is still valid for reads so the protector does not kick in. (In LLVM terms,noalias
is completely fine with arbitrary aliasing as long as all accesses are reads, and ifchoice()
is false then there are no writes.) SB is unhappy with this example, but SB is often too strict.Let's focus on
internal
, and let's imagine we introduce a spurious load fromx
at the beginning of the function. If any kind of spurious load is allowed, this one definitely is. Then we observe that in theelse
case,x
is being read twice, and let's say we know that the closure will not mutatex
(it cannot, sincex
is privately allocated inpublic
and its provenance is never leaked to outside code) -- let's say thechoice
function is marked "readonly". We arrive at:Interpreted as Rust code this has obvious UB if
choice()
returns true, but okay, maybe our IR has a different semantics. But what could those semantics be?choice()
is true, the read must somehow be considered "invalid", maybe we make it returnpoison
or so (similar to what would happen according to LLVM semantics if this read introduced a data race). Certainly the read must not observe the actual memory contents, because if it did it would not be reorderable around the aliasing write.choice()
is false, the read must not returnpoison
, as printing a poison value is UB!choice()
might well read from stdin and use that information to determine what it should returnIn other words, the semantics of the read must predict the future to be able to decide whether it should return poison and keep the aliasing state as-is, or return the actual data and mark the memory as "must not be mutated through other pointers". Equivalently, the read is making an angelic choice between these two options.
So... any semantics that wants to introduce spurious loads for
&UnsafeCell
must at least use angelic choice, and deal with the consequences of that (e.g. angelic choice cannot in general be freely reordered down across demonic choice). We should hold off in having MIR transformations introduce such spurious loads until that is clarified.The status quo is that we do not emit
dereferenceable
for these references, so LLVM shouldn't be introducing any spurious loads, so we are not at risk of being affected by this. Even if LLVM gains a "dereferenceable on entry" attribute, I think we shouldn't use it on!Freeze
(and!Unpin
) types.