rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0
653 stars 57 forks source link

What exactly is going on with return places and the LHS of assignments? #417

Open RalfJung opened 1 year ago

RalfJung commented 1 year ago

This question is somewhat tied up in at least

but those issues also talk about move a lot which I think is mostly orthogonal (and tracked in https://github.com/rust-lang/unsafe-code-guidelines/issues/416).

For function calls ret = call(args), in Miri we currently do something special: The return place designated by the caller is retagged with a fresh unique (and protected) tag upon function entry. A fresh place is allocated for the callee and used for the _0 MIR local during function execution. When the function is done, the value is copied from _0 to the caller-provided return place. The retag means that any access to the caller-provided return place is UB for the duration of the call. This is useful to explain compilation strategies where actually no fresh memory is allocated for the callee, and _0 directly points to the caller-provided return place.

For MIR assignments, a similar semantics is possible, and is intended to explain uses of memcpy that require the LHS and RHS to not overlap. (Miri currently does explain this in a somewhat subtle way, where the "fallback" path of copying non-Scalar/ScalarPair values uses a mem_copy that raises UB on overlaps.)

However, for function calls with custom MIR these semantics are actually insufficient: as usual with aliasing model tricks, this fails to account for pointer comparisons! If someone were to compare the address of the return place in the callee with the caller-provided return place, they would always come out inequal in the model but could be equal in real codegen.

We should at least come up with a spec explaining today's codegen (and implement it in Miri). I am not sure if a tighter spec is required for planned future changes.

So do we need something completely different? For function calls, if the return place is just a local, one could imagine something more extreme where that local is actually deallocated first, then a callee return place is allocated, then the function runs, then we load the return value, deallocate the callee return place, allocate again the caller return place, and store the return value in there. This is very symmetric with some of the move semantics proposals. But it only works when the caller return place can be deallocated and it means all pointers to the caller return place become invalid, so that seems too extreme...

Another rather crude option would be to just pick non-deterministically whether the callee gets a fresh return place or a direct pointer to the caller-provided return place. This would be fine to explain what codegen does. We have to be careful with optimizations exploiting this though; once an optimization assumed that one choice or the other was made, we have to ensure codegen is consistent with that! We probably need the aliasing tricks (to explain that in the callee, the return place is treated like a noalias argument) and this kind of non-determinism?

For assignments, I don't think the pointer equality concern applies, so the aliasing might be enough?

bjorn3 commented 1 year ago

We have to be careful with optimizations exploiting this though; once an optimization assumed that one choice or the other was made, we have to ensure codegen is consistent with that!

I think optimizations should not be able to pick either option instead only codegen itself should be able to pick it and optimizations on MIR should consider both options to be possible while optimizations on LLVM IR can know which one was chosen as at that point it is fixed.

RalfJung commented 1 year ago

If we don't have a way to encode in MIR the result of making a choice, then this is effectively what happens. Basically from the perspective of MIR optimizations, there is some external oracle that defines which option is used, and the optimization has to be correct for every possible oracle. (Formally speaking optimizations have a little more freedom than that but I don't think that matters.)

comex commented 1 year ago

How would Miri handle this kind of nondeterministic choice?

RalfJung commented 1 year ago

Just like all other non-deterministic choice -- it would use its pseudo-RNG to pick a bool "at random" (controlled by the initial seed).

quaternic commented 1 year ago

What is the problem with the caller always providing the callee with exclusive access to some return place, with uninitialized contents on entry? And if that were the case, why wouldn't the return local just always be the caller-provided return place? (By return place, I mean an ABI-dependent choice of either a register or some place expression using the arguments that are passed.)

RalfJung commented 1 year ago

We need to represent the fact that if the caller has taken the address of the return place, and the callee takes the address of the return place, and those are compared, then both "equal" and "not equal" are valid results.

Under your semantics, the comparison always comes out as true. Under Miri's semantics, it always comes out as false. Neither of these reflect reality.

quaternic commented 1 year ago

Okay, that makes sense, but it's not obvious to me why "not equal" should be a valid result. If they both observe the address of the return place, then surely it must be equal. How else is the caller going to find the return value from the callee? I can see that if the return place is a register, it doesn't have a memory address they could be comparing, but even then they must agree on which register.

If it is just to explain current codegen where "not equal" is a possibility, then I could only assume that at least one of the two functions did not actually observe the address of the return place, but of some other location.

bjorn3 commented 1 year ago

If the return value is passed in a register, caller and callee can't see the same address for the return place as the callee has to allocate a temporary stack slot for the return value (unless the address is never taken, then it could stay in registers all the way) and the callee never gets the address of the place the caller will store the return value.

quaternic commented 1 year ago

My point is that if the value is passed in a register, neither can observe any address for the return place, because it doesn't have one: It's a register. (Unless... you do give registers some magic addresses. Then they would once again observe the same address.)

Of course that hinges on what is meant by "return place". I've been using "where is the return value when control is passed back to the caller", but yours (probably the standard definition) seems to be "the place where the caller will store the return value".

bjorn3 commented 1 year ago

The return place on the callee side is the _0 local. If the return value is passed in memory, thid aliases the place where the caller wants the return value to be. But if the return value is stored in a registerx _0 is a temporary stored on the callee's stack frame and implicitly loaded by the Return terminator. In this case the address of the return place on the caller and callee side doesn't and cannot match.

RalfJung commented 1 year ago

My point is that if the value is passed in a register, neither can observe any address for the return place,

You are mixing different levels of abstraction here, making this statement meaningless. We are talking about the Rust Abstract Machine. It does not have "registers".

The question is, what is this pseudo-code allowed to print?

fn compare(addr1: *const i32, addr2: *const i32) { println!("{}", addr1 == addr2); }

fn callee(addr1: *const i32) -> i32 { mir! {
  {
    let addr2 = addr_of!(RET);
    Call(_ignore, compare(addr1, addr2), bb2)
  }
  bb2 = { Return }
}}

fn main() { mir! {
  {
    let x = 0i32;
    let addr1 = addr_of!(x);
    Call(x, callee(addr1), bb2)
  }
  bb2 = { Return }
}}

Miri currently will only ever print false here. Your proposal will only ever print true. In reality this depends on how exactly the ABI handles this particular return type and we probably don't want to prescribe either way in the spec, so this should be non-deterministically true or false.

quaternic commented 1 year ago

After bjorn3's last reply, I did realize that the key issue which the earlier thread is about is that at the stage where MIR is looked at, it is not known if the codegen will later pass in register or by out-ref. Or more generally, the ABI of functions is non-deterministic in the AM.

The question is, what is this pseudo-code allowed to print?

So yes, I agree that given a non-deterministic ABI, that example could print either true or false.