rust-lang / rust


Semantics of MIR function calls #71117

Open RalfJung opened 4 years ago

RalfJung commented 4 years ago

Some discussion at https://github.com/rust-lang/rust/pull/71005#issuecomment-612242397 revealed that we are not entirely sure what exactly the semantics of passing arguments and return values for function calls should be.

The simplest possible semantics is to say that when a stack frame is created, we allocate fresh memory for all arguments and return values (according to the known layout, determined by the callee). We copy the function arguments into the argument slots. Then we evaluate the function, and when it returns, we copy the return value back.

However, such a model is hard to compile down to destination-passing style, where the callee actually writes its return value directly into caller-provided memory. If that memory aliases with other things the function can access, behavior could differ with and without destination-passing style. This is complicated by the fact that in MIR right now a Call does not have to provide a return place, yet even with destination-passing style, diverging functions (without a return place) may access their return local _0 . Moreover, @eddyb says that for some function arguments, too, we might want to elide the copy during codegen; it is unclear whether that is behaviorally equivalent to the above copying semantics or not.
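The observable difference can be sketched with a toy model (indices into an array stand in for memory; this is not rustc's actual lowering, just an illustration of why aliasing makes destination-passing style observable):

```rust
// Toy model: the callee writes to its return slot before it is done
// reading its argument slot.
fn callee_body(mem: &mut [u8], arg: usize, ret: usize) {
    mem[ret] = 0;            // early write to the return place
    mem[ret] = mem[arg] + 1; // then read the argument
}

fn main() {
    // Copy semantics: fresh, disjoint slots for the argument and return value.
    let mut mem = [7u8, 0, 0]; // [caller's value, arg slot, ret slot]
    mem[1] = mem[0];           // copy the argument in
    callee_body(&mut mem, 1, 2);
    assert_eq!(mem[2], 8);     // 7 + 1, as expected

    // Destination-passing into memory that aliases the argument slot:
    let mut mem = [7u8, 0];
    mem[1] = mem[0];
    callee_body(&mut mem, 1, 1); // the early write clobbers the argument
    assert_eq!(mem[1], 1);       // 0 + 1, not 7 + 1
}
```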

This is something of a sibling to https://github.com/rust-lang/rust/issues/68364. We should have a good way to collect all these "MIR semantics" issues...

RalfJung commented 4 years ago

I think a first step we should take is to make the Call terminator always provide a return place. Right now, every backend has to replicate the same hack where some scratch memory still needs to be allocated for the return place in case the caller did not provide one (except for Miri, which makes it illegal to access the return place in this situation, but that is likely just wrong).

Beyond that, I am not sure. Miri right now implements copying as described above for arguments. For return values, it directly uses the caller-provided place, which means RETURN_PLACE needs to be special-cased in a bunch of places. We can probably get rid of this special treatment if we are okay with losing the "immediate value" optimization for return places; then we could force_allocate the caller-provided return place and make the callee _0 an Indirect local. (This would entirely remove return_place from Frame, which is good I think.)

To ensure that the return place does not alias with anything, we could try using Stacked Borrows: https://github.com/rust-lang/miri/pull/1330. However, that is quite a hack -- usually retags are explicit in the code; this would make the return place the only implicit retag in our semantics. Also, we should at least go over a bunch of tricky examples to ensure that this indeed makes all the bad cases UB. Unfortunately, without a solution to https://github.com/rust-lang/miri/issues/196, it is hard to test these things.

For passing arguments without a copy, I don't know if what Miri does is a problem and I don't know what a solution could look like.

eddyb commented 4 years ago

cc @rust-lang/wg-mir-opt @nikomatsakis

nikomatsakis commented 4 years ago

@RalfJung is the optional place only employed for functions that return uninhabited values? The type ! has size 0, but I suppose in some cases the type might be non-zero in size..? I'm a bit confused about that part of what you wrote.

bjorn3 commented 4 years ago

(!, u8) has a size of 1.

eddyb commented 4 years ago

Because of partial initialization, you could have fields of e.g. (A, B, C, !) written to.
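These sizes can be checked directly. Since the never type `!` is not usable in this position on stable Rust, the example below substitutes `std::convert::Infallible` (an empty enum, likewise uninhabited and zero-sized):

```rust
use std::convert::Infallible; // uninhabited stand-in for `!`
use std::mem::size_of;

fn main() {
    // The uninhabited type itself is zero-sized...
    assert_eq!(size_of::<Infallible>(), 0);
    // ...but a tuple pairing it with a u8 has size 1, so a diverging
    // function can have a non-ZST return type whose inhabited fields
    // could be partially initialized.
    assert_eq!(size_of::<(Infallible, u8)>(), 1);
}
```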

nikomatsakis commented 4 years ago

Right. I just wanted to be sure that this is the kind of "dummy place" that @RalfJung was referring to, or if this was also a problem for the ! type.

nikomatsakis commented 4 years ago

I think that in general when we are assigning to a place, that place should not alias any of the values being read during the instruction. In other words, I think we should avoid the need for backends to introduce "temporaries".

I'm not 100% sure what this implies for arguments. I forget whether we permit the arguments in MIR to be mutable, or whether a function with a mut parameter copies those values into a local.

This is all related to #68304, since in there we are talking about cases where the size of the parameter is not known at runtime, and we would like to be able to pass it as argument by reference without having to create a temporary (which would require an alloca). Presumably this is at least partly what @eddyb was referring to.

eddyb commented 4 years ago

@nikomatsakis For optimization reasons we want calls to not do copies of anything we pass by reference in the ABI. Otherwise MIR optimizations could never remove those copies, even when it would be correct to do so.

nikomatsakis commented 4 years ago

Right, that'd be the other case. Still, it looks like if I compile

fn foo(mut x: u32) {
    x += 1;
}

I get this:

fn  foo(_1: u32) -> () {
    debug x => _1;                       // in scope 0 at src/main.rs:1:8: 1:13
    let mut _0: ();                      // return place in scope 0 at src/main.rs:1:20: 1:20
    let mut _2: (u32, bool);             // in scope 0 at src/main.rs:2:5: 2:11

    bb0: {
        _2 = CheckedAdd(_1, const 1u32); // bb0[0]: scope 0 at src/main.rs:2:5: 2:11
        assert(!move (_2.1: bool), "attempt to add with overflow") -> bb1; // bb0[1]: scope 0 at src/main.rs:2:5: 2:11
    }

    bb1: {
        _1 = move (_2.0: u32);           // bb1[0]: scope 0 at src/main.rs:2:5: 2:11
        return;                          // bb1[1]: scope 0 at src/main.rs:3:2: 3:2
    }
}

Note in particular the _1 = move (_2.0) at the end. I think that if parameters were (at least sometimes) references into the caller's stack frame, that could be problematic, right? (In other words, we don't want the callee to be mutating the caller's variables.)
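At the surface-Rust level, at least for Copy types like u32, that callee-side mutation is indeed invisible to the caller:

```rust
fn foo(mut x: u32) -> u32 {
    x += 1; // mutates the callee's _1, as in the MIR dump above
    x
}

fn main() {
    let a = 5u32;
    let b = foo(a);
    assert_eq!(a, 5); // the caller's copy is untouched
    assert_eq!(b, 6);
}
```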

eddyb commented 4 years ago

(In other words, we don't want the callee to be mutating the caller's variables.)

We do, again, for optimization reasons. This only happens with Operand::Move arguments; Operand::Copy arguments will do a copy in the caller before the call, IIRC.

hanna-kruppe commented 4 years ago

To clarify, do you mean that e.g. in

fn foo(s: String) {
    bar(s);
}

the call to bar should pass on the same address foo received as argument? This is not currently the case, but IIUC it falls out of the aspiration / codegen strategy that you describe.

Edit: to be clear, the reason it currently copies the String in foo is an explicit temporary in the MIR, whose use as Operand::Move in the call to bar then indeed happens without a further temporary that would have been implicit in the MIR. But presumably you'd want that temporary to be removed too?

RalfJung commented 4 years ago

I think that in general when we are assigning to a place, that place should not alias any of the values being read during the instruction. In other words, I think we should avoid the need for backends to introduce "temporaries".

This issue is not about assignments though but about function calls. Or do you view _1 = foo(...) as an assignment? (I don't, it's not StatementKind::Assign.)

is the optional place only employed for functions that return uninhabited values? The type ! has size 0, but I suppose in some cases the type might be non-zero in size..? I'm a bit confused about that part of what you wrote.

I don't know if you are asking about current behavior or proposed change. Right now, it seems like indeed diverging functions do get a return place and a return destination (basic block) if their return type has non-zero size (demo). That's really strange, why would whether or not the function gets a return destination have anything to do with the size of its return type? I think this is basically an accident because in the MIR we have an Option<(Place, BasicBlock)> so we can only have both or neither. (Note that the "if Some, the call is converging" comment there is wrong as my example demonstrates.)

Worse, the callee cannot even rely on this, because the callee might be generic and hence, potentially, have a non-ZST uninhabited return type, while the caller knows it's a ZST and then fails to provide a return place. This has convinced me that diverging functions should always have a return place. IOW, the MIR Call terminator should have an Option<BasicBlock>, that's fine (and then returning from a function is just UB if no return destination block was provided, easy), but it should have a mandatory Place.

nikomatsakis commented 4 years ago

@RalfJung I was including calls under the category of assignment, yes, but I am happy to limit the term "assignment" to StmtAssign, though I think it'd be useful to have a blanket term for "something that mutates a destination place".

Regarding the question of whether diverging calls should always have a "destination place", I think that's fine, simpler and more uniform MIR seems better.


@eddyb

We do, again, for optimization reasons. This only happens with Operand::Move arguments; Operand::Copy arguments will do a copy in the caller before the call, IIRC.

Interesting. I guess that limiting this to Move arguments makes sense, in that the caller must regard that memory as deinitialized after the call (so it does no harm for the callee to have mutated it).
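The caller-side half of this is already visible in surface Rust: after passing a non-Copy value as a (Move) argument, the caller's binding is deinitialized and any further use is rejected by the borrow checker, so a callee mutating that memory could never be observed through the moved-from binding:

```rust
fn take(v: Vec<u8>) -> usize {
    v.len() // the callee now owns (and may freely mutate or drop) the vector
}

fn main() {
    let v = vec![1u8, 2, 3];
    let n = take(v); // `v` is moved: an Operand::Move at the MIR level
    assert_eq!(n, 3);
    // println!("{:?}", v); // error[E0382]: borrow of moved value: `v`
}
```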

RalfJung commented 4 years ago

Interesting. I guess that limiting this to Move arguments makes sense, in that the caller must regard that memory as deinitialized after the call (so it does no harm for the callee to have mutated it).

There's still something that makes no sense, and that is that an Operand would be used "by reference". It was my understanding that the entire point of Operand is to be a value, i.e., something you can copy out of, but not something that has an address in memory.

@eddyb

For optimization reasons we want calls to not do copies of anything we pass by reference in the ABI. Otherwise MIR optimizations could never remove those copies, even when it would be correct to do so.

For reasons of basic sanity we want MIR semantics to be as simple as we can make it. So the question is, with MIR semantics as implemented by Miri (function arguments always copy), what is the concrete MIR optimization or lowering that does not observably preserve this behavior?

eddyb commented 4 years ago

Regarding the question of whether diverging calls should always have a "destination place", I think that's fine, simpler and more uniform MIR seems better.

I think we all agree on this now, who's gonna do it? Do we want an issue just for that?

There's still something that makes no sense, and that is that an Operand would be used "by reference". It was my understanding that the entire point of Operand is to be a value, i.e., something you can copy out of, but not something that has an address in memory.

I keep bringing this up, but: I don't like it either. I have previously called it an abuse of Operand, including for the RHS of an assignment where (non-overlapping) memcpy may be used (a "by-reference" use with aliasing implications).

So the question is, with MIR semantics as implemented by Miri (function arguments always copy), what is the concrete MIR optimization or lowering that does not observably preserve this behavior?

It's fine, you're just not detecting the UB necessary to observe the distinction (i.e. you get a difference between codegen and miri without miri detecting it as UB).

This is just like with the discussion around assignments and aliasing between destination and source. Even if miri wouldn't detect that, codegen could still result in something that doesn't crash but corrupts the values silently.

nikomatsakis commented 4 years ago

@RalfJung

There's still something that makes no sense, and that is that an Operand would be used "by reference". It was my understanding that the entire point of Operand is to be a value, i.e., something you can copy out of, but not something that has an address in memory.

Yes, that was bugging me too! It seems like what we're saying is that calls should take places, not operands.

RalfJung commented 4 years ago

I think we all agree on this now, who's gonna do it? Do we want an issue just for that?

Sure, why not? This seems like MIR building might need some non-trivial adjustments.

I keep bringing this up, but: I don't like it either. I have previously called it an abuse of Operand, including for the RHS of an assignment where (non-overlapping) memcpy may be used (a "by-reference" use with aliasing implications).

Hm, I am somehow not seeing this as abuse for assignment... but this goes back to the discussion we had in the assignment issue.

It's fine, you're just not detecting the UB necessary to observe the distinction (i.e. you get a difference between codegen and miri without miri detecting it as UB).

This is just like with the discussion around assignments and aliasing between destination and source. Even if miri wouldn't detect that, codegen could still result in something that doesn't crash but corrupts the values silently.

That doesn't sound fine at all! When Miri and codegen differ like that, that is a critical soundness bug in either Miri or codegen.


The latest proposal by @jonas-schievink in https://github.com/rust-lang/rust/pull/71005 is also interesting. Together with https://github.com/rust-lang/miri/pull/1330, this is a mixture of the two alternatives I described above: we always allocate backing store for the _0 local and use that during execution of the function. Moreover, when the function returns, if a return place is given, we copy from _0 to there (and at that point perform validation of the return value).

What I like about this is that it avoids having to manually call validation on stack frame pop like we do so far, and like we would have to even if we make _0 directly point to the caller-provided return place. At the same time, retagging the return place (while still something of a hack) should ensure that eliding the copy during codegen is correct. This almost seems like the best of both worlds to me (and at least for Miri it makes making the return place mandatory much less pressing, as we allocate for it manually anyway).

nikomatsakis commented 4 years ago

@RalfJung maybe I'm confused -- it seems like if the official semantics are that we have allocated backing storage for _0, we would have problems with unsized return values... oh, I see, of course _0 must always be sized, so it's always possible in principle to allocate that temporary up front.

I guess the key point has to do with retagging the return place -- can you elaborate a bit on why that ensures eliding the copy is correct? (bit out of cache for me I suppose)

RalfJung commented 4 years ago

it seems like if the official semantics are that we have allocated backing storage for _0, we would have problems with unsized return values... oh, I see, of course _0 must always be sized, so it's always possible in principle to allocate that temporary up front.

Honestly I did not give much thought to unsized return values. How are they supposed to work anyway with a caller-allocated return place?

I think the existing handling of lazily allocating unsized locals (i.e., allocating them on the first write to them, when the size is known) should in principle also work for return values. It's something of a hack, but it works.

I guess the key point has to do with retagging the return place -- can you elaborate a bit on why that ensures eliding the copy is correct? (bit out of cache for me I suppose)

It's about retagging with a protector. The reasoning is that, after the retag-with-protector, we have a fresh tag that is not known to anyone except this return place, and that tag is at the top of the borrow stack protected by the current function call. This means that if any other tag is used to access this memory while the function is ongoing (i.e., until it gets popped), there is immediate UB as a protected item would get popped off the stack. (If the callee does &mut _0, that reference may indeed be used to access the return place as it is derived from the right tag. But that is expected, I presume.)

This is basically the same reasoning that permits the following optimization (assuming no unwinding happens):

fn foo(x: &mut i32) {
    // We are allowed to move the write down below the call, because if `bar`
    // reads or writes `x`, it causes immediate UB.
    // Without protectors this transformation would be illegal as `bar` could just pop `x`'s
    // tag off the stack without any bad consequences, since `x` does not get used again.
    *x = 13;
    bar();
}
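For concreteness, the transformed version of that sketch would look as follows (hypothetical `bar`; the transformation is only valid under the stated assumption that a protected `x` makes any access by `bar` immediate UB):

```rust
fn bar() {
    // Assumed to neither read nor write `*x`; with a protector on `x`,
    // doing so would be immediate UB, which is what justifies the reordering.
}

fn foo(x: &mut i32) {
    bar();
    *x = 13; // the write has been sunk below the call
}

fn main() {
    let mut v = 0;
    foo(&mut v);
    assert_eq!(v, 13); // observable behavior is unchanged
}
```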
nikomatsakis commented 4 years ago

@RalfJung Thanks for the explanation, that makes sense.

Regarding unsized return values, it's very much not clear how they should work (who allocates the memory?), so I wouldn't worry about that. I think that if we do ever try to implement such a scheme, we can wrestle with it then.

bjorn3 commented 3 years ago
fn foo(a: Vec<u8>) {}
fn bar(a: Vec<u8>) {
    foo(a);
}

currently results in

// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn bar(_1: Vec<u8>) -> () {
    debug a => _1;                       // in scope 0 at <anon>:1:30: 1:31
    let mut _0: ();                      // return place in scope 0 at <anon>:1:42: 1:42
    let _2: ();                          // in scope 0 at <anon>:1:44: 1:50
    let mut _3: std::vec::Vec<u8>;       // in scope 0 at <anon>:1:48: 1:49

    bb0: {
        _3 = move _1;                    // scope 0 at <anon>:1:48: 1:49
        _2 = foo(move _3) -> bb1;        // scope 0 at <anon>:1:44: 1:50
                                         // mir::Constant
                                         // + span: <anon>:1:44: 1:47
                                         // + literal: Const { ty: fn(std::vec::Vec<u8>) {foo}, val: Value(Scalar(<ZST>)) }
    }

    bb1: {
        _0 = const ();                   // scope 0 at <anon>:1:42: 1:53
        return;                          // scope 0 at <anon>:1:53: 1:53
    }
}

fn foo(_1: Vec<u8>) -> () {
    debug a => _1;                       // in scope 0 at <anon>:1:8: 1:9
    let mut _0: ();                      // return place in scope 0 at <anon>:1:20: 1:20

    bb0: {
        _0 = const ();                   // scope 0 at <anon>:1:20: 1:22
        drop(_1) -> bb1;                 // scope 0 at <anon>:1:21: 1:22
    }

    bb1: {
        return;                          // scope 0 at <anon>:1:22: 1:22
    }
}

Even when conservatively considering any aliasing between call arguments, or between a call argument and the return place, UB on the caller side, it would be fine to omit the _3 = move _1; and instead pass move _1 as the argument. This is true even when there is a reference to _1, as said reference is invalidated by the _3 = move _1 (resp., after the change, by the _2 = foo(move _1) -> bb1). Because it is a move and not a copy, it is fine for the callee to directly mutate the caller's stack slot for the local, like codegen can do right now. I think it would be fine to omit this extra copy to a temp var while building MIR, before borrowck, where it would be very cheap. This should improve both compilation time and debug-mode runtime.

RalfJung commented 3 years ago

Even when conservatively considering any aliasing between any call arguments or a call argument and the return place UB at the caller side,

More UB means more compiler freedom, so from a compiler perspective the conservative approach would be to do more copies... right?

Because it is a move and not a copy

So far, we don't have a clear idea for how moves and copies would be any different operationally (and, thus, in terms of UB).

bjorn3 commented 3 years ago

More UB means more compiler freedom, so from a compiler perspective the conservative approach would be to do more copies... right?

The conservative choice here is assuming that the callee modifies the locals used to store arguments, and thus that accessing those locals after the call is UB. Even under this conservative choice, it should be fine to omit the copy, as the local isn't used again before being reinitialized.

So far, we don't have a clear idea for how moves and copies would be any different operationally (and, thus, in terms of UB).

Why would it be fine to read a local after it has been moved out of? The surface rust language doesn't even allow this and I think borrowck enforces this at the MIR level. Writing (aka reinitializing) is fine though.

RalfJung commented 3 years ago

The conservative choice here is assuming that the callee modifies the locals used to store arguments and thus that accessing the locals used for arguments after the call is UB.

Oh that's what you mean by "UB on the caller side", I see. Makes sense.

Why would it be fine to read a local after it has been moved out of? The surface rust language doesn't even allow this and I think borrowck enforces this at the MIR level. Writing (aka reinitializing) is fine though.

The question is more, what exactly is the thing happening on the Abstract Machine that makes future reads UB (but writes not)? The most promising proposal was that a move would replace the contents of the source memory by Uninit, basically "de-initializing" that memory. That would however not make all reads UB, it just means reads encounter uninit memory. Also old pointers pointing to that memory could remain valid and could still be used even after the memory was re-initialized -- it was not clear whether that is a problem or not. And finally, making a seemingly read-only operation be mutating like this could have plenty of unforeseen side-effects (and it messes with how Miri uses Rust types to make sure that read-only operations are read-only).
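The "move de-initializes the source" proposal can be sketched with a toy memory where each byte is an Option<u8> and None plays the role of Uninit (a hypothetical model for illustration, not Miri's implementation):

```rust
type Mem = Vec<Option<u8>>; // None = Uninit

// A move copies the bytes and leaves Uninit behind in the source.
fn move_bytes(mem: &mut Mem, src: usize, dst: usize, len: usize) {
    for i in 0..len {
        mem[dst + i] = mem[src + i].take();
    }
}

// A copy reads the source; reading Uninit is where the (potential) UB lives.
fn copy_bytes(mem: &mut Mem, src: usize, dst: usize, len: usize) {
    for i in 0..len {
        mem[dst + i] = Some(mem[src + i].expect("read of uninitialized memory"));
    }
}

fn main() {
    let mut mem: Mem = vec![Some(1), Some(2), None, None];
    move_bytes(&mut mem, 0, 2, 2);
    assert_eq!(mem, vec![None, None, Some(1), Some(2)]); // source is now Uninit
    // Re-initializing (writing) the source afterwards stays legal:
    mem[0] = Some(7);
    assert_eq!(mem[0], Some(7));
}
```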

The thread to discuss move-related UB is https://github.com/rust-lang/unsafe-code-guidelines/issues/188.

RalfJung commented 3 years ago

@bjorn3 so what would be an example of a MIR program that goes wrong when a copy is replaced by a move in a function call? What you say sounds like that would not be a correct transformation to make in general, and yet, according to the MIR spec implicit in Miri, this would be correct. Having such an example might help to figure out where this mismatch comes from.

bjorn3 commented 3 years ago
struct LargeStruct([u8; 1024]);

fn caller() {
    let _1: LargeStruct;
    let _2: bool;

    bb0: {
        _1.0 = [0;1024];
        call callee(copy _1) -> bb1;
    }

    bb1: {
        _2 = _1.0[0] == 0;
        assert(move _2) -> bb2;
    }

    bb2: {
        return;
    }
}

fn callee(_1: LargeStruct) {
    bb0: {
        _1.0[0] = 42;
        return;
    }
}

would be codegened to something like:

define @caller() {
entry:
    %0 = alloca LargeStruct
    memset(%0, 0, 1024) ; _1.0 = [0;1024];

    ; call callee(copy _1) -> bb1;
    %1 = alloca LargeStruct
    memcpy(%0, %1, 1024)
    call @callee(%1)

    ; ...
}

define @callee(LargeStruct *%0) {
entry:
    %1 = gep %0, 0
    store %1, 42
    ret
}

where a pointer to a copy of _1 is passed to callee, while if it were a move instead of copy, you would get:

define @caller() {
entry:
    %0 = alloca LargeStruct
    memset(%0, 0, 1024) ; _1.0 = [0;1024];

    ; call callee(move _1) -> bb1;
    call @callee(%0)

    ; ...
}

define @callee(LargeStruct *%0) {
entry:
    %1 = gep %0, 0
    store %1, 42
    ret
}

where a pointer to _1 is directly passed to callee and written to causing _1 of caller to get changed and the assertion to fail.

tmiasko commented 3 years ago

The old copy propagation didn't take into account the currently existing distinction between copy and move operands and misoptimized the following program:

fn main() {
    let a = [0; 128];
    loop {
        f(a);
    }
}

fn f(mut a: [u8; 128]) {
    assert_eq!(a[0], 0);
    a[0] += 1;
}
rustc +nightly-2020-09-14 a.rs -Zmir-opt-level=2 && ./a
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', a.rs:9:5
--- a.main.005-007.CopyPropagation.before.mir   2021-06-15 10:42:21.821538556 +0200
+++ a.main.005-007.CopyPropagation.after.mir    2021-06-15 10:42:21.821538556 +0200
@@ -1,37 +1,37 @@
-// MIR for `main` before CopyPropagation
+// MIR for `main` after CopyPropagation

 fn main() -> () {
     let mut _0: ();                      // return place in scope 0 at a.rs:1:11: 1:11
     let _1: [u8; 128];                   // in scope 0 at a.rs:2:9: 2:10
     let mut _2: !;                       // in scope 0 at a.rs:3:5: 5:6
     let mut _3: ();                      // in scope 0 at a.rs:1:1: 6:2
     let _4: ();                          // in scope 0 at a.rs:4:9: 4:13
     let mut _5: [u8; 128];               // in scope 0 at a.rs:4:11: 4:12
     scope 1 {
         debug a => _1;                   // in scope 1 at a.rs:2:9: 2:10
     }

     bb0: {
-        StorageLive(_1);                 // scope 0 at a.rs:2:9: 2:10
+        nop;                             // scope 0 at a.rs:2:9: 2:10
         _1 = [const 0_u8; 128];          // scope 0 at a.rs:2:13: 2:21
         StorageLive(_2);                 // scope 1 at a.rs:3:5: 5:6
         goto -> bb1;                     // scope 1 at a.rs:3:5: 5:6
     }

     bb1: {
         StorageLive(_4);                 // scope 1 at a.rs:4:9: 4:13
-        StorageLive(_5);                 // scope 1 at a.rs:4:11: 4:12
-        _5 = _1;                         // scope 1 at a.rs:4:11: 4:12
-        _4 = f(move _5) -> bb2;          // scope 1 at a.rs:4:9: 4:13
+        nop;                             // scope 1 at a.rs:4:11: 4:12
+        nop;                             // scope 1 at a.rs:4:11: 4:12
+        _4 = f(move _1) -> bb2;          // scope 1 at a.rs:4:9: 4:13
                                          // mir::Constant
                                          // + span: a.rs:4:9: 4:10
                                          // + literal: Const { ty: fn([u8; 128]) {f}, val: Value(Scalar(<ZST>)) }
     }

     bb2: {
-        StorageDead(_5);                 // scope 1 at a.rs:4:12: 4:13
+        nop;                             // scope 1 at a.rs:4:12: 4:13
         StorageDead(_4);                 // scope 1 at a.rs:4:13: 4:14
         _3 = const ();                   // scope 1 at a.rs:3:10: 5:6
         goto -> bb1;                     // scope 1 at a.rs:3:5: 5:6
     }
 }
RalfJung commented 3 years ago

I see, interesting.

So basically, when call callee(move _1) returns, I may not make any assumptions about the memory stored in _1. That fits the proposal of "move should de-init the memory it moves from". Is it sufficient, though? I don't think so.

What happens in the abstract machine is that either way, a copy is made. move is just copy-followed-by-deinit (in the version of the Abstract Machine we are considering here). So what if a raw pointer exists to this memory and while callee is ongoing, someone writes to that raw pointer?

struct LargeStruct([u8; 1024]);

fn caller() {
    let _1: LargeStruct;
    let _2: bool;
    let _3: *mut LargeStruct;

    bb0: {
        _1.0 = [0;1024];
        _3 = &mut _1 as *mut _;
        call callee(move _1, copy _3) -> bb1;
    }

    bb1: {
        _2 = _1.0[0] == 1;
        assert(move _2) -> bb2;
    }

    bb2: {
        return;
    }
}

fn callee(_1: LargeStruct, _2: *mut LargeStruct) {
    bb0: {
        (*_2).0[0] = 1;
        _1.0[0] = 42;
        return;
    }
}

In the Abstract Machine, even with "move de-inits memory", this assertion would hold. Under the code generation you describe, however, the assertion would fail.

So if we want to continue using that code generation, we need to find some justification for why the above program has UB. The only avenue I can see is some kind of aliasing rule: _1 must not be written to while the call is ongoing, or so.

RalfJung commented 3 years ago

I guess one thing that would work is to say that move _1 also counts as a write from the perspective of the aliasing model. For Stacked Borrows, that will then mean that all existing pointers to this local become invalid.

Then the one issue that remains, I think, is that this optimization can have effects on pointer equality: in the Abstract Machine, _1 inside caller and _1 inside callee are two distinct allocations. Comparing their addresses thus is guaranteed to come back false. However, if _1 in callee secretly just references the data in caller, the pointers will compare equal in the real compiled code. This is an incorrect compilation, and I do not have good ideas for fixing that.
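For two simultaneously live locals, that address-inequality guarantee is observable in plain Rust today; the question above is whether caller and callee locals linked by a `move` get the same guarantee:

```rust
fn main() {
    let a = 1u8;
    let b = 2u8;
    // Two live locals are distinct allocations, so their addresses differ.
    assert!(!std::ptr::eq(&a, &b));
}
```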

JakobDegen commented 2 years ago

Then the one issue that remains, I think, is that this optimization can have effects on pointer equality: in the Abstract Machine, _1 inside caller and _1 inside callee are two distinct allocations. Comparing their addresses thus is guaranteed to come back false. However, if _1 in callee secretly just references the data in caller, the pointers will compare equal in the real compiled code. This is an incorrect compilation, and I do not have good ideas for fixing that.

If address inequality is something we have to guarantee, then we actually have a real problem here. I unfortunately can't make this actually miscompile today because 1) dest prop is off by default and also broken and 2) MIR building emits extra copies in the cases we care about (for reasons unrelated to this, and in a pretty fragile way). This means that, as far as I can tell, we have three options:

  1. Don't guarantee address inequality in this scenario (by whatever means).
  2. Admit that codegen is wrong. This means that codegen has to emit a copy in the general case. We can certainly add MIR opts/analyses to help it avoid that (we'll probably need an inter-procedural "is address observed" thing sooner or later anyway), but that doesn't materially change the state of things in general.
  3. Refactor MIR so that functions don't take operands, but places, and emit the copies as a part of MIR building. We could also try and figure out a way to do this only in optimization MIR (sounds hard though). The advantage of this version of things is that it would allow us to remove those copies in normal MIR opts. The disadvantage is that it's a major MIR refactoring.

If T-Lang won't let us pretend we live in the world of bullet point 1, then I can imagine quite a bit of stuff becoming blocked on this (since it means that eliminating moves in MIR is unsound).

RalfJung commented 2 years ago

Don't guarantee address inequality in this scenario (by whatever means).

At first this sounded ridiculous, but in https://github.com/rust-lang/unsafe-code-guidelines/issues/328 you proposed a scheme that actually does this in a fairly reasonable way. It would mean we can only do this optimization when at least one of the involved places has not been 'exposed', which seems okay and easy enough to figure out.

However, saying different locals can have overlapping addresses has huge ramifications. It could help a lot with some optimizations (since decreasing the liveness range of a variable no longer means we break address inequality guarantees), but I also doubt anyone would actually write code that works when the addresses of different allocations could overlap. This sounds like it would break fast-path optimizations such as the one in PartialEq for slices, which skips even comparing the contents when the two slices have the same address and length.
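A sketch of that kind of fast path (a hypothetical helper, not the actual core implementation): `std::ptr::eq` on slice references compares both the data address and the length, so a hit means the two slices are the very same memory and trivially equal, which is exactly the reasoning that breaks if distinct allocations may share an address:

```rust
fn slices_eq(a: &[u8], b: &[u8]) -> bool {
    // Fast path: identical address + length means the same memory.
    if std::ptr::eq(a, b) {
        return true;
    }
    // Slow path: compare the contents.
    a == b
}

fn main() {
    let x = [1u8, 2, 3];
    let y = [1u8, 2, 3];
    assert!(slices_eq(&x, &x));       // fast path: same slice
    assert!(slices_eq(&x, &y));       // slow path: equal contents
    assert!(!slices_eq(&x, &y[1..])); // different length and contents
}
```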

Diggsey commented 2 years ago

This sounds like it would break fast-path optimizations such as the one in PartialEq for slices which skips even comparing them when they have the same address and length.

Are you imagining that it gets the address by a non-exposing cast to an integer? Or are you imagining that the pointers are compared directly? If the latter, and the compiler has somehow optimized two objects to the same numeric address, then the compiler should also ensure that the pointers compare not equal even when the numeric address would be the same.

JakobDegen commented 2 years ago

If the compiler has somehow optimized two objects to the same numeric address, then the compiler should also ensure that the pointers compare not equal even when the numeric address would be the same.

This would be great if we could do it, but it's not really clear how. It doesn't fit into the model I proposed in any nice way.

However, saying different locals can have overlapping addresses has huge ramifications. It could help a lot with some optimizations (since decreasing the liveness range of a variable no longer means we break address inequality guarantees), but I also doubt anyone would actually write code that works when the addr of different allocations could overlap

Yeah, this is a problem. It's more general than this particular situation though, given the lack of satisfactory solutions to the UCG you linked.

However, I think it's useful to remember that we don't need the full power of my proposal to not guarantee address inequality. For example, you proposed in another issue that we could let two locals have the same address if they are not simultaneously live in the sense of MIR move analysis. That would be enough to allow this optimization (if we also deinit on moves and treat them as a write to the underlying bytes, which I am increasingly convinced we should do)

RalfJung commented 2 years ago

Are you imagining that it gets the address by a non-exposing cast to an integer? Or are you imagining that the pointers are compared directly? If the latter, and the compiler has somehow optimized two objects to the same numeric address, then the compiler should also ensure that the pointers compare not equal even when the numeric address would be the same.

I am imagining that comparing pointers at pointer type should be equivalent to ptr1.addr() == ptr2.addr(). I don't think we can afford distinguishing between those operations, that is just too surprising.

However, I think it's useful to remember that we don't need the full power of my proposal to not guarantee address inequality. For example, you proposed in another issue that we could let two locals have the same address if they are not simultaneously live in the sense of MIR move analysis. That would be enough to allow this optimization (if we also deinit on moves and treat them as a write to the underlying bytes, which I am increasingly convinced we should do)

Would it? How would the StorageLive/StorageDead make that work? Deinit doesn't affect address equality, only (de)allocation does (and Storage* is (de)allocation for stack variables).

JakobDegen commented 2 years ago

Would it? How would the StorageLive/StorageDead make that work? Deinit doesn't affect address equality, only (de)allocation does (and Storage* is (de)allocation for stack variables).

I was referring to your suggestion here. As far as I understand it, that suggestion is "immediately after constructing MIR we are allowed to shrink storage ranges to actual live ranges," which seems to be exactly what we need.

RalfJung commented 2 years ago

Yeah but I don't see how that helps for this optimization. We cannot declare _1 dead before the call since it is used as an argument to the call.

JakobDegen commented 2 years ago

We can declare the order to be:

  1. Argument values are computed (via a load in the case of move arguments)
  2. Control flow enters the callee
  3. The callee allocates memory for the parameters
  4. The parameter values are stored to the new allocations

In that case the optimization is allowed because the parameter allocations are not live until after the last step, and the argument allocations are dead after step 1. This is... admittedly a bit subtle, but it should totally work. Also, given that we'll probably additionally have somewhat surprising ordering around the return place computation (to disallow overlaps), I don't think this makes things that much worse.

RalfJung commented 2 years ago

We should be able to directly represent this in the MIR though, by placing a StorageDead in a suitable location -- which seems hard with this proposal?

JakobDegen commented 2 years ago

Oh hmm, I see what you mean. My assumption had been that with this proposal we can basically get rid of Storage* statements and just replace them with liveness analysis, but that's not quite true; it's only the case at MIR building time.

RalfJung commented 1 year ago

One aspect that hasn't come up yet is the evaluation order wrt the return place: in *ptr = foo(...);, does it use the pointer value from before or after the call? Miri says before, and I hope that's also what codegen does. It should be fairly uncontroversial IMO, except that in surface Rust we evaluate the RHS first (but MIR building introduces temporaries, so MIR itself can have a different eval order).

cbeuw commented 1 year ago

In custom MIR one can observe whether codegen has used the same memory location for the caller's destination and the callee's return slot through pointer comparison. fn2 is supplied with a pointer to the destination x, and inside fn2 this is compared with a pointer to RET. Miri prints false for this, and LLVM prints true.

This is not reproducible in normal Rust for two reasons: one is that a temporary is always generated as the destination before a Call terminator, so there's no way to get a pointer to it; the other is that inside the callee you can no longer take &raw _0 now that NRVO is disabled in https://github.com/rust-lang/rust/pull/111007 (though the latter is still possible on current stable, and would be again if it's re-enabled in the future).

#![feature(custom_mir, core_intrinsics)]
extern crate core;
use core::intrinsics::mir::*;

#[custom_mir(dialect = "runtime", phase = "initial")]
pub fn fn0() -> bool {
    mir! {
        let x: (u64, bool, u8, bool);
        let ptr: *const (u64, bool, u8, bool);
        {
            x = (1, true, 2, true);
            ptr = core::ptr::addr_of!(x);
            Call(x, bb1, fn2(ptr))
        }
        bb1 = {
            RET = (*ptr).1;
            Return()
        }
    }
}

#[custom_mir(dialect = "runtime", phase = "initial")]
pub fn fn2(dst_ptr: *const (u64, bool, u8, bool)) -> (u64, bool, u8, bool) {
    mir! {
        type RET = (u64, bool, u8, bool);
        let ret_ptr: *const (u64, bool, u8, bool);
        {
            RET = (1, true, 2, true);
            ret_ptr = core::ptr::addr_of!(RET);
            RET.1 = dst_ptr == ret_ptr;
            Return()
        }
    }
}

pub fn main() {
    println!("{}", fn0());
}

bjorn3 commented 1 year ago

This depends on the return type and calling convention. If the calling convention says the return value has to be returned using an out pointer, you get the observed behavior of the address being identical. If the calling convention says it has to be returned in a register, you get two different addresses. Same applies to function arguments.

RalfJung commented 1 year ago

I think on the Rust level we probably want to treat this as non-deterministic. The question is how to best do that, given that these are two distinct allocations in a straight-forward definition of what happens at a function call.

Maybe we need more of that "two distinct allocations at the same address" trickery...

JakobDegen commented 1 year ago

I'm losing track of this conversation a bit; it seems like we're discussing at least three different things here:

The first is surface Rust evaluation order, which Ralf shared an example for here. I don't think we'd be able to change this even if we wanted to, so I don't think it's worth much discussion. In any case, I also think that what Rust does today is probably the right thing anyway.

The second is the semantics of a surface Rust version of the program Andy wrote above. It's hard to write such a program because Rust has no native concept of a return place, but it might look something like this:


use std::ptr::addr_of;

static mut ADDR: usize = 0;

fn get_return_address() -> usize {
    let x = 0;
    unsafe { ADDR = addr_of!(x) as usize };
    return x;
}

fn main() {
    let mut x = 0;
    let a = addr_of!(x) as usize;
    x = get_return_address();
    dbg!(a == unsafe { ADDR });
}

I don't know how to write an opsem in which this program is allowed to print true. It would maybe be nice if we could. We should discuss this question on UCG too though.

The third thing that is being discussed here (and the one which is actually on topic for this issue) is the one about Mir semantics. On this I have three thoughts:

a. The behavior of Miri today is certainly not wrong for surface Rust programs, as a result of Mir building introducing lots of temporaries. It is possible that Miri doesn't report UB on some programs on which it maybe should (i.e. a Mir version of Ralf's zulip example). That might be worth fixing, but it's not a huge deal.

b. What codegen does today should eventually be fixed. Even if the address equality issue weren't there, there are still other ways to observe that the callee can write to the destination place before the call returns. At the end of the day, what this is is a crappy attempt to implement something like destprop in codegen. It also means that destprop ends up having to do extra work to turn itself off so that it doesn't miscompile cases like the one I wrote above. I can't justify disabling this behavior right now, but I'm very much looking to do so in the future. At the very least, Lir will only get these semantics over my dead body (this is also true because, even if we had a more reasonable representation, these aren't even the semantics an optimizer wants).

c. Once we fix this behavior in codegen and have a reasonable way to get those optimizations which we agree are actually sound, we should just change Mir semantics to match surface Rust. That sounds like the most straightforward solution to making most of these problems go away.

RalfJung commented 1 year ago

Yeah I've been using this as a grab-bag issue for all things related to fn calls. Some of these should probably get their own UCG threads.

One main thing we discussed here in the past is the interaction of move operands and function calls. Function calls seem to be one of the main motivations for move operands being special, so if we want to replace move operands by something else (which I think we should), then function calls will show us some key design constraints.

RalfJung commented 1 year ago

Thanks to @cbeuw for some examples using custom MIR that demonstrate the current behavior of LLVM here.

I particularly worry about this one: in a function like

pub unsafe fn write(mut ptr_to_non_copy: *mut (usize, i32, f64), mut non_copy: Inner) {

we see that ptr_to_non_copy and addr_of_mut!(non_copy) can be the same pointer! This is hard to reconcile with the idea that each function argument is allocated as a fresh place when the function starts.

If we view move as a boolean flag associated with each function argument, the best attempt I can come up with at explaining this is something like this:

  1. First we evaluate all function arguments to their value (as in, MiniRust Value)
  2. Then we deallocate the backing store of all move arguments (only makes sense if those are locals, we should reject any other place expression)
  3. Then we allocate backing store for the callee and write these values in there

What about other places to move from? @bjorn3 said in the past we should allow at least *local (for passing unsized arguments, IIRC). I guess if we specifically say that local must be a Box in that case we can still say that we will deallocate the box in step (2).