match-then-remake `Result` doesn't optimize away for some payload widths

scottmcm commented 2 years ago

As the new codegen test confirms, matching and rebuilding a Result<i32, u32> optimizes away now (on x64).

However, it seemingly doesn't optimize away if the ok and error types are 64 bits:

pub fn result_nop_match_64(x: Result<i64, u64>) -> Result<i64, u64> {
    match x {
        Ok(x) => Ok(x),
        Err(x) => Err(x),
    }
}

https://rust.godbolt.org/z/fMn6aqWG8

result_nop_match_64:
        mov     rax, rdi
        cmp     qword ptr [rsi], 0
        je      .LBB0_1
        mov     rcx, qword ptr [rsi + 8]
        mov     qword ptr [rax + 8], rcx
        mov     ecx, 1
        mov     qword ptr [rax], rcx
        ret
.LBB0_1:
        mov     rcx, qword ptr [rsi + 8]
        mov     qword ptr [rax + 8], rcx
        xor     ecx, ecx
        mov     qword ptr [rax], rcx
        ret

Interestingly, it does optimize away when there's another enum in the middle, like happens in the ? desugaring. From that same godbolt,

pub fn result_nop_traits_64(x: Result<i64, u64>) -> Result<i64, u64> {
    try {
        x?
    }
}

gives

result_nop_traits_64:
        mov     rax, rdi
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        ret

nikic commented 2 years ago

This is another casualty of LLVM's dumb type-based GEP representation. Both branches write the same values to the same locations, but LLVM doesn't realize this due to the nominally different GEPs. https://github.com/rust-lang/rust/pull/98615 would probably fix this.

the8472 commented 2 years ago

This also works

pub fn result_nop_match_64(x: Result<i64, u64>) -> Result<i64, u64> {
    match x {
        val @ Ok(_) => val,
        val @ Err(_) => val,
    }
}

scottmcm commented 2 years ago

@the8472 That doesn't really surprise me, since in that the matching is completely unused -- it's basically just let val = x; val, and thus has worked since even before LLVM 15: https://rust.godbolt.org/z/G45rx9a3d

(Once it's translated to SSA form, it'll be basically %return = phi [%x,%bb1], [%x,%bb2], which instcombine can trivially turn into %return = %x without needing to make contextual decisions, and then the rest of the stuff is clearly dead.)

JakobDegen commented 2 years ago

I'm glad that LLVM is fixing this. This case can be covered by MIR optimizations, but doing it soundly turns out to be quite tricky. Discussion of the exact semantics surrounding this can be found here, but the tldr is that under the model that Ralf currently prefers (I am not convinced yet), turning

let val = match some_place {
    Ok(x) => Ok(x),
    Err(x) => Err(x),
};

into let val = some_place; is unsound in the general case - the problem is that the latter introduces a read of any padding bytes in some_place which was not there before. As such, the optimization requires proving that such a read is ok to introduce.

For the case in the original issue, this can be proven by showing that there are no valid pointers to those bytes at the time of the read (if we had SSA this would also be straightforward, but that seems far off). If we instead were to write something like this (note the reference):

pub fn result_nop_match_64(x: &Result<i64, u64>) -> Result<i64, u64> {
    match *x {
        Ok(x) => Ok(x),
        Err(x) => Err(x),
    }
}

This can be proven sound under current SB. Neither of these analyses results are easily available in MIR right now though.

the8472 commented 1 year ago

The issue appears fixed on nightly: https://rust.godbolt.org/z/6o7TPqjbq

So this can probably be closed after updating the codegen test.

c410-f3r commented 1 year ago

Nice! It's been a long way since https://github.com/rust-lang/rust/issues/37939.

If no one shows up, I can create the tests.

WaffleLapkin commented 1 year ago

@the8472 interestingly, the comment at the top is (partially) outdated — 128 bit integers work just fine! Whoever will be adding tests, please add tests for them too (see https://rust.godbolt.org/z/W4YffGx4q).

However, Option is still not looking as it should, although this time it's the opposite to this issue — match is optimal while try { x? } is suboptimal. Should we open a new issue for that?

P.S. why does llvm-ir still look so complicated? do the optimizations happen on the instruction level, after llvm-ir? P.P.S. why does adding an llvm-ir window completely screw up the asm output again (see here)? is this a godbolt bug or something?...

veber-alex commented 1 year ago

I encountered an issue that may be related to this one. I was profiling some code and Option::or showed up in the profile which is weird.

Looking at the code it seems that suboptimal code is generated for some T in Option<T>. The weirdest thing is that u8 optimizes well but [u8; 1] doesn't.

godbolt

scottmcm commented 7 months ago

There are some sample codegen tests for this in https://github.com/rust-lang/rust/pull/100693/files#diff-68a12d31335d7f061169500381ebaf9634034cf419332c16512355df8c8cfddeR16.

Since this is now just needs-test, it should be an easy PR to add the previously-problematic widths from the OP as new codegen tests similar to those other ones.

https://github.com/rust-lang/rust/issues/101210#issuecomment-1732470941 looks different, so I've made a new #124533 for it

heiseish commented 2 months ago

Hey @scottmcm, I can open a PR if this is not assigned yet. Just to confirm, we want to add tests for the original issue what has already been fixed in nightly? Thanks!

jieyouxu commented 1 month ago

I can open a PR if this is not assigned yet. Just to confirm, we want to add tests for the original issue what has already been fixed in nightly? Thanks!

@heiseish Yes, but as Waffle mentioned please also include test for 128 bit widths https://rust.godbolt.org/z/W4YffGx4q

heiseish commented 1 month ago

@rustbot claim

rust-lang / rust

match-then-remake `Result` doesn't optimize away for some payload widths #101210