rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Missed optimization when looping over bytes of a value #133528

Open theemathas opened 2 days ago

theemathas commented 2 days ago

I tried this code, which contains three functions that each check whether all the bits in a u64 are ones:

#[no_mangle]
fn ne_bytes(input: u64) -> bool {
    let bytes = input.to_ne_bytes();
    bytes.iter().all(|x| *x == !0)
}

#[no_mangle]
fn black_box_ne_bytes(input: u64) -> bool {
    let bytes = input.to_ne_bytes();
    let bytes = std::hint::black_box(bytes);
    bytes.iter().all(|x| *x == !0)
}

#[no_mangle]
fn direct(input: u64) -> bool {
    input == !0
}

I expected to see this happen: ne_bytes() should be optimized to the same code as direct(), while black_box_ne_bytes() should be optimized slightly worse.
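The expectation rests on the three functions being semantically identical. A minimal sketch of a sanity check for that follows. The functions are copied from the report without #[no_mangle], and all_agree is a hypothetical helper added only for illustration.

```rust
// Copies of the three functions from the report (without #[no_mangle]).
fn ne_bytes(input: u64) -> bool {
    let bytes = input.to_ne_bytes();
    bytes.iter().all(|x| *x == !0)
}

fn black_box_ne_bytes(input: u64) -> bool {
    let bytes = input.to_ne_bytes();
    let bytes = std::hint::black_box(bytes);
    bytes.iter().all(|x| *x == !0)
}

fn direct(input: u64) -> bool {
    input == !0
}

// Hypothetical helper (not part of the report): true when both byte-wise
// variants return the same answer as the direct comparison.
fn all_agree(input: u64) -> bool {
    let expected = direct(input);
    ne_bytes(input) == expected && black_box_ne_bytes(input) == expected
}
```

Calling all_agree on values such as 0, u64::MAX, and u64::MAX - 1 should return true in every case, so any difference in the generated code is purely an optimization matter.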

Instead, this happened: I got the following assembly, where ne_bytes() is somehow optimized worse than black_box_ne_bytes()

ne_bytes:
        mov     rax, rdi
        not     rax
        shl     rax, 8
        sete    cl
        shr     rdi, 56
        cmp     edi, 255
        setae   al
        and     al, cl
        ret

black_box_ne_bytes:
        mov     qword ptr [rsp - 8], rdi
        lea     rax, [rsp - 8]
        cmp     qword ptr [rsp - 8], -1
        sete    al
        ret

direct:
        cmp     rdi, -1
        sete    al
        ret
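For readers less used to x86-64 assembly, here is a rough Rust model of what the ne_bytes listing above appears to compute: it checks the low seven bytes and the top byte separately and then ANDs the two results, instead of doing a single 64-bit compare. The name ne_bytes_as_compiled is hypothetical, and the mapping from instructions to expressions is my own reading of the listing, not compiler output.

```rust
// Hypothetical model of the instruction sequence emitted for ne_bytes.
fn ne_bytes_as_compiled(input: u64) -> bool {
    // mov rax, rdi; not rax; shl rax, 8; sete cl
    // Shifting the inverted value left by 8 discards the top byte, so the
    // result is zero exactly when the low 56 bits of `input` are all ones.
    let low_seven_bytes_all_ones = (!input << 8) == 0;

    // shr rdi, 56; cmp edi, 255; setae al
    // After the shift only the top byte remains (0..=255), so the unsigned
    // ">= 255" test is equivalent to "== 255".
    let top_byte_all_ones = (input >> 56) >= 255;

    // and al, cl
    low_seven_bytes_all_ones & top_byte_all_ones
}

// The single comparison that direct compiles to: cmp rdi, -1; sete al
fn direct(input: u64) -> bool {
    input == u64::MAX
}
```

Both functions return the same result for every input, which is why the two-part check is a missed optimization and not a correctness problem.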

Godbolt

Meta

Reproducible on godbolt with stable rustc 1.82.0 (f6e511eec 2024-10-15) and nightly rustc 1.85.0-nightly (7db7489f9 2024-11-25)

purplesyringa commented 2 days ago

As far as I can see, something very similar is at least partially fixed on LLVM trunk: https://godbolt.org/z/MzqG7rf9d. There's also another similar issue: https://github.com/llvm/llvm-project/issues/117853, but I'm not sure if it's relevant to this particular issue.

purplesyringa commented 2 days ago

@rustbot label +A-LLVM

clubby789 commented 1 day ago

Looks like all three functions optimize to the same thing with LLVM trunk opt.