rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Generated assembly contains a jump to the next instruction #127640

Open ChaiTRex opened 3 months ago

ChaiTRex commented 3 months ago

I tried this code in Godbolt:

#[inline(never)]
pub fn next_back_1(len: &mut u8, bits: &mut u16) -> Option<u8> {
    if *len == 0 {
        None
    } else {
        *len -= 1;
        let result = *bits as u8 & 1;
        *bits >>= 1;
        Some(result)
    }
}
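
For readers who want to check what the function computes before looking at the assembly, here is a minimal sketch. The function is reproduced unchanged; `drain_bits` is a hypothetical helper that is not part of the report and only exists to call `next_back_1` until it returns `None`.

```rust
// Function from the report, reproduced unchanged so the sketch is self-contained.
#[inline(never)]
pub fn next_back_1(len: &mut u8, bits: &mut u16) -> Option<u8> {
    if *len == 0 {
        None
    } else {
        *len -= 1;
        let result = *bits as u8 & 1;
        *bits >>= 1;
        Some(result)
    }
}

// Hypothetical helper (not in the report): collects every bit that
// `next_back_1` yields, starting from the least significant one.
pub fn drain_bits(mut len: u8, mut bits: u16) -> Vec<u8> {
    let mut out = Vec::new();
    while let Some(bit) = next_back_1(&mut len, &mut bits) {
        out.push(bit);
    }
    out
}
```

Calling `drain_bits(4, 0b1011)` walks the value from its lowest bit upward and stops once `len` reaches zero.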

I expected to see this happen: Generated assembly doesn't include jumps to the next instruction.

Instead, this happened: Generated assembly does include a jump to the next instruction.

On x86_64-unknown-linux-gnu (with command line flag -O), notice that the jmp .LBB0_3 line is just above the .LBB0_3 label, and so that instruction can be removed, reducing code size:

example::next_back_1::h48bde1edbc0e4b8a:
        movzx   eax, byte ptr [rdi]
        test    al, al
        je      .LBB0_1
        lea     ecx, [rax - 1]
        mov     byte ptr [rdi], cl
        movzx   ecx, word ptr [rsi]
        mov     edx, ecx
        and     dl, 1
        shr     ecx
        mov     word ptr [rsi], cx
        jmp     .LBB0_3
.LBB0_1:
.LBB0_3:
        test    al, al
        setne   al
        ret
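
Read back as Rust, the x86_64 listing above has roughly the following shape. This is only a paraphrase for illustration: `next_back_1_shape` is a hypothetical name, and the `(bool, u8)` pair stands in for what appear to be the `al` (is it `Some`?) and `dl` (payload) return registers. The payload is set to 0 on the `None` path here, whereas the real code simply leaves `dl` untouched.

```rust
// Hypothetical paraphrase of the x86_64 listing; not compiler output.
// Returns (is_some, payload), standing in for the al/dl register pair.
pub fn next_back_1_shape(len: &mut u8, bits: &mut u16) -> (bool, u8) {
    let old_len = *len;            // movzx eax, byte ptr [rdi]
    let mut payload = 0;           // assumption: the asm leaves dl unset here
    if old_len != 0 {              // test al, al / je .LBB0_1
        *len = old_len - 1;        // lea ecx, [rax - 1] / mov byte ptr [rdi], cl
        let word = *bits;          // movzx ecx, word ptr [rsi]
        payload = word as u8 & 1;  // mov edx, ecx / and dl, 1
        *bits = word >> 1;         // shr ecx / mov word ptr [rsi], cx
        // `jmp .LBB0_3` sits here, but its target is the very next instruction.
    }
    // .LBB0_1 and .LBB0_3 label the same address: the shared tail.
    (old_len != 0, payload)        // test al, al / setne al / ret
}
```

Both branches fall into the same tail, which is why the unconditional jump at the end of the `Some` path does no work.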

On aarch64-apple-darwin (with command line flags -O --target=aarch64-apple-darwin), the same thing occurs with the b LBB0_3 instruction:

__ZN7example11next_back_117ha5e5b29a87ac44fdE:
        ldrb    w8, [x0]
        cbz     w8, LBB0_2
        sub     w9, w8, #1
        strb    w9, [x0]
        ldrh    w9, [x1]
        mov     x10, x1
        and     w1, w9, #0x1
        lsr     w9, w9, #1
        strh    w9, [x10]
        b       LBB0_3
LBB0_2:
LBB0_3:
        cmp     w8, #0
        cset    w0, ne
        ret

Meta

rustc --version --verbose

```
rustc 1.79.0 (129f3b996 2024-06-10)
binary: rustc
commit-hash: 129f3b9964af4d4a709d1383930ade12dfe7c081
commit-date: 2024-06-10
host: x86_64-unknown-linux-gnu
release: 1.79.0
LLVM version: 18.1.7
```

```
rustc 1.80.0-beta.4 (64a1fe671 2024-06-21)
binary: rustc
commit-hash: 64a1fe67112931359c7c9a222f08fd206255c2b5
commit-date: 2024-06-21
host: x86_64-unknown-linux-gnu
release: 1.80.0-beta.4
LLVM version: 18.1.7
```

```
rustc 1.81.0-nightly (0c81f94b9 2024-07-10)
binary: rustc
commit-hash: 0c81f94b9a6207fb1fc080caa83584dea2d71fc6
commit-date: 2024-07-10
host: x86_64-unknown-linux-gnu
release: 1.81.0-nightly
LLVM version: 18.1.7
```

Note that I can't reproduce this on my own machine, but I get another bug there (#127641).

erikdesjardins commented 3 months ago

> Note that I can't reproduce this on my own machine, but I get another bug there (https://github.com/rust-lang/rust/issues/127641).

rustc -O sets opt-level=2, but cargo build --release sets opt-level=3 by default (https://godbolt.org/z/x7WTa9MWc), which is likely the difference here.

workingjubilee commented 3 months ago

My experiments with trying to cram x86 assembly via asm! and Godbolt into LLVM-MCA can't replicate the "clearly faster" result, so I don't feel confident closing this like I did #127641, but there's some distortion since that isn't very direct.

workingjubilee commented 3 months ago

I would expect -Copt-level=s or -Copt-level=z to remove this forward jump.