rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.7k stars 12.49k forks source link

Inefficient codegen with leading/ trailing zeros result manipulation. #74281

Open shampoofactory opened 4 years ago

shampoofactory commented 4 years ago

As demonstrated in the two cases below, the compiler fails to produce optimal code for the equal branch. I'm assuming that it doesn't recognize the 64 as a constant and thus misses the opportunity to perform the stated manipulations at compile time.

https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=d0c7af74498a9db149fa92c1b0c4d178

pub fn case_1(x: u64) -> u32 {
    x.leading_zeros() / 8 + 1
}

pub fn case_2(x: u64) -> u32 {
    x.trailing_zeros() / 8 + 1
}

Generated assembly:

playground::case_1: # @playground::case_1
# %bb.0:
    testq   %rdi, %rdi
    je  .LBB0_1
# %bb.2:
    bsrq    %rdi, %rax
    xorq    $63, %rax
    shrl    $3, %eax
    addl    $1, %eax
    retq

.LBB0_1:
    movl    $64, %eax
    shrl    $3, %eax
    addl    $1, %eax
    retq

Instead I would expect:

playground::case_1: # @playground::case_1
# %bb.0:
    testq   %rdi, %rdi
    je  .LBB0_1
# %bb.2:
    bsrq    %rdi, %rax
    xorq    $63, %rax
    shrl    $3, %eax
    addl    $1, %eax
    retq

.LBB0_1:
    movl    $9, %eax
    retq

case_2 output using trailing_zeros is comparable.

nikic commented 4 years ago

Godbolt: https://llvm.godbolt.org/z/jWonTG

I'd guess that this is only getting custom lowered after the last DAG combine.