Open shampoofactory opened 4 years ago
As demonstrated in the two cases below, the compiler fails to produce optimal code for the equal branch. I'm assuming that it doesn't recognize the 64 as a constant and thus misses the opportunity to perform the stated manipulations at compile time.
https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=d0c7af74498a9db149fa92c1b0c4d178
pub fn case_1(x: u64) -> u32 { x.leading_zeros() / 8 + 1 } pub fn case_2(x: u64) -> u32 { x.trailing_zeros() / 8 + 1 }
Generated assembly:
playground::case_1: # @playground::case_1 # %bb.0: testq %rdi, %rdi je .LBB0_1 # %bb.2: bsrq %rdi, %rax xorq $63, %rax shrl $3, %eax addl $1, %eax retq .LBB0_1: movl $64, %eax shrl $3, %eax addl $1, %eax retq
Instead I would expect:
playground::case_1: # @playground::case_1 # %bb.0: testq %rdi, %rdi je .LBB0_1 # %bb.2: bsrq %rdi, %rax xorq $63, %rax shrl $3, %eax addl $1, %eax retq .LBB0_1: movl $9, %eax retq
case_2 output using trailing_zeros is comparable.
case_2
trailing_zeros
Godbolt: https://llvm.godbolt.org/z/jWonTG
I'd guess that this is only getting custom lowered after the last DAG combine.
As demonstrated in the two cases below, the compiler fails to produce optimal code for the equal branch. I'm assuming that it doesn't recognize the 64 as a constant and thus misses the opportunity to perform the stated manipulations at compile time.
https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=d0c7af74498a9db149fa92c1b0c4d178
Generated assembly:
Instead I would expect:
case_2
output usingtrailing_zeros
is comparable.