llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Missed optimization: xor eax, eax is unnecessary on x86-64 #111728

Open AreaZR opened 1 month ago

AreaZR commented 1 month ago
define i64 @tgt2(i64 %3) {
entry:          
  %4 = icmp eq i64 %3, 0
  %5 = sext i1 %4 to i64
  ret i64 %5
}

compiles to:

tgt2:                                   # @tgt2
        xor     eax, eax
        cmp     rdi, 1
        sbb     rax, rax
        ret

For x86-64.

This has a completely unnecessary xor eax, eax at the start: sbb rax, rax computes rax - rax - CF, so the result is -CF whatever rax held before.

I assume the function is simply -!arg there.

https://godbolt.org/z/zaPY53YT1
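
To make the -!arg reading concrete, here is a minimal C sketch. It is only a model, not compiler output: `cmp_sbb_model`, `check`, and the `stale_rax` parameter are hypothetical names introduced for illustration. It mirrors `cmp rdi, 1` followed by `sbb rax, rax` and checks that the result does not depend on the value rax held beforehand.

```c
#include <assert.h>
#include <stdint.h>

/* C equivalent of the IR: sext(i1 (x == 0)) to i64, i.e. -!x. */
static int64_t tgt2(uint64_t x) {
    return -(int64_t)(x == 0);
}

/* Hypothetical model of `cmp rdi, 1` followed by `sbb rax, rax`.
   stale_rax stands for whatever rax held before the sbb executed. */
static int64_t cmp_sbb_model(uint64_t rdi, uint64_t stale_rax) {
    uint64_t cf = rdi < 1;                     /* cmp rdi, 1: CF = unsigned borrow */
    uint64_t rax = stale_rax - stale_rax - cf; /* sbb rax, rax: rax - rax - CF */
    return (int64_t)rax;                       /* 0 or all-ones (-1) */
}

/* Asserts that the model matches tgt2 for one input and one stale rax value. */
static void check(uint64_t x, uint64_t stale_rax) {
    assert(cmp_sbb_model(x, stale_rax) == tgt2(x));
}
```

Calling `check(0, 0xdeadbeef)` or `check(5, 123)` passes, because the stale value cancels out arithmetically. The model only covers the computed value; it says nothing about how the hardware schedules the instruction.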

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-x86

Author: Rose (AreaZR)

topperc commented 1 month ago

sbb has a false dependency on the previous value of eax on Intel CPUs. This might be trying to break that dependency. See also https://github.com/llvm/llvm-project/issues/47201

rilysh commented 1 month ago

Godbolt testcase: https://godbolt.org/z/bY9f87ofx

MSVC and GCC don't seem to address this yet.

boomanaiden154 commented 1 month ago

Passing -mtune=znver4 (or any other arch that has SBB64rr marked as a dependency breaking instruction in the scheduling model) gets rid of the extra xor.

Given this is fundamentally a performance optimization, I'm wondering if it's worth preventing the emission of the extra dependency breaking instruction when optimizing for size?
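
The trade-off discussed above can be summarized as a small decision function. This is a hypothetical sketch in C, not LLVM's implementation: the `tuning` struct, its fields, and `should_emit_zeroing_xor` are invented for illustration, and the size rule reflects the suggestion in this thread rather than current behavior.

```c
#include <stdbool.h>

/* Hypothetical inputs; these are not LLVM data structures. */
struct tuning {
    bool sbb_same_reg_breaks_dependency; /* e.g. what -mtune=znver4 implies per the comment above */
    bool optimize_for_size;              /* assumed to correspond to a size-optimizing build */
};

/* Returns true when the extra `xor eax, eax` would be worth emitting. */
static bool should_emit_zeroing_xor(struct tuning t) {
    if (t.sbb_same_reg_breaks_dependency)
        return false; /* sbb rax, rax already has no input dependency */
    if (t.optimize_for_size)
        return false; /* proposed: skip the extra instruction to save bytes */
    return true;      /* otherwise break the false dependency on rax */
}
```

Under this sketch the xor survives only when the target treats sbb as dependent on its input and the build is not size-constrained.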