[PowerPC] Failure to optimize (x == 0) ? 0xFF : 0 to addic+subfe instead of cntlzw+srwi+neg

llvm / llvm-project

[PowerPC] Failure to optimize (x == 0) ? 0xFF : 0 to addic+subfe instead of cntlzw+srwi+neg #98598

Open GabrielRavier opened 1 month ago

GabrielRavier commented 1 month ago

#include <stdint.h>

uint8_t f(uint8_t x)
{
    return (x == 0) ? -1 : 0;
}

With -O3, GCC outputs the following:

f(unsigned char):
  addic 3,3,-1
  subfe 3,3,3
  rlwinm 3,3,0,0xff
  blr

whereas LLVM outputs this:

f(unsigned char): # @f(unsigned char)
  cntlzw 3, 3
  srwi 3, 3, 5
  neg 3, 3
  clrlwi 3, 3, 24
  blr

...and GCC's sequence is 30-50% faster according to llvm-mca.
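For reference, here is a minimal C sketch of why the two sequences compute the same result. It assumes 32-bit register semantics for simplicity (the input is a zero-extended `uint8_t`, so the upper bits do not matter here). The names `cntlzw`, `f_gcc_model` and `f_llvm_model` are illustrative helpers, not anything emitted by either compiler.

```c
#include <stdint.h>

/* Model of cntlzw: count leading zeros in a 32-bit word (32 for an input of 0). */
static uint32_t cntlzw(uint32_t v)
{
    uint32_t n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1)
        n++;
    return n;
}

/* Model of GCC's sequence: addic + subfe + rlwinm. */
static uint8_t f_gcc_model(uint8_t x)
{
    uint32_t r3 = x;

    /* addic 3,3,-1: r3 = r3 + 0xFFFFFFFF; the carry (CA) is set iff x != 0. */
    uint64_t sum = (uint64_t)r3 + 0xFFFFFFFFu;
    uint32_t ca = (uint32_t)(sum >> 32);
    r3 = (uint32_t)sum;

    /* subfe 3,3,3: r3 = ~r3 + r3 + CA = 0xFFFFFFFF + CA (all ones iff CA == 0). */
    r3 = ~r3 + r3 + ca;

    /* rlwinm 3,3,0,0xff: keep the low 8 bits. */
    return (uint8_t)(r3 & 0xFFu);
}

/* Model of LLVM's sequence: cntlzw + srwi + neg + clrlwi. */
static uint8_t f_llvm_model(uint8_t x)
{
    uint32_t r3 = cntlzw(x); /* cntlzw 3, 3: 32 iff x == 0, otherwise < 32 */
    r3 >>= 5;                /* srwi 3, 3, 5: 1 iff x == 0 */
    r3 = 0u - r3;            /* neg 3, 3: 0xFFFFFFFF iff x == 0 */
    return (uint8_t)(r3 & 0xFFu); /* clrlwi 3, 3, 24: keep the low 8 bits */
}
```

In this model, GCC's version needs two dependent instructions before the final mask, while LLVM's needs three.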

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-powerpc

Author: Gabriel Ravier (GabrielRavier)

ecnelises commented 1 month ago

With -mcpu=pwr8 or newer, the generated code uses isel instead:

cmpwi 3, 0
li 3, 0
li 4, 255
iseleq 3, 4, 3
blr
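As a rough illustration, the isel sequence can be read as a compare followed by a branchless select. The sketch below is a C model under that reading; `f_isel_model` is a hypothetical name, and the `eq` variable stands in for the EQ bit of cr0.

```c
#include <stdint.h>

/* Model of the isel-based sequence: cmpwi + li + li + iseleq. */
static uint8_t f_isel_model(uint8_t x)
{
    int eq = (x == 0);   /* cmpwi 3, 0: set the EQ bit iff r3 == 0 */
    uint32_t r3 = 0;     /* li 3, 0 (the compare result is already latched) */
    uint32_t r4 = 255;   /* li 4, 255 */
    r3 = eq ? r4 : r3;   /* iseleq 3, 4, 3: r3 = EQ ? r4 : r3 */
    return (uint8_t)r3;
}
```

No masking step is needed here because both candidate values already fit in 8 bits.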