llvm / llvm-project

aarch64: reduction of compare result produces not so good code #81375

Open pinskia opened 7 months ago

pinskia commented 7 months ago

Take:

void f(unsigned int * __restrict a, unsigned int * __restrict b)
{
  unsigned int t = 0;
  t += -(a[0] == b[0]);
  t += -(a[1] == b[1]);
  t += -(a[2] == b[2]);
  t += -(a[3] == b[3]);
  a+=4; b+=4;
  *a = t;
}

LLVM currently produces:

f:
        ldr     q0, [x1]
        ldr     q1, [x0]
        adrp    x8, .LCPI0_0
        cmeq    v0.4s, v1.4s, v0.4s
        ldr     q1, [x8, :lo12:.LCPI0_0]
        and     v0.16b, v0.16b, v1.16b
        addv    s0, v0.4s
        fmov    w8, s0
        and     w8, w8, #0xf
        fmov    s0, w8
        cnt     v0.8b, v0.8b
        uaddlv  h0, v0.8b
        fmov    w8, s0
        neg     w8, w8
        str     w8, [x0, #16]
        ret

Where LCPI0_0 is {1,2,4,8}. What it is trying to do is build a bit mask from the comparison result, count how many bits are set, and negate that count. Since cmeq already sets each equal lane to all ones (-1), why not just sum the lanes directly:

f:
        ldr     q0, [x1]
        ldr     q1, [x0]
        cmeq    v0.4s, v1.4s, v0.4s
        addv    s0, v0.4s
        str     s0, [x0, #16]
        ret
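
To make the equivalence concrete, here is a minimal scalar sketch in C of the two strategies. It is only a model of the lane arithmetic, not NEON code and not what the backend actually does internally. The function names `reduce_via_popcount` and `reduce_via_sum` are hypothetical, chosen for illustration.

```c
#include <stdint.h>

/* Models the current codegen: build a 4-bit mask of the equal lanes,
   count the set bits, then negate the count. */
uint32_t reduce_via_popcount(const uint32_t *a, const uint32_t *b)
{
    const uint32_t lane_bit[4] = {1, 2, 4, 8};   /* stands in for .LCPI0_0 */
    uint32_t mask = 0;
    for (int i = 0; i < 4; i++) {
        /* cmeq: all ones if the lanes are equal, zero otherwise */
        uint32_t cmp = (a[i] == b[i]) ? UINT32_MAX : 0;
        mask += cmp & lane_bit[i];               /* and, then addv */
    }
    mask &= 0xf;

    uint32_t count = 0;
    for (; mask != 0; mask >>= 1)                /* cnt, then uaddlv */
        count += mask & 1;

    return 0u - count;                           /* neg */
}

/* Models the proposed codegen: sum the all-ones lanes directly.
   Each equal lane contributes UINT32_MAX, i.e. -1 modulo 2^32. */
uint32_t reduce_via_sum(const uint32_t *a, const uint32_t *b)
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += (a[i] == b[i]) ? UINT32_MAX : 0;  /* cmeq, then addv */
    return sum;
}
```

Both functions return the negated number of equal lanes modulo 2^32, which is the value the original `f` stores to `a[4]`.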
llvmbot commented 7 months ago

@llvm/issue-subscribers-backend-aarch64

Author: Andrew Pinski (pinskia)
