Take:
```
void f(unsigned int * __restrict a, unsigned int * __restrict b)
{
  unsigned int t = 0;
  t += -(a[0] == b[0]);
  t += -(a[1] == b[1]);
  t += -(a[2] == b[2]);
  t += -(a[3] == b[3]);
  a+=4; b+=4;
  *a = t;
}
```
LLVM currently produces:
```
f:
ldr q0, [x1]
ldr q1, [x0]
adrp x8, .LCPI0_0
cmeq v0.4s, v1.4s, v0.4s
ldr q1, [x8, :lo12:.LCPI0_0]
and v0.16b, v0.16b, v1.16b
addv s0, v0.4s
fmov w8, s0
and w8, w8, #0xf
fmov s0, w8
cnt v0.8b, v0.8b
uaddlv h0, v0.8b
fmov w8, s0
neg w8, w8
str w8, [x0, #16]
ret
```
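Where LCPI0_0 is {1,2,4,8}.
What it is trying to do is build a bit mask from the comparison result and then count how many bits are set, negating that count at the end.
But cmeq already sets each matching lane to all ones (that is, -1), so summing the lanes gives the negated count directly. Why not instead just do: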
```
f:
ldr q0, [x1]
ldr q1, [x0]
cmeq v0.4s, v1.4s, v0.4s
addv s0, v0.4s
str s0, [x0, #16]
ret
```
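To make the equivalence concrete, here is a rough scalar model in portable C, not the actual LLVM lowering. The function names (`neg_count_scalar`, `neg_count_bitmask`, `neg_count_lane_sum`, `check_all_patterns`) are made up for illustration, and each one returns the value instead of storing it to `a[4]`. The second function mimics the current and/addv/cnt/neg sequence, the third mimics the proposed cmeq + addv sequence, and the check walks all 16 match patterns.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: same arithmetic as f(), but returns t instead of storing it. */
static uint32_t neg_count_scalar(const uint32_t *a, const uint32_t *b)
{
    uint32_t t = 0;
    for (int i = 0; i < 4; i++)
        t += 0u - (uint32_t)(a[i] == b[i]);   /* adds 0xFFFFFFFF per match */
    return t;
}

/* Model of the current output: lane mask AND {1,2,4,8}, horizontal add,
   popcount of the low 4 bits, then negate. */
static uint32_t neg_count_bitmask(const uint32_t *a, const uint32_t *b)
{
    static const uint32_t weights[4] = {1, 2, 4, 8};  /* stands in for LCPI0_0 */
    uint32_t bits = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t lane = (a[i] == b[i]) ? UINT32_MAX : 0;  /* cmeq */
        bits += lane & weights[i];                        /* and + addv */
    }
    bits &= 0xf;
    uint32_t count = 0;
    for (; bits != 0; bits >>= 1)                         /* cnt + uaddlv */
        count += bits & 1;
    return 0u - count;                                    /* neg */
}

/* Model of the proposed output: sum the all-ones/zero lanes directly. */
static uint32_t neg_count_lane_sum(const uint32_t *a, const uint32_t *b)
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += (a[i] == b[i]) ? UINT32_MAX : 0;           /* cmeq + addv */
    return sum;
}

/* Checks that all three models agree for every combination of matching lanes. */
static void check_all_patterns(void)
{
    for (unsigned pattern = 0; pattern < 16; pattern++) {
        uint32_t a[4] = {0, 0, 0, 0};
        uint32_t b[4];
        uint32_t matches = 0;
        for (int i = 0; i < 4; i++) {
            int equal = (pattern >> i) & 1;
            b[i] = equal ? 0 : 1;   /* lane i matches only if its bit is set */
            matches += (uint32_t)equal;
        }
        uint32_t expected = 0u - matches;
        assert(neg_count_scalar(a, b) == expected);
        assert(neg_count_bitmask(a, b) == expected);
        assert(neg_count_lane_sum(a, b) == expected);
    }
}
```

Calling `check_all_patterns()` passes without tripping an assertion, which is consistent with the shorter sequence computing the same value as the longer one for this function.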