llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.52k stars 11.79k forks source link

[aarch64] llvm.experimental.reduce.{and, any} don't lower properly for boolean vectors #40980

Open RKSimon opened 5 years ago

RKSimon commented 5 years ago
Bugzilla Link 41635
Version trunk
OS Windows NT
CC @alexey-bataev,@Arnaud-de-Grandmaison-ARM,@gnzlbg,@nikic,@smithp35
Fixed by commit(s) 360054

Extended Description

Split off from [Bug #​36702], AARCH64 generates poor code for boolean reduction from generic IR - either with the llvm.experimental.vector.reduce intrinsics (which expand to a shuffle reduction chain) or with bitcasts of the comparison result mask:

https://godbolt.org/z/wchsJ6

e.g.

AArch64

LLVM6:

all_8x8: ldr d0, [x0] umov w8, v0.b[0] umov w9, v0.b[1] tst w8, #​0xff umov w10, v0.b[2] cset w8, ne tst w9, #​0xff cset w9, ne tst w10, #​0xff umov w10, v0.b[3] and w8, w8, w9 cset w9, ne tst w10, #​0xff umov w10, v0.b[4] and w8, w9, w8 cset w9, ne tst w10, #​0xff umov w10, v0.b[5] and w8, w9, w8 cset w9, ne tst w10, #​0xff umov w10, v0.b[6] and w8, w9, w8 cset w9, ne tst w10, #​0xff umov w10, v0.b[7] and w8, w9, w8 cset w9, ne tst w10, #​0xff and w8, w9, w8 cset w9, ne and w0, w9, w8 ret any_8x8: ldr d0, [x0] umov w8, v0.b[0] umov w9, v0.b[1] orr w8, w8, w9 umov w9, v0.b[2] orr w8, w8, w9 umov w9, v0.b[3] orr w8, w8, w9 umov w9, v0.b[4] orr w8, w8, w9 umov w9, v0.b[5] orr w8, w8, w9 umov w9, v0.b[6] orr w8, w8, w9 umov w9, v0.b[7] orr w8, w8, w9 tst w8, #​0xff cset w0, ne ret

Manually generated:

all_8x8: ldr d0, [x0] mov v0.d[1], v0.d[0] uminv b0, v0.16b fmov w8, s0 tst w8, #​0xff cset w0, ne ret any_8x8: ldr d0, [x0] mov v0.d[1], v0.d[0] umaxv b0, v0.16b fmov w8, s0 tst w8, #​0xff cset w0, ne ret

alexey-bataev commented 3 years ago

Ping, there is https://reviews.llvm.org/D97961, which changes the default cost for the logical reductions.

nikic commented 5 years ago

The vector.reduce cases are fixed by https://reviews.llvm.org/rL360054. The bitcast part is still open.

RKSimon commented 5 years ago

Candidate Patch: https://reviews.llvm.org/D61398

nikic commented 5 years ago

As a target-independent transform, we should convert i1 VECREDUCE_AND and VECREDUCE_OR into VECREDUCE_UMIN and VECREDUCE_UMAX respectively, if it's favorable legality-wise.

Not sure where the right place to do that would be though.