Open RKSimon opened 5 years ago
Ping, there is https://reviews.llvm.org/D97961, which changes the default cost for the logical reductions.
The vector.reduce cases are fixed by https://reviews.llvm.org/rL360054. The bitcast part is still open.
Candidate Patch: https://reviews.llvm.org/D61398
As a target-independent transform, we should convert i1 VECREDUCE_AND and VECREDUCE_OR into VECREDUCE_UMIN and VECREDUCE_UMAX respectively, if it's favorable legality-wise.
Not sure where the right place to do that would be though.
Extended Description
Split off from [Bug #36702], AARCH64 generates poor code for boolean reduction from generic IR - either with the llvm.experimental.vector.reduce intrinsics (which expand to a shuffle reduction chain) or with bitcasts of the comparison result mask:
https://godbolt.org/z/wchsJ6
e.g.
AArch64
LLVM6:
all_8x8: ldr d0, [x0] umov w8, v0.b[0] umov w9, v0.b[1] tst w8, #0xff umov w10, v0.b[2] cset w8, ne tst w9, #0xff cset w9, ne tst w10, #0xff umov w10, v0.b[3] and w8, w8, w9 cset w9, ne tst w10, #0xff umov w10, v0.b[4] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[5] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[6] and w8, w9, w8 cset w9, ne tst w10, #0xff umov w10, v0.b[7] and w8, w9, w8 cset w9, ne tst w10, #0xff and w8, w9, w8 cset w9, ne and w0, w9, w8 ret any_8x8: ldr d0, [x0] umov w8, v0.b[0] umov w9, v0.b[1] orr w8, w8, w9 umov w9, v0.b[2] orr w8, w8, w9 umov w9, v0.b[3] orr w8, w8, w9 umov w9, v0.b[4] orr w8, w8, w9 umov w9, v0.b[5] orr w8, w8, w9 umov w9, v0.b[6] orr w8, w8, w9 umov w9, v0.b[7] orr w8, w8, w9 tst w8, #0xff cset w0, ne ret
Manually generated:
all_8x8: ldr d0, [x0] mov v0.d[1], v0.d[0] uminv b0, v0.16b fmov w8, s0 tst w8, #0xff cset w0, ne ret any_8x8: ldr d0, [x0] mov v0.d[1], v0.d[0] umaxv b0, v0.16b fmov w8, s0 tst w8, #0xff cset w0, ne ret