llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[arm][neon] llvm.experimental.reduce.{and, any} don't lower properly for boolean vectors #40981

Open RKSimon opened 5 years ago

RKSimon commented 5 years ago
Bugzilla Link: 41636
Version: trunk
OS: Windows NT
CC: @alexey-bataev, @gnzlbg, @smithp35

Extended Description

Split off from Bug #36702: armv7a generates poor code for boolean reductions from generic IR, whether they are written with the llvm.experimental.vector.reduce intrinsics (which expand to a shuffle reduction chain) or with bitcasts of the comparison-result mask:

https://godbolt.org/z/U7C4n4
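As a reminder of what the second form computes, here is a minimal Python sketch of the bitcast idiom. It assumes an 8-lane byte vector whose lanes count as true when the low bit is set (matching the `vtst.8` against `#0x1` in the listings). `pack_mask`, `all_via_mask` and `any_via_mask` are hypothetical helpers that only model the semantics of bitcasting a `<8 x i1>` comparison result to `i8`; they say nothing about how LLVM lowers it.

```python
def pack_mask(lanes):
    """Model `bitcast <N x i1> to iN`: lane i becomes bit i of an integer."""
    mask = 0
    for i, byte in enumerate(lanes):
        if byte & 1:  # lane is "true" when its low bit is set
            mask |= 1 << i
    return mask


def all_via_mask(lanes):
    # all-true reduction: every bit of the packed mask must be set
    return pack_mask(lanes) == (1 << len(lanes)) - 1


def any_via_mask(lanes):
    # any-true reduction: at least one bit of the packed mask is set
    return pack_mask(lanes) != 0


print(all_via_mask([1] * 8), any_via_mask([0] * 8))  # True False
```

Both forms are semantically the same reduction, so ideally they would lower to the same short sequence.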

e.g.

ARMv7+NEON

LLVM6:

```asm
all_8x8:
    vmov.i8  d0, #0x1
    vldr     d1, [r0]
    vtst.8   d0, d1, d0
    vext.8   d1, d0, d0, #4
    vand     d0, d0, d1
    vext.8   d1, d0, d0, #2
    vand     d0, d0, d1
    vdup.8   d1, d0[1]
    vand     d0, d0, d1
    vmov.u8  r0, d0[0]
    and      r0, r0, #1
    bx       lr

any_8x8:
    vmov.i8  d0, #0x1
    vldr     d1, [r0]
    vtst.8   d0, d1, d0
    vext.8   d1, d0, d0, #4
    vorr     d0, d0, d1
    vext.8   d1, d0, d0, #2
    vorr     d0, d0, d1
    vdup.8   d1, d0[1]
    vorr     d0, d0, d1
    vmov.u8  r0, d0[0]
    and      r0, r0, #1
    bx       lr
```
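The LLVM6 output is a log2(8) = 3 step shuffle reduction. A rough Python model of its lane semantics, for illustration only: `vtst8`, `vext8` and `vdup8` are hypothetical helpers that mimic the NEON instructions of the same names on 8 byte lanes (lane 0 first); they are not a real API.

```python
import operator


def vtst8(v, imm=0x01):
    # vtst.8: lane becomes 0xFF if (lane & imm) != 0, else 0x00
    return [0xFF if (x & imm) else 0x00 for x in v]


def vext8(a, b, n):
    # vext.8 d, a, b, #n: bytes a[n..7] followed by b[0..n-1]
    return (a + b)[n:n + 8]


def vdup8(a, lane):
    # vdup.8 d, a[lane]: broadcast one lane to all 8 lanes
    return [a[lane]] * 8


def shuffle_reduce(v, op):
    """op is operator.and_ for all_8x8 (vand) or operator.or_ for any_8x8 (vorr)."""
    d0 = vtst8(v)
    d0 = [op(x, y) for x, y in zip(d0, vext8(d0, d0, 4))]  # fold the two halves
    d0 = [op(x, y) for x, y in zip(d0, vext8(d0, d0, 2))]  # fold the quarters
    d0 = [op(x, y) for x, y in zip(d0, vdup8(d0, 1))]      # fold lanes 0 and 1
    return d0[0] & 1  # vmov.u8 r0, d0[0]; and r0, r0, #1


print(shuffle_reduce([1] * 8, operator.and_), shuffle_reduce([0] * 8, operator.or_))  # 1 0
```

In this model every folding step costs a `vext`/`vdup` plus a `vand`/`vorr`, on top of the `vmov.i8`/`vtst.8` needed to build the lane mask first.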

Manually generated:

```asm
all_8x8:
    vldr      d0, [r0]
    vpmin.u8  d16, d0, d16
    vpmin.u8  d16, d16, d16
    vpmin.u8  d0, d16, d16
    vmov.u8   r0, d0[0]
    bx        lr

any_8x8:
    vldr      d0, [r0]
    vpmax.u8  d16, d0, d16
    vpmax.u8  d16, d16, d16
    vpmax.u8  d0, d16, d16
    vmov.u8   r0, d0[0]
    bx        lr
```
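The hand-written version replaces the chain with three pairwise `vpmin.u8`/`vpmax.u8` steps. A small Python model of why this works, under two assumptions read off the listing rather than stated in the report: each input byte is already 0 or 1 (there is no `vtst`), and only lane 0 is consumed, so whatever `d16` initially holds never reaches the result. `vp8` and `pairwise_reduce` are hypothetical helpers modelling the pairwise instructions.

```python
def vp8(a, b, pair_op):
    # vpmin.u8 / vpmax.u8 d, a, b: lanes 0-3 from adjacent pairs of a, lanes 4-7 from pairs of b
    return ([pair_op(a[2 * i], a[2 * i + 1]) for i in range(4)]
            + [pair_op(b[2 * i], b[2 * i + 1]) for i in range(4)])


def pairwise_reduce(v, pair_op, d16=None):
    """pair_op is min for all_8x8 (vpmin) or max for any_8x8 (vpmax)."""
    d16 = list(d16) if d16 is not None else [0xAA] * 8  # arbitrary initial d16 contents
    d16 = vp8(v, d16, pair_op)    # vpmin.u8 d16, d0, d16
    d16 = vp8(d16, d16, pair_op)  # vpmin.u8 d16, d16, d16
    d0 = vp8(d16, d16, pair_op)   # vpmin.u8 d0, d16, d16
    return d0[0]                  # vmov.u8 r0, d0[0]


print(pairwise_reduce([1, 1, 1, 0, 1, 1, 1, 1], min), pairwise_reduce([0, 0, 0, 0, 0, 1, 0, 0], max))  # 0 1
```

After the first step, lanes 0-3 depend only on `d0`; after the second, lanes 0-1 hold the min (or max) of each half of `d0`; the third step combines them in lane 0. The garbage from `d16` stays confined to the upper lanes.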

alexey-bataev commented 3 years ago

Ping: https://reviews.llvm.org/D97961 changes the default cost of the logical reductions.