Open RKSimon opened 3 years ago
We don't even fold:
define <4 x i32> @reverse_add_basic(<4 x i32> %a0, <4 x i32> %a1) {
  %r0 = shufflevector <4 x i32> %a0, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %ss = add <4 x i32> %r0, %a1
  %r1 = shufflevector <4 x i32> %ss, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i32> %r1
}
to:
define <4 x i32> @reverse_add_basic(<4 x i32> %a0, <4 x i32> %a1) {
  %r1 = shufflevector <4 x i32> %a1, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %ss = add <4 x i32> %a0, %r1
  ret <4 x i32> %ss
}
let alone anything that contains zext/trunc/intrinsics.
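For anyone wanting to sanity-check the fold, here is a rough lane-wise model in Python (not LLVM code; the helper names `reverse`, `add_i32`, `uadd_sat_i32`, `before_fold` and `after_fold` are made up for illustration). It assumes the binary op is purely element-wise, which is what lets the outer reverse be pushed through it and cancel against the inner one. The saturating variant is included only to suggest the same reasoning should carry over to lane-wise intrinsics.

```python
MASK32 = 0xFFFFFFFF  # models i32 lanes as unsigned 32-bit values


def reverse(v):
    """Models shufflevector with mask <3, 2, 1, 0> (full lane reversal)."""
    return v[::-1]


def add_i32(a, b):
    """Lane-wise wrapping add, like `add <4 x i32>`."""
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def uadd_sat_i32(a, b):
    """Lane-wise unsigned saturating add (hypothetical stand-in for an intrinsic)."""
    return [min(x + y, MASK32) for x, y in zip(a, b)]


def before_fold(a0, a1, op=add_i32):
    """reverse(op(reverse(a0), a1)): two reverse shuffles around the op."""
    r0 = reverse(a0)
    ss = op(r0, a1)
    return reverse(ss)


def after_fold(a0, a1, op=add_i32):
    """op(a0, reverse(a1)): a single reverse shuffle."""
    r1 = reverse(a1)
    return op(a0, r1)


if __name__ == "__main__":
    a0 = [1, 2, 3, 4]
    a1 = [10, 20, 30, 40]
    print(before_fold(a0, a1), after_fold(a0, a1))
```

Both forms agree because reversing the lanes of an element-wise result is the same as reversing each operand first, and reversing twice is a no-op.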
[Bug #46238] is sort of related to intrinsics handling.
I think InstCombine already has something to deal with this kind of problem; perhaps it doesn't handle ext/trunc/saturating math?
CC @zhengyang92
Extended Description
https://godbolt.org/z/dWTd9e
By decrementing the pointer in the loop we end up with loop bodies like this:
AFAICT we should be able to remove both of these 'reverse' shufflevectors.
I'm not sure if we should be trying to fix this in InstCombine/VectorCombine or inside the LoopVectorizer.