llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.83k stars 11.46k forks source link

Failure to remove before/after reverse shuffles from loop with decrementing iterator #48960

Open RKSimon opened 3 years ago

RKSimon commented 3 years ago
Bugzilla Link 49616
Version trunk
OS Windows NT
CC @anton-afanasyev,@fhahn,@LebedevRI,@rotateright

Extended Description

https://godbolt.org/z/dWTd9e

void loop_dec(unsigned short *p, int max, int n) {
    for (int i = 0; i < (n & 31); i++) {
        unsigned m = *--p;
        *p = (unsigned short)(m >= max ? m-max : 0);
    }
}

By decrementing the pointer in the loop we end up with loop bodies like this:

52:
  %53 = phi i64 [ %41, %40 ], [ %64, %52 ]
  %54 = sub i64 0, %53
  %55 = getelementptr inbounds i16, i16* %51, i64 %54
  %56 = bitcast i16* %55 to <4 x i16>*
  %57 = load <4 x i16>, <4 x i16>* %56, align 2, !tbaa !2
  %58 = shufflevector <4 x i16> %57, <4 x i16> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %59 = zext <4 x i16> %58 to <4 x i32>
  %60 = call <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> %59, <4 x i32> %50)
  %61 = trunc <4 x i32> %60 to <4 x i16>
  %62 = shufflevector <4 x i16> %61, <4 x i16> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %63 = bitcast i16* %55 to <4 x i16>*
  store <4 x i16> %62, <4 x i16>* %63, align 2, !tbaa !2
  %64 = add i64 %53, 4
  %65 = icmp eq i64 %64, %45
  br i1 %65, label %66, label %52, !llvm.loop !9 

AFAICT we should be able to remove both these 'reverse' shufflevectors.

I'm not sure if we should be trying to fix this in InstCombine/VectorCombine or inside the LoopVectorizer.

RKSimon commented 3 years ago

https://godbolt.org/z/ePMEeT

We don't even fold:

define <4 x i32> @reverse_add_basic(<4 x i32> %a0, <4 x i32> %a1) {
  %r0 = shufflevector <4 x i32> %a0, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %ss = add <4 x i32> %r0, %a1
  %r1 = shufflevector <4 x i32> %ss, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x i32> %r1
}

to:

define <4 x i32> @reverse_add_basic(<4 x i32> %a0, <4 x i32> %a1) {
  %r1 = shufflevector <4 x i32> %a1, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %ss = add <4 x i32> %a0, %r1
  ret <4 x i32> %ss
}

let alone anything that contains zext/trunc/intrinsics....

[Bug #46238] is sort of related to intrinsics handling

LebedevRI commented 3 years ago

I think instcombine already has something to deal with this kind of problem, perhaps it doesn't handle ext/trunc/saturating math?

RKSimon commented 1 year ago

CC @zhengyang92