llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.16k stars 12.03k forks source link

missing SCEV support for xor in loops #17507

Open ec04fc15-fa35-46f2-80e1-5d271f2ef708 opened 11 years ago

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 11 years ago
Bugzilla Link 17133
Version trunk
OS Linux

Extended Description

int x; void f() { for (int n = 0; n < 500000000; ++n) x ^= 1; }

With -O3 -fno-vectorize, we produce

define void @​_Z1fv() #​0 { %x.promoted = load i32* @​x, align 4, !tbaa !​0 br label %1

;

;

With -O3 -fvectorize, we produce

define void @​_Z1fv() #​0 { vector.ph: %x.promoted = load i32* @​x, align 4, !tbaa !​0 %0 = insertelement <4 x i32> <i32 undef, i32 0, i32 0, i32 0>, i32 %x.promoted, i32 0 br label %vector.body

vector.body: ; preds = %vector.body, %vector.ph %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] %vec.phi = phi <4 x i32> [ %0, %vector.ph ], [ %1, %vector.body ] %vec.phi2 = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %2, %vector.body ] %1 = xor <4 x i32> %vec.phi, <i32 1, i32 1, i32 1, i32 1> %2 = xor <4 x i32> %vec.phi2, <i32 1, i32 1, i32 1, i32 1> %index.next = add i32 %index, 8 %3 = icmp eq i32 %index.next, 500000000 br i1 %3, label %middle.block, label %vector.body

middle.block: ; preds = %vector.body %bin.rdx = xor <4 x i32> %2, %1 %rdx.shuf = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef> %bin.rdx5 = xor <4 x i32> %bin.rdx, %rdx.shuf %rdx.shuf6 = shufflevector <4 x i32> %bin.rdx5, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef> %bin.rdx7 = xor <4 x i32> %bin.rdx5, %rdx.shuf6 %4 = extractelement <4 x i32> %bin.rdx7, i32 0 store i32 %4, i32* @​x, align 4, !tbaa !​0 ret void }

This is a success for the vectorizer but a failure for LLVM as a whole. SCEV should be able to determine that 'x' is unchanged by this loop and we should be able to remove it.

llvmbot commented 11 years ago

It might be reasonable to teach SCEV to handle xor by constant and xor by loop-invariant. xor by add-rec-expr is too crazy for me.

But setting SCEV aside for a second there's something else important here. This loop doesn't continue to evolve past two iterations. Consider that we do loop unrolling at the IR. If we unrolled one iteration into two iterations, this example would work. If we had good loop dependence analysis, we should in theory be able to detect that this loop stops evolving after 2 iterations and then unroll it by 2. (IMO, that's nearly the only reason to do unrolling in the IR, the bulk of unrolling should be done in the backend.)