llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.15k stars 12.03k forks source link

SimplifyCFG destroying canonical loop structure #32952

Open llvmbot opened 7 years ago

llvmbot commented 7 years ago
Bugzilla Link 33605
Version trunk
OS Windows NT
Attachments test.ll
Reporter LLVM Bugzilla Contributor

Extended Description

In attached test case, there a loop latch block whose terminator is an unconditional branch. When simplifyCFG simplifies the latch block it folds it into the loop header. This destroys the canonical loop structure as it results in multiple backedges. This prevents later optimizations from being applied (e.g. loop unroll and loop vectorization). As in this example, LoopSimplify turns it into a nested loop because there are multiple backedges and now this loop cannot be unrolled.

The attached testcase(test.ll) is from this c code: $cat test.c void foo(); bool test(int a, int b, int *c) { bool changed = false; for (unsigned int i = 2; i--;) { int r = a | b; if ( r != c[i]) { c[i] = r; foo(); changed = true; } } return changed; }

With simplifyCFG:

$ opt < test.ll -simplifycfg -loop-rotate -loop-unroll -debug -disable-output Args: opt -simplifycfg -loop-rotate -loop-unroll -debug -disable-output Looking to fold if.end into for.cond Killing Trivial BB:

if.end: ; preds = %if.then, %for.body %changed.1.off0 = phi i1 [ true, %if.then ], [ %changed.0.off0, %for.body ] br label %for.cond LoopSimplify: Splitting out a new outer loop Loop Unroll: F[test] Loop %for.cond Loop Size = 9 will not try to unroll loop with runtime trip count -unroll-runtime not given Loop Unroll: F[test] Loop %for.cond.outer Loop Size = 13 will not try to unroll loop with runtime trip count -unroll-runtime not given

Without simplifyCFG:

$ opt < new.ll -loop-rotate -loop-unroll -debug -disable-output Args: opt -loop-rotate -loop-unroll -debug -disable-output LoopRotation: rotating Loop at depth 1 containing: %for.cond

,%for.body,%if.then,%if.end Inserted PHI: %changed.0.off01 = phi i1 [ false, %entry ], [ %changed.0.off0, %for.cond ] Inserted PHI: %dec2 = phi i32 [ 1, %entry ], [ %dec, %for.cond ] LoopRotation: into Loop at depth 1 containing: %for.body
,%if.then,%if.end Loop Unroll: F[test] Loop %for.body Loop Size = 12 Trip Count = 2 Trip Multiple = 2 COMPLETELY UNROLLING loop %for.body with trip count 2! Merging: for.body.1: ; preds = %if.end %or.1 = or i32 %a, %b %idxprom.1 = sext i32 %dec to i64 %arrayidx.1 = getelementptr inbounds i32, i32 %c, i64 %idxprom.1 %1 = load i32, i32 %arrayidx.1, align 4 %cmp.1 = icmp eq i32 %or.1, %1 br i1 %cmp.1, label %if.end.1, label %if.then.1 into: if.end: ; preds = %if.then, %for.body %changed.1.off0 = phi i1 [ true, %if.then ], [ false, %for.body ] %dec = add nsw i32 1, -1 %tobool = icmp eq i32 1, 0 br label %for.body.1 Merging: for.cond.cleanup: ; preds = %if.end.1 %changed.0.off0.lcssa = phi i1 [ %changed.1.off0.1, %if.end.1 ] ret i1 %changed.0.off0.lcssa into: if.end.1: ; preds = %if.then.1, %if.end %changed.1.off0.1 = phi i1 [ true, %if.then.1 ], [ %changed.1.off0, %if.end ] %dec.1 = add nsw i32 %dec, -1 %tobool.1 = icmp eq i32 %dec, 0 br label %for.cond.cleanup

llvmbot commented 7 years ago

Posted a patch for review here: https://reviews.llvm.org/D35411

llvmbot commented 7 years ago

Hi, we see the same performance regressions in our PACXX research project. We have a reduction code that shows a 5 times slowdown with current LLVM svn compared to a version from around 24th of May. The main part of the reduction is a for loop that was unrolled by -O3 with a factor of 4 and with current llvm now unrolling occurs.