Open llvmbot opened 7 years ago
Posted a patch for review here: https://reviews.llvm.org/D35411
Hi, we see the same performance regressions in our PACXX research project. We have a reduction code that shows a 5 times slowdown with current LLVM svn compared to a version from around 24th of May. The main part of the reduction is a for loop that was unrolled by -O3 with a factor of 4 and with current llvm now unrolling occurs.
Extended Description
In attached test case, there a loop latch block whose terminator is an unconditional branch. When simplifyCFG simplifies the latch block it folds it into the loop header. This destroys the canonical loop structure as it results in multiple backedges. This prevents later optimizations from being applied (e.g. loop unroll and loop vectorization). As in this example, LoopSimplify turns it into a nested loop because there are multiple backedges and now this loop cannot be unrolled.
The attached testcase(test.ll) is from this c code: $cat test.c void foo(); bool test(int a, int b, int *c) { bool changed = false; for (unsigned int i = 2; i--;) { int r = a | b; if ( r != c[i]) { c[i] = r; foo(); changed = true; } } return changed; }
With simplifyCFG:
$ opt < test.ll -simplifycfg -loop-rotate -loop-unroll -debug -disable-output Args: opt -simplifycfg -loop-rotate -loop-unroll -debug -disable-output Looking to fold if.end into for.cond Killing Trivial BB:
if.end: ; preds = %if.then, %for.body %changed.1.off0 = phi i1 [ true, %if.then ], [ %changed.0.off0, %for.body ] br label %for.cond LoopSimplify: Splitting out a new outer loop Loop Unroll: F[test] Loop %for.cond Loop Size = 9 will not try to unroll loop with runtime trip count -unroll-runtime not given Loop Unroll: F[test] Loop %for.cond.outer Loop Size = 13 will not try to unroll loop with runtime trip count -unroll-runtime not given
Without simplifyCFG:
$ opt < new.ll -loop-rotate -loop-unroll -debug -disable-output Args: opt -loop-rotate -loop-unroll -debug -disable-output LoopRotation: rotating Loop at depth 1 containing: %for.cond,%for.body,%if.then,%if.end
Inserted PHI: %changed.0.off01 = phi i1 [ false, %entry ], [ %changed.0.off0, %for.cond ]
Inserted PHI: %dec2 = phi i32 [ 1, %entry ], [ %dec, %for.cond ]
LoopRotation: into Loop at depth 1 containing: %for.body,%if.then,%if.end
Loop Unroll: F[test] Loop %for.body
Loop Size = 12
Trip Count = 2
Trip Multiple = 2
COMPLETELY UNROLLING loop %for.body with trip count 2!
Merging:
for.body.1: ; preds = %if.end
%or.1 = or i32 %a, %b
%idxprom.1 = sext i32 %dec to i64
%arrayidx.1 = getelementptr inbounds i32, i32 %c, i64 %idxprom.1
%1 = load i32, i32 %arrayidx.1, align 4
%cmp.1 = icmp eq i32 %or.1, %1
br i1 %cmp.1, label %if.end.1, label %if.then.1
into:
if.end: ; preds = %if.then, %for.body
%changed.1.off0 = phi i1 [ true, %if.then ], [ false, %for.body ]
%dec = add nsw i32 1, -1
%tobool = icmp eq i32 1, 0
br label %for.body.1
Merging:
for.cond.cleanup: ; preds = %if.end.1
%changed.0.off0.lcssa = phi i1 [ %changed.1.off0.1, %if.end.1 ]
ret i1 %changed.0.off0.lcssa
into:
if.end.1: ; preds = %if.then.1, %if.end
%changed.1.off0.1 = phi i1 [ true, %if.then.1 ], [ %changed.1.off0, %if.end ]
%dec.1 = add nsw i32 %dec, -1
%tobool.1 = icmp eq i32 %dec, 0
br label %for.cond.cleanup