Open tesuji opened 4 months ago
It seems using return i <= N
instead of break
in the loop body "solves" the missing optimisation.
Nevertheless, I wonder which pass could be responsible of that kind of optimisation IndVarSimplify
or ConstraintElimination
or other ?
I looked at ConstraintElimination
and it fails to analyse the following loop pattern:
[ while.cond ] --> [ while.body ]
| |
| v
+---------------> [ while.end ]
but if we duplicate [while.end]
into [ while.end.cond ]
and [ while.end.body ]
which make now [ while.cond ]
to be a dominator of [ while.end.cond ]
(and same for [ while.body ]
/[ while.end.body ]
) we get the following IR before ConstraintElimination
:
define dso_local i1 @eat_digits(ptr nocapture noundef readonly %s) local_unnamed_addr #0 {
entry:
br label %while.cond
while.cond: ; preds = %while.body, %entry
%i.0 = phi i64 [ 0, %entry ], [ %add, %while.body ]
%exitcond.not = icmp eq i64 %i.0, 1234
br i1 %exitcond.not, label %common.ret, label %while.body
while.body: ; preds = %while.cond
%arrayidx = getelementptr inbounds i8, ptr %s, i64 %i.0
%0 = load i8, ptr %arrayidx, align 1
%1 = add i8 %0, -48
%or.cond = icmp ult i8 %1, 10
%add = add nuw nsw i64 %i.0, 1
br i1 %or.cond, label %while.cond, label %while.end.2
common.ret: ; preds = %while.cond, %while.end.2
%common.ret.op = phi i1 [ %cmp7, %while.end.2 ], [ true, %while.cond ]
ret i1 %common.ret.op
while.end.2: ; preds = %while.body
%cmp7 = icmp ult i64 %i.0, 1235
br label %common.ret
}
ConstraintElimination
is able to infer that %cmp7
is true
now. I suspect the following code is responsible of missing to propagate the condition information for this kind of loop pattern:
@fhahn Do you think this bug is related to ConstraintElimination
?
The faulty case manually fed to opt
:
define dso_local i1 @eat_digits(ptr nocapture noundef readonly %s) local_unnamed_addr #0 {
entry:
br label %while.cond
while.cond: ; preds = %while.body, %entry
%i.0 = phi i64 [ 0, %entry ], [ %add, %while.body ]
%exitcond.not = icmp eq i64 %i.0, 1234
br i1 %exitcond.not, label %while.end, label %while.body
while.body: ; preds = %while.cond
%arrayidx = getelementptr inbounds i8, ptr %s, i64 %i.0
%0 = load i8, ptr %arrayidx, align 1
%1 = add i8 %0, -48
%or.cond = icmp ult i8 %1, 10
%add = add nuw nsw i64 %i.0, 1
; %add = add i64 %i.0, 1
br i1 %or.cond, label %while.cond, label %while.end
while.end: ; preds = %while.body, %while.cond
%cmp6 = icmp ult i64 %i.0, 1235
ret i1 %cmp6
}
$ opt faulty_manual.ll -passes="print<scalar-evolution>"
Printing analysis 'Scalar Evolution Analysis' for function 'eat_digits':
Classifying expressions for: @eat_digits
%i.0 = phi i64 [ 0, %entry ], [ %add, %while.body ]
--> {0,+,1}<nuw><nsw><%while.cond> U: [0,1235) S: [0,1235) Exits: <<Unknown>> LoopDispositions: { %while.cond: Computable }
%arrayidx = getelementptr inbounds i8, ptr %s, i64 %i.0
--> {%s,+,1}<nw><%while.cond> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %while.cond: Computable }
%0 = load i8, ptr %arrayidx, align 1
--> %0 U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %while.cond: Variant }
%1 = add i8 %0, -48
--> (-48 + %0) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %while.cond: Variant }
%add = add nuw nsw i64 %i.0, 1
--> {1,+,1}<nuw><nsw><%while.cond> U: [1,1236) S: [1,1236) Exits: <<Unknown>> LoopDispositions: { %while.cond: Computable }
Determining loop execution counts for: @eat_digits
Loop %while.cond: <multiple exits> Unpredictable backedge-taken count.
exit count for while.cond: i64 1234
exit count for while.body: ***COULDNOTCOMPUTE***
Loop %while.cond: constant max backedge-taken count is i64 1234
Loop %while.cond: symbolic max backedge-taken count is i64 1234
symbolic max exit count for while.cond: i64 1234
symbolic max exit count for while.body: ***COULDNOTCOMPUTE***
ScalarEvolution
get everything we need, but it is not used.
I was able to solve your problems for the case you presented: a loop with a constant bound. The case of a dynamic bound is not yet supported. GCC seems to optimise only the constant case too.
This old issue doesn't seem to be fixed as all: https://bugs.llvm.org/show_bug.cgi?id=49389 Bug 49389 would be reposted below (godbolt link: https://godbolt.org/z/hdza9Pb8z):
compiles to:
GCC is able to compile it to:
This is originally from: https://github.com/rust-lang/rust/issues/81432