llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.01k stars 11.96k forks source link

verify-scev fails with "Trip Count for Loop ... Changed!" #89365

Open bjope opened 6 months ago

bjope commented 6 months ago

Given this input

define void @foo() {
entry:
  br label %for.outer

for.outer.again:
  %phi3.lcssa = phi i16 [ %phi3, %for.cond3 ]
  br label %for.outer

for.outer:
  %.pre = phi i16 [ 0, %entry ], [ %phi3.lcssa, %for.outer.again ]
  %step.outer = add i16 %.pre, 2
  %cmp0 = icmp ult i16 %step.outer, 17
  br i1 %cmp0, label %for.cond1, label %end

for.cond1:
  %phi1 = phi i16 [ %step.1, %for.body1 ], [ %step.outer, %for.outer ]
  %step.1 = add i16 %phi1, 1
  %cmp1 = icmp ne i16 %step.1, 8
  br i1 %cmp1, label %for.body1, label %for.cond2.preheader

for.body1:
  br label %for.cond1

for.cond2.preheader:
  %phi1.lcssa = phi i16 [ %phi1, %for.cond1 ]
  %cmpnew = icmp uge i16 %phi1.lcssa, 8
  br i1 %cmpnew, label %for.cond2, label %end

for.cond2:
  %phi2 = phi i16 [ %step.2, %for.body2 ], [ %phi1.lcssa, %for.cond2.preheader ]
  %step.2 = add i16 %phi2, -1
  %cmp2 = icmp ugt i16 %phi2, 8
  br i1 %cmp2, label %for.body2, label %for.cond3.preheader

for.body2:
  br label %for.cond2

for.cond3.preheader:
  %step.2.lcssa = phi i16 [ %step.2, %for.cond2 ]
  br label %for.cond3

for.cond3:
  %phi3 = phi i16 [ %step.3, %for.body3 ], [ %step.2.lcssa, %for.cond3.preheader ]
  %step.3 = add i16 %phi3, 1
  %cmp3 = icmp ne i16 %phi3, 20
  br i1 %cmp3, label %for.body3, label %for.outer.again

for.body3:
  br label %for.cond3

end:
  ret void
}

when running opt -passes="print<scalar-evolution>,no-op-loop,print<scalar-evolution>,verify<scalar-evolution>" -o /dev/null the first print says:

Classifying expressions for: @foo
  %phi3.lcssa = phi i16 [ %phi3, %for.cond3 ]
  -->  {{{(1 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2>,+,1}<nw><%for.cond3> U: full-set S: full-set  -->  20 U: [20,21) S: [20,21)                Exits: 20               LoopDispositions: { %for.outer: Variant, %for.cond1: Variant, %for.cond2: Variant, %for.cond3: Computable }
  %.pre = phi i16 [ 0, %entry ], [ %phi3.lcssa, %for.outer.again ]
  -->  %.pre U: full-set S: full-set            Exits: <<Unknown>>              LoopDispositions: { %for.outer: Variant, %for.cond1: Invariant, %for.cond2: Invariant, %for.cond3: Invariant }
  %step.outer = add i16 %.pre, 2
  -->  (2 + %.pre) U: full-set S: full-set              Exits: <<Unknown>>              LoopDispositions: { %for.outer: Variant, %for.cond1: Invariant, %for.cond2: Invariant, %for.cond3: Invariant }
  %phi1 = phi i16 [ %step.1, %for.body1 ], [ %step.outer, %for.outer ]
  -->  {(2 + %.pre),+,1}<%for.cond1> U: full-set S: full-set            Exits: 7                LoopDispositions: { %for.cond1: Computable, %for.outer: Variant }
  %step.1 = add i16 %phi1, 1
  -->  {(3 + %.pre),+,1}<%for.cond1> U: full-set S: full-set            Exits: 8                LoopDispositions: { %for.cond1: Computable, %for.outer: Variant }
  %phi1.lcssa = phi i16 [ %phi1, %for.cond1 ]
  -->  {(2 + %.pre),+,1}<%for.cond1> U: full-set S: full-set  -->  7 U: [7,8) S: [7,8)          Exits: 7                LoopDispositions: { %for.outer: Variant, %for.cond1: Computable, %for.cond2: Invariant, %for.cond3: Invariant }
  %phi2 = phi i16 [ %step.2, %for.body2 ], [ %phi1.lcssa, %for.cond2.preheader ]
  -->  {{(2 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2> U: full-set S: full-set  -->  {7,+,-1}<nw><%for.cond2> U: [7,8) S: [7,8)            Exits: 7                LoopDispositions: { %for.cond2: Computable, %for.outer: Variant }
  %step.2 = add i16 %phi2, -1
  -->  {{(1 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2> U: full-set S: full-set  -->  {6,+,-1}<nw><%for.cond2> U: [6,7) S: [6,7)            Exits: 6                LoopDispositions: { %for.cond2: Computable, %for.outer: Variant }
  %step.2.lcssa = phi i16 [ %step.2, %for.cond2 ]
  -->  {{(1 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2> U: full-set S: full-set  -->  6 U: [6,7) S: [6,7)           Exits: 6                LoopDispositions: { %for.outer: Variant, %for.cond1: Variant, %for.cond2: Computable, %for.cond3: Invariant }
  %phi3 = phi i16 [ %step.3, %for.body3 ], [ %step.2.lcssa, %for.cond3.preheader ]
  -->  {{{(1 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2>,+,1}<nw><%for.cond3> U: full-set S: full-set  -->  {6,+,1}<nw><%for.cond3> U: [6,21) S: [6,21)             Exits: 20               LoopDispositions: { %for.cond3: Computable, %for.outer: Variant }
  %step.3 = add i16 %phi3, 1
  -->  {{{(2 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2>,+,1}<nw><%for.cond3> U: full-set S: full-set  -->  {7,+,1}<nw><%for.cond3> U: [7,22) S: [7,22)             Exits: 21               LoopDispositions: { %for.cond3: Computable, %for.outer: Variant }
Determining loop execution counts for: @foo
Loop %for.cond1: backedge-taken count is (5 + (-1 * %.pre))
Loop %for.cond1: constant max backedge-taken count is i16 -1
Loop %for.cond1: symbolic max backedge-taken count is (5 + (-1 * %.pre))
Loop %for.cond1: Trip multiple is 1
Loop %for.cond2: backedge-taken count is i16 0
Loop %for.cond2: constant max backedge-taken count is i16 0
Loop %for.cond2: symbolic max backedge-taken count is i16 0
Loop %for.cond2: Trip multiple is 1
Loop %for.cond3: backedge-taken count is i16 14
Loop %for.cond3: constant max backedge-taken count is i16 14
Loop %for.cond3: symbolic max backedge-taken count is i16 14
Loop %for.cond3: Trip multiple is 15
Loop %for.outer: <multiple exits> Unpredictable backedge-taken count.
  exit count for for.outer: ***COULDNOTCOMPUTE***
  exit count for for.cond2.preheader: i1 false
Loop %for.outer: constant max backedge-taken count is i1 false
Loop %for.outer: symbolic max backedge-taken count is i1 false
  symbolic max exit count for for.outer: ***COULDNOTCOMPUTE***
  symbolic max exit count for for.cond2.preheader: i1 false

and then the second one says:

Classifying expressions for: @foo
  %phi3.lcssa = phi i16 [ %phi3, %for.cond3 ]
  -->  {{{(1 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2>,+,1}<nw><%for.cond3> U: full-set S: full-set  -->  19 U: [19,20) S: [19,20)                Exits: 19               LoopDispositions: { %for.outer: Variant, %for.cond1: Variant, %for.cond2: Variant, %for.cond3: Computable }
  %.pre = phi i16 [ 0, %entry ], [ %phi3.lcssa, %for.outer.again ]
  -->  %.pre U: full-set S: full-set            Exits: <<Unknown>>              LoopDispositions: { %for.outer: Variant, %for.cond1: Invariant, %for.cond2: Invariant, %for.cond3: Invariant }
  %step.outer = add i16 %.pre, 2
  -->  (2 + %.pre) U: full-set S: full-set              Exits: <<Unknown>>              LoopDispositions: { %for.outer: Variant, %for.cond1: Invariant, %for.cond2: Invariant, %for.cond3: Invariant }
  %phi1 = phi i16 [ %step.1, %for.body1 ], [ %step.outer, %for.cond1.preheader ]
  -->  {(2 + %.pre),+,1}<%for.cond1> U: full-set S: full-set            Exits: 7                LoopDispositions: { %for.cond1: Computable, %for.outer: Variant }
  %step.1 = add i16 %phi1, 1
  -->  {(3 + %.pre),+,1}<%for.cond1> U: full-set S: full-set            Exits: 8                LoopDispositions: { %for.cond1: Computable, %for.outer: Variant }
  %phi1.lcssa = phi i16 [ %phi1, %for.cond1 ]
  -->  {(2 + %.pre),+,1}<%for.cond1> U: full-set S: full-set  -->  7 U: [7,8) S: [7,8)          Exits: 7                LoopDispositions: { %for.outer: Variant, %for.cond1: Computable, %for.cond2: Invariant, %for.cond3: Invariant }
  %phi2 = phi i16 [ %step.2, %for.body2 ], [ %phi1.lcssa, %for.cond2.preheader1 ]
  -->  {{(2 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2> U: full-set S: full-set  -->  {7,+,-1}<nw><%for.cond2> U: [7,8) S: [7,8)            Exits: 7                LoopDispositions: { %for.cond2: Computable, %for.outer: Variant }
  %step.2 = add i16 %phi2, -1
  -->  {{(1 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2> U: full-set S: full-set  -->  {6,+,-1}<nw><%for.cond2> U: [6,7) S: [6,7)            Exits: 6                LoopDispositions: { %for.cond2: Computable, %for.outer: Variant }
  %step.2.lcssa = phi i16 [ %step.2, %for.cond2 ]
  -->  {{(1 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2> U: full-set S: full-set  -->  6 U: [6,7) S: [6,7)           Exits: 6                LoopDispositions: { %for.outer: Variant, %for.cond1: Variant, %for.cond2: Computable, %for.cond3: Invariant }
  %phi3 = phi i16 [ %step.3, %for.body3 ], [ %step.2.lcssa, %for.cond3.preheader ]
  -->  {{{(1 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2>,+,1}<nw><%for.cond3> U: full-set S: full-set  -->  {6,+,1}<nw><%for.cond3> U: [6,20) S: [6,20)             Exits: 19               LoopDispositions: { %for.cond3: Computable, %for.outer: Variant }
  %step.3 = add i16 %phi3, 1
  -->  {{{(2 + %.pre),+,1}<%for.cond1>,+,-1}<nw><%for.cond2>,+,1}<nw><%for.cond3> U: full-set S: full-set  -->  {7,+,1}<nw><%for.cond3> U: [7,21) S: [7,21)             Exits: 20               LoopDispositions: { %for.cond3: Computable, %for.outer: Variant }
Determining loop execution counts for: @foo
Loop %for.cond1: backedge-taken count is (5 + (-1 * %.pre))
Loop %for.cond1: constant max backedge-taken count is i16 -1
Loop %for.cond1: symbolic max backedge-taken count is (5 + (-1 * %.pre))
Loop %for.cond1: Trip multiple is 1
Loop %for.cond2: backedge-taken count is i16 0
Loop %for.cond2: constant max backedge-taken count is i16 0
Loop %for.cond2: symbolic max backedge-taken count is i16 0
Loop %for.cond2: Trip multiple is 1
Loop %for.cond3: backedge-taken count is i16 13
Loop %for.cond3: constant max backedge-taken count is i16 13
Loop %for.cond3: symbolic max backedge-taken count is i16 13
Loop %for.cond3: Trip multiple is 14
Loop %for.outer: <multiple exits> Unpredictable backedge-taken count.
  exit count for for.outer: ***COULDNOTCOMPUTE***
  exit count for for.cond2.preheader: i1 false
Loop %for.outer: constant max backedge-taken count is i1 false
Loop %for.outer: symbolic max backedge-taken count is i1 false
  symbolic max exit count for for.outer: ***COULDNOTCOMPUTE***
  symbolic max exit count for for.cond2.preheader: i1 false

So we can for example see that Exsits has changed for %phi3 and %step.3, and the max backedge-taken counts for %for.cond3 goes from 14 to 13.

Then the verification complains:

Trip Count for Loop at depth 2 containing: %for.cond3<header><exiting>,%for.body3<latch>
 Changed!
Old: 13
New: 14
Delta: -1

Some things that I've noticed: 1) no-op-loop will result in running LoopSimplify, which will add a pre-header for the for.cond1 loop. This should not really change anything from the perspective of SCEV, but I figure we do forget some SCEV values/dispositions in the process.

2) If modifying LoopSimplify like this

@@ -881,6 +929,17 @@ PreservedAnalyses LoopSimplifyPass::run(Function &F,
     MSSAU = std::make_unique<MemorySSAUpdater>(MSSA);
   }

+  SE->print(dbgs());
+  SE->forgetAllLoops();
+  SE->print(dbgs());
+  SE->verify();

   // Note that we don't preserve LCSSA in the new PM, if you need it run LCSSA
   // after simplifying the loops. MemorySSA is preserved if it exists.
  for (auto *L : *LI)
    Changed |=
        simplifyLoop(L, DT, LI, SE, AC, MSSAU.get(), /*PreserveLCSSA*/ false);

we can see that doing forgetAllLoops() and then re-calculating SCEV gives the same behavior. So this is not really related to LoopSimplify or the pass manager. But it seems a bit weird that the SCEV result changes after doing forgetAllLoops!?!?

3) When doing the second print<scalar-evolution> the function ScalarEvolution::howManyGreaterThans is involved. But it isn't involved the first time.

4) ( @fhahn : Bi-secting pointed at this starting to happen after commit 7019624ee124 "[SCEV] Strengthen nowrap flags via ranges for ARs on construction.". But maybe that is a co-incidence. )

efriedma-quic commented 6 months ago

I suspect the issue here is nowrap flags on one of the AddRecs: either we're setting incorrect flags on an AddRec, or the presence of flags is confusing the trip count computation.

It looks like forgetAllLoops is slightly broken; it doesn't properly throw away old AddRecs.

bjope commented 6 months ago

I suspect the issue here is nowrap flags on one of the AddRecs: either we're setting incorrect flags on an AddRec, or the presence of flags is confusing the trip count computation.

It looks like forgetAllLoops is slightly broken; it doesn't properly throw away old AddRecs.

ScalarEvolution::forgetAllLoops() is doing HasRecMap.clear(); so that map is cleared.

I did however find some other data structure that I'm not sure if/when they are being cleared: LoopPropertiesCache, SCEVUsers, LoopUsers, UnsignedWrapViaInductionTried, SignedWrapViaInductionTried

I think that we should clear UnsignedWrapViaInductionTried, SignedWrapViaInductionTried in forgetAllLoops(), given that we do not use forgetMemoizedResults when doing forgetAllLoops(). I think that those maps would be cleared if doing forgetLoop on one loop at a time.

Not sure about the rest. I can't for example see that LoopUsers is cleared anywhere. So only way to get things removed from that map is to invalidate ScalarEvolution.

v01dXYZ commented 5 months ago

Hi, I'm hitting on the same issue during IndVarSimplify with origin/main. It seems when a SCEVConstant is put in a ExitLimit.ExactNotTaken it won't be invalidated by SE->forgetValue(UseInst) in SimplifyIndVar. computeBackedgeTakenCount ignores SCEVConstant purposely.

I can get something that "works" by removing the if(!isa<ConstantSCEV>) and adding a registerUser(EL.ExactNotTaken, getSCEV(ExitCond)) in computeExitLimitFromICmp. Of course it's too hacky to be relevant.

v01dXYZ commented 5 months ago

BTW the following test case fails:

opt -S -passes='loop(indvars)' -verify-scev llvm/test/Transforms/IndVarSimplify/pr55689.ll