llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.25k stars 11.66k forks source link

InstCombine rule for ICmp+Add causes extra instructions #44244

Open annamthomas opened 4 years ago

annamthomas commented 4 years ago
Bugzilla Link 44899
Version trunk
OS All
CC @fhahn,@LebedevRI,@nikic,@rotateright,@TNorthover

Extended Description

Recently, there was a change (0f22e783a038b6983f0fe161eef6cf2add3a4156) landed in foldICmpAddConstant which was a revert of another change (checking for multiple uses of Add much earlier).

With this change we are having regressions on an internal benchmark.

I've narrowed it down to this IR snippet: define i1 @​test(i64 %0) {
%c1 = ashr i64 %0, 1 %c2 = add i64 %c1, 31083597720000 %c3 = icmp eq i64 %c2, 0 %c4 = icmp slt i64 %c2, 0 %c6 = select i1 %c4, i32 -1, i32 1 %result = select i1 %c3, i32 0, i32 %c6 %c5 = icmp sge i32 %result, 0 br i1 %c5, label %bci_28, label %unreached

bci_28: ; preds = %bci_0 ret i1 true

unreached: ret i1 false }

Running with instCombine produces: define i1 @​test(i64 %0) { %c1.mask = and i64 %0, -2 %c3 = icmp eq i64 %c1.mask, -62167195440000 %c4 = icmp sgt i64 %0, -62167195440001 %c5 = or i1 %c3, %c4 br i1 %c5, label %bci_28, label %unreached

bci_28: ; preds = %1 ret i1 true

unreached: ; preds = %1 ret i1 false }

We can clearly see from the original snippet, the whole sequence can be narrowed to: define i1 @​test(i64 %0) { %2 = icmp sgt i64 %0, -62167195440001 br i1 %2, label %bci_28, label %unreached

bci_28: ; preds = %1 ret i1 true

unreached: ; preds = %1 ret i1 false }

This is the case if I run instCombine with the change 0f22e783a038b6983f0fe161eef6cf2add3a4156 reverted.

I noticed that this was done to fix llvm/llvm-project#43445 . We will need a stronger check for Adds that have more than one use. This is a proper case of more instructions generated even if we break the dep chain between add and compare.

annamthomas commented 4 years ago

Ah yes, this pattern is more ugly than that the original one.

So, the attached testcase shows another regression: "aggregating" the compares is even more general than the previous testcase: %tmp = ashr i64 %arg, 1 %tmp1 = add nsw i64 %tmp, 31083597720000 %tmp2 = icmp eq i64 %tmp1, 0 %tmp3 = icmp sgt i64 %arg, -62167195440001 %tmp6 = or i1 %tmp2, %tmp3

LHS for first cmp is %tmp1 which requires going 2 instructions above to identify the relation w/ %arg.

I'm not convinced that these test cases can be fixed in instSimplify because the LHS for the compares being "aggregated" are different and can involve going multiple levels in the def chain for identifying the relation between the different LHS.

Note that the difference between instsimplify and instcombine is that instsimplify can't create new instructions, while instcombine can create new instructions as long as the total instruction count does not increase*

So it can be fixed in instsimplify: https://rise4fun.com/Alive/6kES ^ %tmp3 is the answer.

Just to be clear - I agree it can be fixed, it is just I don't think it should be fixed in InstSimplify... :)

The other question is how frequent the pattern is, and indeed, how complicated it is to match. In the new case, i agree putting it into instsimplify is too much.

But should still be fine for instcombine. We can keep having longer def chains that need traversal to identify the relation between the differing LHS. How do we identify the ranges for the LHS?

Coming back to the original problem - why not just restrict the type of uses allowed for the add?

LebedevRI commented 4 years ago

Ah yes, this pattern is more ugly than that the original one.

So, the attached testcase shows another regression: "aggregating" the compares is even more general than the previous testcase: %tmp = ashr i64 %arg, 1 %tmp1 = add nsw i64 %tmp, 31083597720000 %tmp2 = icmp eq i64 %tmp1, 0 %tmp3 = icmp sgt i64 %arg, -62167195440001 %tmp6 = or i1 %tmp2, %tmp3

LHS for first cmp is %tmp1 which requires going 2 instructions above to identify the relation w/ %arg.

I'm not convinced that these test cases can be fixed in instSimplify because the LHS for the compares being "aggregated" are different and can involve going multiple levels in the def chain for identifying the relation between the different LHS.

Note that the difference between instsimplify and instcombine is that instsimplify can't create new instructions, while instcombine can create new instructions as long as the total instruction count does not increase*

So it can be fixed in instsimplify: https://rise4fun.com/Alive/6kES ^ %tmp3 is the answer.

The other question is how frequent the pattern is, and indeed, how complicated it is to match. In the new case, i agree putting it into instsimplify is too much.

But should still be fine for instcombine.

annamthomas commented 4 years ago

So, the attached testcase shows another regression: "aggregating" the compares is even more general than the previous testcase: %tmp = ashr i64 %arg, 1 %tmp1 = add nsw i64 %tmp, 31083597720000 %tmp2 = icmp eq i64 %tmp1, 0 %tmp3 = icmp sgt i64 %arg, -62167195440001 %tmp6 = or i1 %tmp2, %tmp3

LHS for first cmp is %tmp1 which requires going 2 instructions above to identify the relation w/ %arg.

I'm not convinced that these test cases can be fixed in instSimplify because the LHS for the compares being "aggregated" are different and can involve going multiple levels in the def chain for identifying the relation between the different LHS.

LebedevRI commented 4 years ago

I took a closer look at the bug, I don't think the above suggestion will fix our use-case. It should fix the provided IR regression; if the snippet is over-reduced, then well..

Also, InstSimplify has checks for ICmp where one range is a superset of another. This applies only when the LHS is the same operand (see Analysis/InstructionSimplify.cpp: simplifyAndOrOfICmpsWithConstants). When the LHS operands are different, this should probably be handled by CVP. There is no technical reason why the original pattern in comment #​0 can't be handled by instsimplify. Whether it would fit best into instcombine or instsimplify i don't know currently.

It is just another case where it makes sense to have the single-use check early-on.

@​Roman - It looks like moving the hasOneUse check later in that rule only helps in breaking the dep chain between add and cmp (when there are more than one use of add). There is no reduction in number of instructions. Is that right? That does sound correct to me.

I'll add the snippet of code from our test case soon (the above testcase in the description I had is an oversimplified version, which is also not currently handled by opt -O3).

annamthomas commented 4 years ago

original testcase Run the attached orig.test.ll with opt -instcombine -S to see bad IR generated: define i32 @​wobble(i64 %arg) { bb: %tmp = ashr i64 %arg, 1 %tmp1 = add nsw i64 %tmp, 31083597720000 %tmp2 = icmp eq i64 %tmp1, 0 %tmp3 = icmp sgt i64 %arg, -62167195440001 %tmp6 = or i1 %tmp2, %tmp3 br i1 %tmp6, label %bb7, label %bb28 .... }

With the hasOneUse in InstCombiner::foldICmpAddConstant moved early on, the codegenerated has one cmp only: define i32 @​wobble(i64 %arg) { bb: %tmp = ashr i64 %arg, 1 %tmp1 = add nsw i64 %tmp, 31083597720000 %0 = icmp sgt i64 %tmp1, -1 br i1 %0, label %bb7, label %bb28 ... }

annamthomas commented 4 years ago

I took a closer look at the bug, I don't think the above suggestion will fix our use-case. Also, InstSimplify has checks for ICmp where one range is a superset of another. This applies only when the LHS is the same operand (see Analysis/InstructionSimplify.cpp: simplifyAndOrOfICmpsWithConstants). When the LHS operands are different, this should probably be handled by CVP.

It is just another case where it makes sense to have the single-use check early-on. @​Roman - It looks like moving the hasOneUse check later in that rule only helps in breaking the dep chain between add and cmp (when there are more than one use of add). There is no reduction in number of instructions. Is that right?

I'll add the snippet of code from our test case soon (the above testcase in the description I had is an oversimplified version, which is also not currently handled by opt -O3).

annamthomas commented 4 years ago

This seems like typical potential fallout of any instcombine change, some other pattern somewhere else is no longer folded as much :]

We'd need to generalize the missed fold, and simply add it:

Name: llvm/llvm-project#44244 %t1 = and i64 %x, -2 %t2 = icmp eq i64 %t1, -62167195440000 %t3 = icmp sgt i64 %x, -62167195440001 %r = or i1 %t2, %t3 => %r = icmp sgt i64 %x, -62167195440001

https://rise4fun.com/Alive/SXkb

It looks like ICMP EQ (and with positive constant), constant -> is folded to false by instCombine: %y = and i64 %x, 2 %z = icmp eq i64 %y, -62111 %p = icmp sgt i64 %x, 6 %r = or i1 %z, %p br i1 %r, label %bci_28, label %unreached Yes, that would make sense, since the constant we are comparing with has some bits set that we just masked off.

What I meant was why isn't the same thing triggered for "and" with negative values (as-in the original test case). This should be IC using knownBits analysis?...

I checked my theory for "and with negative value" and IC does fold the cmp to false, when possible: %y = and i64 %x, -2 %z = icmp eq i64 %y, -62111 <-- folded to false. %p = icmp sgt i64 %x, 6 %r = or i1 %z, %p

Although, it's not clear to me where in InstCombine the above transform happens. Confirmed that this is also done by InstSimplify through knownBits.

Taking the original test case (https://rise4fun.com/Alive/SXkb): %t1 = and i64 %x, -2 %t2 = icmp eq i64 %t1, -62167195440000 Why will %t2 simplify to false?

It won't/can't/shouldn't simplify that to false, because and i64 %x, -2 will result in %x with lowest bit being unset, and said lowest bit in -62167195440000 is not set, so that is not sufficient information to fold to false.

What is happening there is that it "aggregates" those two comparisons, and concludes that %t2 does not influence the result, so a single %t3 is sufficient. So this fold could actually go into instsimplify. oh yeah, the aggregation is how it decides. makes sense, thanks!

LebedevRI commented 4 years ago

This seems like typical potential fallout of any instcombine change, some other pattern somewhere else is no longer folded as much :]

We'd need to generalize the missed fold, and simply add it:

Name: llvm/llvm-project#44244 %t1 = and i64 %x, -2 %t2 = icmp eq i64 %t1, -62167195440000 %t3 = icmp sgt i64 %x, -62167195440001 %r = or i1 %t2, %t3 => %r = icmp sgt i64 %x, -62167195440001

https://rise4fun.com/Alive/SXkb

It looks like ICMP EQ (and with positive constant), constant -> is folded to false by instCombine: %y = and i64 %x, 2 %z = icmp eq i64 %y, -62111 %p = icmp sgt i64 %x, 6 %r = or i1 %z, %p br i1 %r, label %bci_28, label %unreached Yes, that would make sense, since the constant we are comparing with has some bits set that we just masked off.

What I meant was why isn't the same thing triggered for "and" with negative values (as-in the original test case). This should be IC using knownBits analysis?...

I checked my theory for "and with negative value" and IC does fold the cmp to false, when possible: %y = and i64 %x, -2 %z = icmp eq i64 %y, -62111 <-- folded to false. %p = icmp sgt i64 %x, 6 %r = or i1 %z, %p

Although, it's not clear to me where in InstCombine the above transform happens. Confirmed that this is also done by InstSimplify through knownBits.

Taking the original test case (https://rise4fun.com/Alive/SXkb): %t1 = and i64 %x, -2 %t2 = icmp eq i64 %t1, -62167195440000 Why will %t2 simplify to false?

It won't/can't/shouldn't simplify that to false, because and i64 %x, -2 will result in %x with lowest bit being unset, and said lowest bit in -62167195440000 is not set, so that is not sufficient information to fold to false.

What is happening there is that it "aggregates" those two comparisons, and concludes that %t2 does not influence the result, so a single %t3 is sufficient. So this fold could actually go into instsimplify.

annamthomas commented 4 years ago

This seems like typical potential fallout of any instcombine change, some other pattern somewhere else is no longer folded as much :]

We'd need to generalize the missed fold, and simply add it:

Name: llvm/llvm-project#44244 %t1 = and i64 %x, -2 %t2 = icmp eq i64 %t1, -62167195440000 %t3 = icmp sgt i64 %x, -62167195440001 %r = or i1 %t2, %t3 => %r = icmp sgt i64 %x, -62167195440001

https://rise4fun.com/Alive/SXkb

It looks like ICMP EQ (and with positive constant), constant -> is folded to false by instCombine: %y = and i64 %x, 2 %z = icmp eq i64 %y, -62111 %p = icmp sgt i64 %x, 6 %r = or i1 %z, %p br i1 %r, label %bci_28, label %unreached Yes, that would make sense, since the constant we are comparing with has some bits set that we just masked off.

What I meant was why isn't the same thing triggered for "and" with negative values (as-in the original test case). This should be IC using knownBits analysis?...

I checked my theory for "and with negative value" and IC does fold the cmp to false, when possible: %y = and i64 %x, -2 %z = icmp eq i64 %y, -62111 <-- folded to false. %p = icmp sgt i64 %x, 6 %r = or i1 %z, %p

Although, it's not clear to me where in InstCombine the above transform happens. Confirmed that this is also done by InstSimplify through knownBits.

Taking the original test case (https://rise4fun.com/Alive/SXkb): %t1 = and i64 %x, -2 %t2 = icmp eq i64 %t1, -62167195440000 Why will %t2 simplify to false?

LebedevRI commented 4 years ago

This seems like typical potential fallout of any instcombine change, some other pattern somewhere else is no longer folded as much :]

We'd need to generalize the missed fold, and simply add it:

Name: llvm/llvm-project#44244 %t1 = and i64 %x, -2 %t2 = icmp eq i64 %t1, -62167195440000 %t3 = icmp sgt i64 %x, -62167195440001 %r = or i1 %t2, %t3 => %r = icmp sgt i64 %x, -62167195440001

https://rise4fun.com/Alive/SXkb

It looks like ICMP EQ (and with positive constant), constant -> is folded to false by instCombine: %y = and i64 %x, 2 %z = icmp eq i64 %y, -62111 %p = icmp sgt i64 %x, 6 %r = or i1 %z, %p br i1 %r, label %bci_28, label %unreached Yes, that would make sense, since the constant we are comparing with has some bits set that we just masked off.

I'm assuming the change to be done (for And with negative value) is in InstCombiner::foldICmpAndConstConst?

I'd think so, yes.

annamthomas commented 4 years ago

This seems like typical potential fallout of any instcombine change, some other pattern somewhere else is no longer folded as much :]

We'd need to generalize the missed fold, and simply add it:

Name: llvm/llvm-project#44244 %t1 = and i64 %x, -2 %t2 = icmp eq i64 %t1, -62167195440000 %t3 = icmp sgt i64 %x, -62167195440001 %r = or i1 %t2, %t3 => %r = icmp sgt i64 %x, -62167195440001

https://rise4fun.com/Alive/SXkb

It looks like ICMP EQ (and with positive constant), constant -> is folded to false by instCombine: %y = and i64 %x, 2 %z = icmp eq i64 %y, -62111 %p = icmp sgt i64 %x, 6 %r = or i1 %z, %p br i1 %r, label %bci_28, label %unreached

I'm assuming the change to be done (for And with negative value) is in InstCombiner::foldICmpAndConstConst?

LebedevRI commented 4 years ago

This seems like typical potential fallout of any instcombine change, some other pattern somewhere else is no longer folded as much :]

We'd need to generalize the missed fold, and simply add it:

Name: llvm/llvm-project#44244 %t1 = and i64 %x, -2 %t2 = icmp eq i64 %t1, -62167195440000 %t3 = icmp sgt i64 %x, -62167195440001 %r = or i1 %t2, %t3 => %r = icmp sgt i64 %x, -62167195440001

https://rise4fun.com/Alive/SXkb