[InstCombine] Missed optimization: fail to fold `(A >> C1) Pred C2` if `shr` is used multiple times

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

http://llvm.org

Other

29.25k stars 12.07k forks source link

[InstCombine] Missed optimization: fail to fold `(A >> C1) Pred C2` if `shr` is used multiple times #83430

Open XChy opened 8 months ago

XChy commented 8 months ago

Alive2 proof: https://alive2.llvm.org/ce/z/4JKW9d

Motivating example

define i1 @src(i32 %a){
entry:
  %div = ashr exact i32 %a, 3
  call void @use(i32 %div)
  %cmp = icmp slt i32 %div, 15
  ret i1 %cmp
}

can be folded to:

define i1 @tgt(i32 %a) {
  %div = ashr exact i32 %a, 3
  call void @use(i32 %div)
  %cmp = icmp slt i32 %a, 120
  ret i1 %cmp
}

This enables more optimizations for the other user of shr as I observe in benchmark. Unsigned case is in alive2 proof.

Real-world motivation

Signed case is derived from z3/src/sat/smt/pb_solver.cpp (after O3 pipeline). Unsigned case is derived from z3/src/tactic/core/collect_occs.cpp (after O3 pipeline). The example above is a reduced version. If you're interested in the original suboptimal IR and optimal IR, email me please.

Let me know if you can confirm that it's an optimization opportunity, thanks.

llvmbot commented 8 months ago

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

In the comments of the issue, request for it to be assigned to you.
Fix the issue locally.
Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
Create a Git commit.
Run git clang-format HEAD~1 to format your changes.
Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.

llvmbot commented 8 months ago

@llvm/issue-subscribers-good-first-issue

Author: XChy (XChy)

Alive2 proof: https://alive2.llvm.org/ce/z/4JKW9d ### Motivating example ```llvm define i1 @src(i32 %a){ entry: %div = ashr exact i32 %a, 3 call void @use(i32 %div) %cmp = icmp slt i32 %div, 15 ret i1 %cmp } ``` can be folded to: ```llvm define i1 @tgt(i32 %a) { %div = ashr exact i32 %a, 3 call void @use(i32 %div) %cmp = icmp slt i32 %a, 120 ret i1 %cmp } ``` This enable more optimizations for the other user of `shr` as I observe in benchmark. Unsigned case is in alive2 proof. ### Real-world motivation Signed case is derived from [z3/src/sat/smt/pb_solver.cpp](https://github.com/Z3Prover/z3/blob/master/src/sat/smt/pb_solver.cpp) (after O3 pipeline). Unsigned case is derived from [z3/src/tactic/core/collect_occs.cpp](https://github.com/Z3Prover/z3/blob/master/src/tactic/core/collect_occs.cpp) (after O3 pipeline). The example above is a reduced version. If you're interested in the original suboptimal IR and optimal IR, email me please. **Let me know if you can confirm that it's an optimization opportunity, thanks.**

SahilPatidar commented 8 months ago

@dtcxzyw, I would like to take this one.

SahilPatidar commented 8 months ago

@dtcxzyw I have been reviewing the code and found this line: if (IsAShr && Shr->hasOneUse()) in the function foldICmpShrConstant. I believe this is where the issue occurs when a shift operation has only one use. However, I want clarification on whether simply removing this condition will be sufficient, or if other cases need to be considered.

dtcxzyw commented 8 months ago

@dtcxzyw I have been reviewing the code and found this line: if (IsAShr && Shr->hasOneUse()) in the function foldICmpShrConstant. I believe this is where the issue occurs when a shift operation has only one use. However, I want clarification on whether simply removing this condition will be sufficient, or if other cases need to be considered.

Please refer to the prior commit adding the one-use check: https://github.com/llvm/llvm-project/commit/a66051c68a43af39f9fd962f71d58ae0efcf860d. You should be careful not to break min/max idioms.

Feel free to file a PR. I cannot offer any suggestion until I see some performance data from my pre-commit CI.

XChy commented 8 months ago

I think it's ok to file a PR to simply remove that condition and see how it break other patterns in @dtcxzyw's CI.