llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.4k stars 11.74k forks source link

[InstCombine] Missed optimization : fold `X > C2 ? X + C1 : C2 + C1` to `max(X, C2) + C1` #82414

Open XChy opened 7 months ago

XChy commented 7 months ago

Alive2 proof: https://alive2.llvm.org/ce/z/ERjNs4

Motivating example

define i32 @src(i32 %x) {
  %add = add nsw i32 %x, 16
  %cmp = icmp sgt i32 %x, 1008
  %s = select i1 %cmp, i32 %add, i32 1024
  ret i32 %s
}

can be folded to:

define i32 @tgt(i32 %x) {
  %smax = call i32 @llvm.smax.i32(i32 %x, i32 1008)
  %add = add nuw nsw i32 %smax, 16
  ret i32 %add
}

LLVM does well when C1 or C2 is not constant, but when both are constants, LLVM missed it. Though this example doesn't show better codegen, I think it's a better canonicalization.

Real-world motivation

This snippet of IR is derived from protobuf/generated_message_tctable_lite.cc after O3 pipeline (original IR is from llvm-opt-benchmark). Original IR is too big to attach here, email me to get it please.

Let me know if you can confirm that it's an optimization opportunity, thanks.

nikic commented 7 months ago

Does this produce better optimizations in the surrounding context? I'm not sure this transform is worthwhile.

XChy commented 7 months ago

Does this produce better optimizations in the surrounding context? I'm not sure this transform is worthwhile.

No, just as this example shows. I don't insist on it since it doesn't bring better codegen..

XChy commented 7 months ago

Detected a similar but more complex pattern just now: https://alive2.llvm.org/ce/z/pTzsqM This one produces better optimizations in the surrounding context, see also: https://godbolt.org/z/9d5n7o1er

pinskia commented 7 months ago

Does this produce better optimizations in the surrounding context? I'm not sure this transform is worthwhile.

On RISCV, it will allow to use the smax instruction even for the original simplified example instead of the more complex code.

dtcxzyw commented 7 months ago

Does this produce better optimizations in the surrounding context? I'm not sure this transform is worthwhile.

On RISCV, it will allow to use the smax instruction even for the original simplified example instead of the more complex code.

It depends on the materialization cost for the immediates.

XChy commented 6 months ago

Detect better optimization in the surrounding context, mainly when the select is used multiple times. An example: https://alive2.llvm.org/ce/z/7Hb43S , which eliminates an add instruction.

nikic commented 6 months ago

I believe https://github.com/rust-lang/rust/issues/123845 is another motivating case for this. That one is smin with multi-use condition and one-use add.