llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.4k stars 11.74k forks source link

[InstCombine] Missed optimization: fold `((select C1, C2) | A) ^ C3` to `(select C1^C3, C2^C3) | A)` #79266

Open XChy opened 8 months ago

XChy commented 8 months ago

Alive2 proof: https://alive2.llvm.org/ce/z/oEmX9u

Motivating example

define i32 @src(i32 %a, i1 %c) {
entry:
  %s = select i1 %c, i32 0, i32 4
  %shl = shl i32 %a, 4
  %or = or disjoint i32 %s, %shl
  %xor = xor i32 %or, 4
  ret i32 %xor
}

can be folded to

define i32 @tgt(i32 %a, i1 %c) {
entry:
  %s = select i1 %c, i32 4, i32 0
  %shl = shl i32 %a, 4
  %or = or disjoint i32 %s, %shl
  ret i32 %or
}

Since or disjoint here can be regarded as xor, ((select C1, C2) | A) ^ C3 can be (select C1, C2) ^ C3 ^ A) and then we get (select C1^C3, C2^C3) | A).

Real-world motivation

This snippet of IR is derived from zstd/lib/compress/zstd_compress_literals.c@ZSTD_compressLiterals (after O3 pipeline). The example above is a reduced version. If you're interested in the original suboptimal IR and optimal IR, contact me to get them, please.

Let me know if you can confirm that it's an optimization opportunity, thanks.

AmrDeveloper commented 1 month ago

I will work on it

dtcxzyw commented 1 month ago

@AmrDeveloper Please read https://llvm.org/docs/InstCombineContributorGuide.html before submitting a patch :)