llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.91k stars 11.52k forks source link

[InstCombine] fold multiply and equality compare with constant #51744

Open rotateright opened 2 years ago

rotateright commented 2 years ago
Bugzilla Link 52402
Version trunk
OS All
CC @anton-afanasyev,@RKSimon

Extended Description

Forking this off from bug 52289 - I don't know what the generalization is, but we're missing some kind of overflowing/shifting/multiplying magic:

define i1 @src(i32 %x) {
  %m = mul i32 %x, 1355350016 ; 0x50c90000
  %r = icmp eq i32 %m, 65536  ; 0x00010000
  ret i1 %r
}

define i1 @tgt(i32 %x) {
  %t = trunc i32 %x to i16
  %r = icmp eq i16 %t, 51577 ; 0xc979
  ret i1 %r
}

https://alive2.llvm.org/ce/z/7Ft6jA

anton-afanasyev commented 2 years ago

The generalization can be formalized in terms of modular arithmetic here (https://en.wikipedia.org/wiki/Modular_arithmetic). Optimizer is capable already of solving ordinary lineary equation (3x=9 <=> x=3) to get rid of multiplication, but can't do the same for modular linear equation (7x=5 (mod 16) <=> x=3 (mod 16) (https://gcc.godbolt.org/z/3WjKcxGhf).

The equation ax=b (mod c) can be solved (with certain constrains on constants) for any modulo, but c = 2^n case is of particular interest. We can try to solve this for patterns like icmp(and(mul(...))), using Euclidean algorithm (https://en.wikipedia.org/wiki/Euclidean_algorithm), though I'm not sure if this special case is worth it.

rotateright commented 2 years ago

The multiply fold might unlock this case, but there's another problem here that might be easier to solve - we don't thread the binop (shl in this case) back through the select-of-constants to see if that reduces: https://alive2.llvm.org/ce/z/zw2px6

This should be a separate bug, so filed as bug 52406.

rotateright commented 2 years ago

Here is the sample of similar nature, showing how we miss full folding at -O3:

define i1 @​src(i32 %ta) { %t3 = trunc i32 %ta to i8 %t4 = and i8 %t3, 8 %t6.not = icmp eq i8 %t4, 0 %t7 = select i1 %t6.not, i8 7, i8 0 %t8 = shl i8 %t4, %t7 %t10 = sext i8 %t8 to i32 %t12 = mul i32 %t10, 1355350016 %t13 = icmp eq i32 %t12, 65536 ret i1 %t13 }

define i1 @​tgt(i32 %ta) { ret i1 false }

https://alive2.llvm.org/ce/z/pbFLS2 https://gcc.godbolt.org/z/5ooM4f4rr

The multiply fold might unlock this case, but there's another problem here that might be easier to solve - we don't thread the binop (shl in this case) back through the select-of-constants to see if that reduces: https://alive2.llvm.org/ce/z/zw2px6

anton-afanasyev commented 2 years ago

Here is the sample of similar nature, showing how we miss full folding at -O3:

define i1 @​src(i32 %ta) { %t3 = trunc i32 %ta to i8 %t4 = and i8 %t3, 8 %t6.not = icmp eq i8 %t4, 0 %t7 = select i1 %t6.not, i8 7, i8 0 %t8 = shl i8 %t4, %t7 %t10 = sext i8 %t8 to i32 %t12 = mul i32 %t10, 1355350016 %t13 = icmp eq i32 %t12, 65536 ret i1 %t13 }

define i1 @​tgt(i32 %ta) { ret i1 false }

https://alive2.llvm.org/ce/z/pbFLS2 https://gcc.godbolt.org/z/5ooM4f4rr

rotateright commented 2 years ago

assigned to @anton-afanasyev