llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.92k stars 11.52k forks source link

Negative effects of using shifts for arithmetic optimization #64736

Open ZY546 opened 1 year ago

ZY546 commented 1 year ago

Hello, I noticed that the optimization of some expressions is inhibited by using shift optimization.

Let's look at a simple example: Example 1: https://godbolt.org/z/Wfno6jKEv

int var_13; int var_14;

void test(int var_4, int var_5) {
    var_13 = var_4 + var_4 + var_5;
    var_14 = var_4 + var_5;
}

Clang16 -O3:

define dso_local void @test(int, int)(i32 noundef %var_4, i32 noundef %var_5) local_unnamed_addr {
entry:
  %add = shl nsw i32 %var_4, 1
  %add1 = add nsw i32 %add, %var_5
  store i32 %add1, ptr @var_13, align 4
  %add2 = add nsw i32 %var_5, %var_4
  store i32 %add2, ptr @var_14, align 4
  ret void
}

In this simple example, if var_4 + var_5 were optimized as an available expression, all computations would be optimized as two additions instead of the two additions and one shift operations shown in the result.

Here is a similar example, but different: in this case, var4+var_4 is not supposed to be evaluated first, but is affected by ReassociatePass. Example 2: https://godbolt.org/z/8YTevzqbT

Let's look at an example where the negative impact is most obvious: Example 3: https://godbolt.org/z/crxTxPd6r

extern int var_19;
extern int var_20;
extern int var_23;
void test(int var_1, int var_4, int var_6, int var_10, int var_12) {
    var_19 = var_1 + var_4 + var_10 + var_12;
    var_20 = var_10 + var_12;
    var_23 = var_4 + var_10 + var_6 + var_4;
}

Clang16 -O3:

define dso_local void @test(int, int, int, int, int)(i32 noundef %var_1, i32 noundef %var_4, i32 noundef %var_6, i32 noundef %var_10, i32 noundef %var_12) local_unnamed_addr {
entry:
  %add = add i32 %var_10, %var_4
  %add1 = add i32 %add, %var_1
  %add2 = add i32 %add1, %var_12
  store i32 %add2, ptr @var_19, align 4
  %add3 = add nsw i32 %var_12, %var_10
  store i32 %add3, ptr @var_20, align 4
  %factor = shl i32 %var_4, 1
  %add5 = add i32 %factor, %var_6
  %add6 = add i32 %add5, %var_10
  store i32 %add6, ptr @var_23, align 4
  ret void
}

In this example, both var_4 + var_10 and var_10 + var_12 are available expressions. But because they have a common part var_10, they are not optimized as available expressions at the same time. From the optimized IR, it seems that the compiler wants to treat var_4 + var_10 as a available expression and forgo the optimization of var_10 + var_12. However, var_4 + var_10 is not optimized because of the shift operation. This leads to the end that both var_4 + var_10 and var_10 + var_12 are not optimized, which does not look like a good result.

XChy commented 11 months ago

Missed compilation of InstCombine/Reassociate pass. It seems to be a problem of canonicalization. Reduced Alive2