llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
26.78k stars 10.97k forks source link

Regression in removing pointer range loops initializing dead variables. #29357

Open llvmbot opened 7 years ago

llvmbot commented 7 years ago
Bugzilla Link 28987
Version 3.9
OS Linux
Reporter LLVM Bugzilla Contributor
CC @majnemer,@joker-eph,@sanjoy

Extended Description

The following function is optimized out in clang 3.8, but not 3.9:

void test() { double a[4]; for(double p = a, end = a + 4; p < end; p++) *p = 0.0; }

The optimized IR on 3.9 is more or less:

define void @​test() { entry-block: %a = alloca [4 x double], align 8 %a6 = ptrtoint [4 x double] %a to i64 %0 = bitcast [4 x double] %a to i8 call void @​llvm.lifetime.start(i64 32, i8 %0) %scevgep = getelementptr [4 x double], [4 x double] %a, i64 1, i64 0 %scevgep1 = ptrtoint double %scevgep to i64 %scevgep2 = getelementptr [4 x double], [4 x double] %a, i64 0, i64 1 %scevgep23 = ptrtoint double %scevgep2 to i64 %1 = icmp ugt i64 %scevgep1, %scevgep23 %umax = select i1 %1, i64 %scevgep1, i64 %scevgep23 %umax45 = inttoptr i64 %umax to i8 %2 = xor i64 %a6, -1 %uglygep = getelementptr i8, i8 %umax45, i64 %2 %uglygep7 = ptrtoint i8 %uglygep to i64 %3 = add i64 %uglygep7, 8 %4 = and i64 %3, -8 call void @​llvm.memset.p0i8.i64(i8 %0, i8 0, i64 %4, i32 8, i1 false) call void @​llvm.lifetime.end(i64 32, i8* %0) ret void }

If the llvm.lifetime.{start,end} calls are removed, running opt again will reduce this to just "ret void".

Any other uses of %a will also prevent the memset and everything before it to be removed, even if the entire array is statically overwritten.

llvmbot commented 7 years ago

Third time's the charm: the difference isn't in phis, it's the comparison operator, the following example exhibits the "loops remain after -O3" problem (by using < instead of !=):

include

using namespace std;

typedef long long i64;

i64 add_to_mat2d_elt(i64 x) { // Ugly initialization with all 0. array<array<i64, 2>, 2> ab; { array<i64, 2> tmp;

for(auto *p = &tmp[0]; p < &tmp[2]; p++)
  *p = 0;

for(auto *p = &ab[0]; p < &ab[2]; p++)
  *p = tmp;

}

return ab[0][0] + x; }

llvmbot commented 7 years ago

Unoptimized IR for add_to_mat2d_elt. My bad, I forgot that the unoptimized IR my case already had phis for the loops, so the comparison is not exactly fair.

I am attaching the unoptimized IR for the testcase that should optimize the same as the following C++ program:

include

using namespace std;

typedef long long i64;

i64 add_to_mat2d_elt(i64 x) { // Ugly initialization with all 0. array<array<i64, 2>, 2> ab; { array<i64, 2> tmp;

for(auto *p = &tmp[0]; p != &tmp[2]; p++)
  *p = 0;

for(auto *p = &ab[0]; p != &ab[2]; p++)
  *p = tmp;

}

return ab[0][0] + x; }

joker-eph commented 7 years ago

Here is what I see with a current clang at O1/O2/O3 for this example:

%"struct.std::1::array" = type { [2 x %"struct.std::1::array.0"] } %"struct.std::__1::array.0" = type { [2 x i32] }

; Function Attrs: ssp uwtable define void @​_Z3foov() local_unnamed_addr #​0 { entry: %ab = alloca %"struct.std::1::array", align 4 %0 = bitcast %"struct.std::1::array" %ab to i8 call void @​llvm.lifetime.start(i64 16, i8 %0) #​3 call void @​llvm.memset.p0i8.i64(i8 %0, i8 0, i64 16, i32 4, i1 false) call void @​_Z3barRNSt3__15arrayINS0_IiLm2EEELm2EEE(%"struct.std::__1::array" nonnull dereferenceable(16) %ab) call void @​llvm.lifetime.end(i64 16, i8 %0) #​3 ret void }

llvmbot commented 7 years ago

I found an even worse case, with unoptimized branches on constants (i1 true) and a whole barely touched loop. I believe (but I couldn't verify yet) that this C++ snippet will reproduce:

include

using namespace std;

void bar(array<array<int, 2>, 2>&);

void foo() { array<array<int, 2>, 2> ab; for(auto p = &ab[0]; p != &ab[2]; p++) for(auto q = &(p)[0]; q != &(p)[2]; q++) *q = 0; bar(ab); }

sanjoy commented 7 years ago

My first inclination is to say that the SCEV fix itself is correct, but some parts of LLVM (including SCEV) needs to get smarter to cope with r273079. I'll take a closer look tonight.

991901f3-cc14-4404-b340-165691b62a58 commented 7 years ago

bisection points to: [SCEV] Fix incorrect trip count computation

The way we elide max expressions when computing trip counts is incorrect
-- it breaks cases like this:

```
static int wrapping_add(int a, int b) {
  return (int)((unsigned)a + (unsigned)b);
}

void test() {
  volatile int end_buf = 2147483548; // INT_MIN - 100
  int end = end_buf;

  unsigned counter = 0;
  for (int start = wrapping_add(end,  200); start < end; start++)
    counter++;

  print(counter);
}
```

Sanjoy, can you please take a look?