llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
26.71k stars 10.94k forks source link

[LSR] Bad code generation from LSR #36202

Open llvmbot opened 6 years ago

llvmbot commented 6 years ago
Bugzilla Link 36854
Version trunk
OS Windows NT
Reporter LLVM Bugzilla Contributor
CC @atrick,@qcolombet,@rotateright

Extended Description

For the C code below, the transformation done by LSR cause bad code generation on AArch64.

int isTrue();

void test(int M, int P, int P2, int P3, int P4) { int t = 0; for (int k = 1; k <= M; ++k) { P[k] = P2[k-1] + P[k-1]; P3[k] = P2[k-1] + P3[k-1]; if (isTrue()) { P4[k] = P[k] + P2[k]; } } }

Current code generation -----------------------------------------

.LBB0_2: ldr w8, [x19] ldur w9, [x20, #-4] add w8, w9, w8 str w8, [x20] ldr w8, [x19] ldur w9, [x21, #-4] add w8, w9, w8 str w8, [x21] bl isTrue cbz w0, .LBB0_4 // %bb.3: // %if.then ldr w8, [x20] ldr w9, [x19, #​4] add w8, w9, w8 str w8, [x23] .LBB0_4: // %for.inc // in Loop: Header=BB0_2 Depth=1 subs x22, x22, #​1 // =1 add x20, x20, #​4 // =4 add x21, x21, #​4 // =4 add x23, x23, #​4 // =4 add x19, x19, #​4 // =4 b.ne .LBB0_2

Code when skipping LSR --------------------------------------------------

.LBB0_2: // %for.body // =>This Inner Loop Header: Depth=1 lsl x25, x24, #​2 sub x8, x25, #​4 // =4 ldr w9, [x21, x8] ldr w10, [x22, x8] add w9, w10, w9 str w9, [x22, x25] ldr w9, [x21, x8] ldr w8, [x20, x8] add w8, w8, w9 str w8, [x20, x25] bl isTrue cbz w0, .LBB0_4 // %bb.3: // %if.then // in Loop: Header=BB0_2 Depth=1 ldr w8, [x22, x25] ldr w9, [x21, x25] add w8, w9, w8 str w8, [x19, x25] .LBB0_4: // %for.inc // in Loop: Header=BB0_2 Depth=1 add x24, x24, #​1 // =1 cmp x24, x23 b.ne .LBB0_2

llvmbot commented 6 years ago

All the pointers are accessed through either [k-1] or [k], so I tried to manually pick the formulae respecting this access pattern like :

reg(%P) + 1reg({0,+,4}<%for.body>) reg(%P) + 1reg({4,+,4}<%for.body>) reg(%P2) + 1reg({0,+,4}<%for.body>) reg(%P2) + 1reg({4,+,4}<%for.body>) reg(%P3) + 1reg({0,+,4}<%for.body>) reg(%P3) + 1reg({4,+,4}<%for.body>) reg(%P4) + 1reg({4,+,4}<%for.body>) reg(%P5) + 1reg({4,+,4}<%for.body>) reg(%P6) + 1reg({4,+,4}<%for.body>) reg(%P7) + 1reg({4,+,4}<%for.body>)

then we can remove the extra ADDs : .LBB0_2: // %for.body ldr w8, [x24, x27] ldr w9, [x25, x27] add w8, w9, w8 str w8, [x25, x19] ldr w8, [x24, x27] ldr w9, [x23, x27] add w8, w9, w8 str w8, [x23, x19] bl isTrue cbz w0, .LBB0_4 // %bb.3: // %if.then ldr w8, [x25, x19] ldr w9, [x24, x19] add w8, w9, w8 str w8, [x22, x19] str w8, [x21, x19] ldr w8, [x22, x19] ldr x9, [sp, #​8] // 8-byte Folded Reload str w8, [x20, x19] ldr w8, [x22, x19] str w8, [x9, x19] .LBB0_4: // %for.inc // in Loop: Header=BB0_2 Depth=1 add x26, x26, #​1 // =1 add x27, x27, #​4 // =4 cmp x28, x26 add x19, x19, #​4 // =4 b.ne .LBB0_2

Note that in order to keep all the formulae I want to pick manually, I skipped narrowing search space by skipping NarrowSearchSpaceUsingHeuristics().

llvmbot commented 6 years ago

We can see even worse generation with small change from the first example :

int isTrue();

void test(int M, int P, int P2, int P3, int P4,int P5, int P6, int *P7) { int t = 0; for (int k = 1; k <= M; ++k) { P[k] = P2[k-1] + P[k-1]; P3[k] = P2[k-1] + P3[k-1]; if (isTrue()) { P4[k] = P[k] + P2[k]; P5[k] = P4[k]; P6[k] = P4[k]; P7[k] = P4[k]; } } }

Current code generation -------------------------------

.LBB0_2: // %for.body ldr w8, [x19] ldur w9, [x20, #-4] add w8, w9, w8 str w8, [x20] ldr w8, [x19] ldur w9, [x21, #-4] add w8, w9, w8 str w8, [x21] bl isTrue cbz w0, .LBB0_4 // %bb.3: // %if.then ldr w8, [x20] ldr w9, [x19, #​4] add w8, w9, w8 str w8, [x26] str w8, [x24] ldr w8, [x26] str w8, [x23] ldr w8, [x26] str w8, [x22] .LBB0_4: // %for.inc // in Loop: Header=BB0_2 Depth=1 subs x25, x25, #​1 // =1 add x20, x20, #​4 // =4 add x21, x21, #​4 // =4 add x22, x22, #​4 // =4 add x23, x23, #​4 // =4 add x24, x24, #​4 // =4 add x26, x26, #​4 // =4 add x19, x19, #​4 // =4 b.ne .LBB0_2