llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.6k stars 11.82k forks source link

Sibling loops confuse spill placement #24117

Open jmolloy opened 9 years ago

jmolloy commented 9 years ago
Bugzilla Link 23743
Version trunk
OS All
Attachments Testcase
CC @Arnaud-de-Grandmaison-ARM,@jmolloy,@MatzeB,@qcolombet,@sanjoy

Extended Description

In the attached testcase there is one loop with two child loops. The two child loops are siblings to each other.

A reloaded is inserted in the second sibling loop, instead of a different register being spilled outside the second loop.

The register being spilled is %vreg7:

selectOrSplit rGPR:%vreg45 [576r,656B:0)[656B,976r:1)[976r,1104B:2) 0@576r 1@656B-phi 2@976r w=2.185477e+05 %R10 is available at cost 1 Only trying the first 10 regs. should evict: %vreg23 [160r,1184B:0) 0@160r w= 4.682460e+04 should evict: %vreg1 [80r,1184B:0) 0@80r w= 2.164171e+04 should evict: %vreg8 [560r,1104B:0) 0@560r w= 2.003648e+04 should evict: %vreg7 [496r,1104B:0) 0@496r w= 1.771947e+04 evicting %R6 interference: Cascade 4 unassigning %vreg7 from %R6: R6 assigning %vreg45 to %R6: R6 [576r,656B:0)[656B,976r:1)[976r,1104B:2) 0@576r 1@656B-phi 2@976r queuing new interval: %vreg7 [496r,1104B:0) 0@496r

%vreg7 has the lowest spill cost because it is within an if/else inside the second loop. However, spilling %vreg23 or %vreg1 would not be as costly as %vreg7 because both of those registers are live-through the second loop (they are only used in the first loop).

It appears the spill placement algorithm is aware of loop nest depth (implied by block frequency) but unaware of loop hierarchy.

Compile with: llc -O3 test.ll and note:

.LBB0_5: @ %bb21 @ Parent Loop BB0_2 Depth=1 @ => This Inner Loop Header: Depth=2 sub.w r8, r7, #​2 cmp r8, r4 bhi .LBB0_7 @ BB#6: @ %bb27 @ in Loop: Header=BB0_5 Depth=2 ldrb.w r5, [r9, r7] lsl.w r2, lr, r11 eors r2, r5 ldr r5, [sp, #​8] @ 4-byte Reload NAUGHTY and.w lr, r2, r5

qcolombet commented 9 years ago

I have to double check but my hopes are low that we can do something reasonable. Basically we are hitting a fundamental limitation of the algorithm. It looks at the current state for deciding and cannot think future (which is pretty hard).

I believe we could tweak the eviction policy to “fake” the splitting mechanism and check how expensive that would be to spill the resulting live-range (which should be shorter, thus likely less expensive). However, assuming this is doable (faking the splitting may be fun), like all heuristics, if we get it wrong, e.g., because the split live-ranges would have different priority and we completely missed what the future will in fact look like, we can have some dramatic regressions.

jmolloy commented 9 years ago

Hi Quentin,

Indeed, the live range of vreg23/vreg1 should be split, I think. How do you think we should approach this? the spill in the inner loop is avoidable (with perfect allocation) and is obviously bad, so is there anything we can do about it?

I'm not massively well versed in the myriad tradeoffs in register allocation/range-splitting/spill-placement/coalescing.

James

qcolombet commented 9 years ago

Hi James,

I am not sure what you are suggesting here.

When we take the decision of evicting something for vreg45, we have to assume the worst case scenario, i.e., the evicted register will be spilled around all uses. In that respect, both vreg23 and verg1 are more expensive I guess (I haven’t actually checked). The fact that evicting one of those is cheaper may come from the fact that unlike vreg7, we will be able to split them and have a cheaper live-range to spill as a result. Anyhow, we do not know that when taking the decision to evict and I do not see how we can determine that at that time.

The bottom line is I do not see how the loop hierarchy would help here.

Thanks.