Open jmolloy opened 9 years ago
I have to double check but my hopes are low that we can do something reasonable. Basically we are hitting a fundamental limitation of the algorithm. It looks at the current state for deciding and cannot think future (which is pretty hard).
I believe we could tweak the eviction policy to “fake” the splitting mechanism and check how expensive that would be to spill the resulting live-range (which should be shorter, thus likely less expensive). However, assuming this is doable (faking the splitting may be fun), like all heuristics, if we get it wrong, e.g., because the split live-ranges would have different priority and we completely missed what the future will in fact look like, we can have some dramatic regressions.
Hi Quentin,
Indeed, the live range of vreg23/vreg1 should be split, I think. How do you think we should approach this? the spill in the inner loop is avoidable (with perfect allocation) and is obviously bad, so is there anything we can do about it?
I'm not massively well versed in the myriad tradeoffs in register allocation/range-splitting/spill-placement/coalescing.
James
Hi James,
I am not sure what you are suggesting here.
When we take the decision of evicting something for vreg45, we have to assume the worst case scenario, i.e., the evicted register will be spilled around all uses. In that respect, both vreg23 and verg1 are more expensive I guess (I haven’t actually checked). The fact that evicting one of those is cheaper may come from the fact that unlike vreg7, we will be able to split them and have a cheaper live-range to spill as a result. Anyhow, we do not know that when taking the decision to evict and I do not see how we can determine that at that time.
The bottom line is I do not see how the loop hierarchy would help here.
Thanks.
Extended Description
In the attached testcase there is one loop with two child loops. The two child loops are siblings to each other.
A reloaded is inserted in the second sibling loop, instead of a different register being spilled outside the second loop.
The register being spilled is %vreg7:
selectOrSplit rGPR:%vreg45 [576r,656B:0)[656B,976r:1)[976r,1104B:2) 0@576r 1@656B-phi 2@976r w=2.185477e+05 %R10 is available at cost 1 Only trying the first 10 regs. should evict: %vreg23 [160r,1184B:0) 0@160r w= 4.682460e+04 should evict: %vreg1 [80r,1184B:0) 0@80r w= 2.164171e+04 should evict: %vreg8 [560r,1104B:0) 0@560r w= 2.003648e+04 should evict: %vreg7 [496r,1104B:0) 0@496r w= 1.771947e+04 evicting %R6 interference: Cascade 4 unassigning %vreg7 from %R6: R6 assigning %vreg45 to %R6: R6 [576r,656B:0)[656B,976r:1)[976r,1104B:2) 0@576r 1@656B-phi 2@976r queuing new interval: %vreg7 [496r,1104B:0) 0@496r
%vreg7 has the lowest spill cost because it is within an if/else inside the second loop. However, spilling %vreg23 or %vreg1 would not be as costly as %vreg7 because both of those registers are live-through the second loop (they are only used in the first loop).
It appears the spill placement algorithm is aware of loop nest depth (implied by block frequency) but unaware of loop hierarchy.
Compile with: llc -O3 test.ll and note:
.LBB0_5: @ %bb21 @ Parent Loop BB0_2 Depth=1 @ => This Inner Loop Header: Depth=2 sub.w r8, r7, #2 cmp r8, r4 bhi .LBB0_7 @ BB#6: @ %bb27 @ in Loop: Header=BB0_5 Depth=2 ldrb.w r5, [r9, r7] lsl.w r2, lr, r11 eors r2, r5 ldr r5, [sp, #8] @ 4-byte Reload NAUGHTY and.w lr, r2, r5