Open aykevl opened 1 month ago
oh wow, this is the first time I've ever seen such a fully-minimized case.
Rust has been encountering this for some time now but I just thought it was due to trying to compile the entirety of libcore with debuginfo and no optimizations.
@benshi001 FYI. Not sure why you didn't get tagged by a bot.
Also CC @Patryk27
Thanks for info, I'll take a look.
Simplified:
define void @foo() {
bb0:
%1 = call ptr @bar()
%2 = getelementptr ptr, ptr %1, i32 1
store ptr %2, ptr %1
ret void
}
declare ptr @bar()
It seems that regalloc has some troubles doing stack spills here (?) - that's how the code looks like just before fast regalloc kicks in:
Allocating bb.0.bb0:
ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
RCALLk @bar, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r10r9 $r11r10 $r12r11 $r13r12 $r14r13 $r15r14 $r16r15 $r17r16 $r29r28>, implicit $sp, implicit $r1, implicit-def $sp, implicit-def $r25r24
ADJCALLSTACKUP 0, 0, implicit-def dead $r31r30, implicit $sp
%0:ptrdispregs = COPY $r25r24
%1:ptrdispregs = COPY %0:ptrdispregs
%1:ptrdispregs = ADIWRdK %1:ptrdispregs(tied-def 0), 2, implicit-def dead $sreg
STWPtrRr %0:ptrdispregs, killed %1:ptrdispregs :: (store (s16) into %ir.0, align 1)
RET implicit $r1
... and the issue seems to be that regalloc needs to allocate two ptrdispregs regs, but it's got only one available:
>> STWPtrRr %0:ptrdispregs, killed %1:ptrdispregs :: (store (s16) into %ir.0, align 1)
Regs:
Search register for %0 in class PTRDISPREGS with hint $noreg
AllocationOrder(DREGS) = [ ... ]
AllocationOrder(PTRDISPREGS) = [ $r31r30 ] (sub-class)
Register: $r31r30 Cost: 0 BestCost: 4294967295
Assigning %0 to $r31r30
Search register for %1 in class PTRDISPREGS with hint $noreg
Register: $r31r30 already used in instr.
error: ran out of registers during register allocation
What's curious is that llc
both produces the error and then outputs what looks like the correct assembly.
It seems that TwoAddressInstructionPass is somehow involved here - before the pass, the code requires just one register of class ptrdispregs:
%1:iwregs = ADIWRdK %0:ptrdispregs(tied-def 0), 2, implicit-def dead $sreg
STWPtrRr %0:ptrdispregs, killed %1:iwregs :: (store (s16) into %ir.0, align 1)
... but that pass changes class of %1
from iwregs
into ptrdispregs
:
********** REWRITING TWO-ADDR INSTRS **********
********** Function: foo
%1:iwregs = ADIWRdK %0:ptrdispregs(tied-def 0), 2, implicit-def dead $sreg
prepend: %1:iwregs = COPY %0:ptrdispregs
rewrite to: %1:ptrdispregs = ADIWRdK %1:ptrdispregs(tied-def 0), 2, implicit-def dead $sreg
... yielding:
%1:ptrdispregs = COPY %0:ptrdispregs
%1:ptrdispregs = ADIWRdK %1:ptrdispregs(tied-def 0), 2, implicit-def dead $sreg
STWPtrRr %0:ptrdispregs, killed %1:ptrdispregs :: (store (s16) into %ir.0, align 1)
... that can't get regalloced, because there's just one register of the ptrdispregs
class available at that particular place.
Edit: nope, this causes the crash as well, without TwoAddressInstructionPass changing anything:
define void @foo() {
bb0:
%1 = call ptr @bar()
%2 = call ptr @bar()
store ptr %2, ptr %1
store ptr %1, ptr %2
ret void
}
declare ptr @bar()
Alright, summing up - minimal reproducer:
define void @foo() {
bb0:
%1 = call ptr @bar()
%2 = call ptr @bar()
store ptr %2, ptr %1
store ptr %1, ptr %2
ret void
}
declare ptr @bar()
Because store ptr %x
requires for %x
to be of class ptrdispregs
, this causes LLVM to generate:
RCALLk @bar, ... implicit-def $r25r24
%0:ptrdispregs = COPY $r25r24
RCALLk @bar, ..., implicit-def $r25r24
%1:ptrdispregs = COPY $r25r24
STWPtrRr %0:ptrdispregs, %1:ptrdispregs
STWPtrRr %1:ptrdispregs, %0:ptrdispregs
... which cannot be solved by regalloc itself, since there's just one register of class ptrdispregs
available at a time.
Usually this isn't a problem, because most of the time the second argument is of another class, e.g.:
define void @foo() {
bb0:
%1 = call ptr @bar()
%2 = call ptr @bar()
store ptr %2, ptr %1
ret void
}
... yields:
RCALLk @bar, ..., implicit-def $r25r24
%0:ptrdispregs = COPY $r25r24
RCALLk @bar, ..., implicit-def $r25r24
%1:dregs = COPY $r25r24
STWPtrRr %0:ptrdispregs, %1:dregs
I'm not sure on the correct approach here - maybe we should have another pass that goes over stores that have two ptrdispregs, transforming:
STWPtrRr %0:ptrdispregs, %1:ptrdispregs
... into:
%2:dregs = COPY %1:ptrdispregs
STWPtrRr %0:ptrdispregs, %2:dregs
... but I'm not 100% sold - maybe LLVM has already something like this built-in?
Edit: amazingly enough, LLVM does seem to have something related, but working other way around -- commenting our this loop:
... leaves extra COPY-ies around and makes regalloc succeed; hmm.
Thank you for investigating!
I have the following IR, reduced from a much bigger test case (which would presumably also crash in a real program):
Compiling it, I get the following error:
This happens with both LLVM 18.1.6 and with a recent LLVM build (4377656f2419a8eb18c01e86929b689dcf22b5d6).