llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.26k stars 11.67k forks source link

[AVR] Ran out of registers during register allocation #104032

Open aykevl opened 1 month ago

aykevl commented 1 month ago

I have the following IR, reduced from a much bigger test case (which would presumably also crash in a real program):

target datalayout = "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr"

declare ptr @bar() addrspace(1)

declare void @baz() addrspace(1)

define void @foo(ptr %arg) addrspace(1) {
entry:
  %val = call addrspace(1) ptr @bar()
  %gep1 = getelementptr { i8, i8, ptr }, ptr %val, i32 0, i32 1
  %gep2 = getelementptr { i8, i8, ptr }, ptr %val, i32 0, i32 2
  %gep3 = getelementptr { i8, i8, ptr }, ptr %arg, i32 0, i32 2
  call addrspace(1) void @baz()
  store i8 0, ptr %gep1, align 1
  store ptr %gep2, ptr %arg, align 1
  store ptr null, ptr %gep3, align 1
  ret void
}

Compiling it, I get the following error:

$ llc -filetype=obj -o reduced.o reduced.ll
error: ran out of registers during register allocation

This happens with both LLVM 18.1.6 and with a recent LLVM build (4377656f2419a8eb18c01e86929b689dcf22b5d6).

workingjubilee commented 1 month ago

oh wow, this is the first time I've ever seen such a fully-minimized case.

Rust has been encountering this for some time now but I just thought it was due to trying to compile the entirety of libcore with debuginfo and no optimizations.

aykevl commented 1 month ago

@benshi001 FYI. Not sure why you didn't get tagged by a bot.

Also CC @Patryk27

Patryk27 commented 1 month ago

Thanks for info, I'll take a look.

Patryk27 commented 1 month ago

Simplified:

define void @foo() {
bb0:
  %1 = call ptr @bar()
  %2 = getelementptr ptr, ptr %1, i32 1
  store ptr %2, ptr %1
  ret void
}

declare ptr @bar()

It seems that regalloc has some troubles doing stack spills here (?) - that's how the code looks like just before fast regalloc kicks in:

Allocating bb.0.bb0:
  ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  RCALLk @bar, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r10r9 $r11r10 $r12r11 $r13r12 $r14r13 $r15r14 $r16r15 $r17r16 $r29r28>, implicit $sp, implicit $r1, implicit-def $sp, implicit-def $r25r24
  ADJCALLSTACKUP 0, 0, implicit-def dead $r31r30, implicit $sp
  %0:ptrdispregs = COPY $r25r24
  %1:ptrdispregs = COPY %0:ptrdispregs
  %1:ptrdispregs = ADIWRdK %1:ptrdispregs(tied-def 0), 2, implicit-def dead $sreg
  STWPtrRr %0:ptrdispregs, killed %1:ptrdispregs :: (store (s16) into %ir.0, align 1)
  RET implicit $r1

... and the issue seems to be that regalloc needs to allocate two ptrdispregs regs, but it's got only one available:

>> STWPtrRr %0:ptrdispregs, killed %1:ptrdispregs :: (store (s16) into %ir.0, align 1)
Regs:
Search register for %0 in class PTRDISPREGS with hint $noreg
AllocationOrder(DREGS) = [ ...  ]
AllocationOrder(PTRDISPREGS) = [ $r31r30 ] (sub-class)
    Register: $r31r30 Cost: 0 BestCost: 4294967295

Assigning %0 to $r31r30
Search register for %1 in class PTRDISPREGS with hint $noreg
    Register: $r31r30 already used in instr.

error: ran out of registers during register allocation

What's curious is that llc both produces the error and then outputs what looks like the correct assembly.

Patryk27 commented 1 month ago

It seems that TwoAddressInstructionPass is somehow involved here - before the pass, the code requires just one register of class ptrdispregs:

  %1:iwregs = ADIWRdK %0:ptrdispregs(tied-def 0), 2, implicit-def dead $sreg
  STWPtrRr %0:ptrdispregs, killed %1:iwregs :: (store (s16) into %ir.0, align 1)

... but that pass changes class of %1 from iwregs into ptrdispregs:

********** REWRITING TWO-ADDR INSTRS **********
********** Function: foo
    %1:iwregs = ADIWRdK %0:ptrdispregs(tied-def 0), 2, implicit-def dead $sreg
        prepend:    %1:iwregs = COPY %0:ptrdispregs
        rewrite to: %1:ptrdispregs = ADIWRdK %1:ptrdispregs(tied-def 0), 2, implicit-def dead $sreg

... yielding:

  %1:ptrdispregs = COPY %0:ptrdispregs
  %1:ptrdispregs = ADIWRdK %1:ptrdispregs(tied-def 0), 2, implicit-def dead $sreg
  STWPtrRr %0:ptrdispregs, killed %1:ptrdispregs :: (store (s16) into %ir.0, align 1)

... that can't get regalloced, because there's just one register of the ptrdispregs class available at that particular place.


Edit: nope, this causes the crash as well, without TwoAddressInstructionPass changing anything:

define void @foo() {
bb0:
  %1 = call ptr @bar()
  %2 = call ptr @bar()
  store ptr %2, ptr %1
  store ptr %1, ptr %2
  ret void
}

declare ptr @bar()
Patryk27 commented 1 month ago

Alright, summing up - minimal reproducer:

define void @foo() {
bb0:
  %1 = call ptr @bar()
  %2 = call ptr @bar()
  store ptr %2, ptr %1
  store ptr %1, ptr %2
  ret void
}

declare ptr @bar()

Because store ptr %x requires for %x to be of class ptrdispregs, this causes LLVM to generate:

  RCALLk @bar, ... implicit-def $r25r24
  %0:ptrdispregs = COPY $r25r24

  RCALLk @bar, ..., implicit-def $r25r24
  %1:ptrdispregs = COPY $r25r24

  STWPtrRr %0:ptrdispregs, %1:ptrdispregs
  STWPtrRr %1:ptrdispregs, %0:ptrdispregs

... which cannot be solved by regalloc itself, since there's just one register of class ptrdispregs available at a time.

Usually this isn't a problem, because most of the time the second argument is of another class, e.g.:

define void @foo() {
bb0:
  %1 = call ptr @bar()
  %2 = call ptr @bar()
  store ptr %2, ptr %1
  ret void
}

... yields:

  RCALLk @bar, ..., implicit-def $r25r24
  %0:ptrdispregs = COPY $r25r24

  RCALLk @bar, ..., implicit-def $r25r24
  %1:dregs = COPY $r25r24

  STWPtrRr %0:ptrdispregs, %1:dregs

I'm not sure on the correct approach here - maybe we should have another pass that goes over stores that have two ptrdispregs, transforming:

  STWPtrRr %0:ptrdispregs, %1:ptrdispregs

... into:

  %2:dregs = COPY %1:ptrdispregs
  STWPtrRr %0:ptrdispregs, %2:dregs

... but I'm not 100% sold - maybe LLVM has already something like this built-in?


Edit: amazingly enough, LLVM does seem to have something related, but working other way around -- commenting our this loop:

https://github.com/llvm/llvm-project/blob/8ac924744e93258d0c490e2fa2d4518e24cb458d/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp#L109

... leaves extra COPY-ies around and makes regalloc succeed; hmm.

aykevl commented 1 month ago

Thank you for investigating!