Closed llvmbot closed 16 years ago
Evan, please file relevant subtasks of this bug as their own separate bugs. Thanks!
Thanks. Looks like an interesting paper. I'll definitely read it.
Maybe implement, or use ideas from, this paper to improve the current allocator:
www.usenix.org/events/vee05/full_papers/p132-wimmer.pdf
Another example. See test/Regression/CodeGen/X86/2006-05-11-InstrSched.ll
_foo:
        subl $24, %esp
        movl %esi, 12(%esp)
        movl %edi, 8(%esp)
        movl %ebx, 4(%esp)
        movl %ebp, (%esp)
        movl 52(%esp), %eax
        movl %eax, 20(%esp)   <==== BAD
        movl 56(%esp), %eax
        movl %eax, 16(%esp)   <==== BAD
        movl 44(%esp), %eax
        movl 72(%esp), %ecx
        movl 48(%esp), %edx
        movl 28(%esp), %esi
        cmpl $5, %ecx
        jl LBB1_4       #return
LBB1_1: #cond_true.preheader
        movl $5, %edi
        xorl %ebx, %ebx
LBB1_2: #cond_true
        movdqu (%eax,%ebx), %xmm0
        movl 20(%esp), %ebp
        movdqu (%ebp,%ebx), %xmm1
        paddd (%edx,%ebx), %xmm0
        movl 16(%esp), %ebp
        paddd (%ebp,%ebx), %xmm1
        movaps %xmm1, %xmm2
        pcmpgtd %xmm0, %xmm2
        movaps %xmm2, %xmm3
        pandn %xmm0, %xmm3
        andps %xmm1, %xmm2
        orps %xmm3, %xmm2
        movdqa %xmm2, 4(%esi,%ebx)
        addl $16, %ebx
        addl $4, %edi
        cmpl %ecx, %edi
        jle LBB1_2      #cond_true
LBB1_3: #return.loopexit
LBB1_4: #return
        movl (%esp), %ebp
        movl 4(%esp), %ebx
        movl 8(%esp), %edi
        movl 12(%esp), %esi
        addl $24, %esp
        ret
Obviously we should never spill something that's loaded from the stack frame. This is another opportunity for rematerialization.
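The spill-vs-rematerialize decision can be sketched as a tiny cost model (illustrative C only; the enum, the function, and their names are invented for this sketch and are not LLVM's actual data structures):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of how a virtual register's value is defined. */
enum def_kind {
    DEF_CONST_IMM,   /* materialized constant, e.g. movb $252, ... */
    DEF_STACK_LOAD,  /* load from a fixed incoming-argument stack slot */
    DEF_CALL_RESULT, /* produced by a call; cannot be recomputed */
    DEF_COMPUTED     /* arbitrary computation with live operands */
};

/* Instead of spilling a value to a new stack slot and reloading it, the
   allocator can rematerialize it: re-execute its side-effect-free defining
   instruction at each use.  A constant, or a load from a stack slot that is
   never written, is always cheaper to recompute than to spill and reload. */
static bool should_rematerialize(enum def_kind def)
{
    return def == DEF_CONST_IMM || def == DEF_STACK_LOAD;
}
```

Under this model, both BAD stores in the example above disappear: the reloaded values are already sitting in the incoming-argument area of the frame.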
Extended Description
There are a number of register allocator issues that we will need to deal with eventually.
Example 1:
cond_next98 (0x890d850, LLVM BB @0x8908ff0):
        ADJCALLSTACKDOWN 4
        MOV32mi %ESP, 1, %NOREG, 0, 0
        CALLpcrel32
ADJCALLSTACKUP 4, 0
%reg1027 = MOV32rr %EAX
%reg1028 = MOV32rr %EDX
%reg1029 = MOV32rm %NOREG, 1, %NOREG, 0
%reg1036 = MOVZX32rm8 %reg1029, 1, %NOREG, 12
%reg1037 = MOV32rm %NOREG, 1, %NOREG,
%reg1038 = MOV8ri 252
%reg1039 = ADD8rm %reg1038, %reg1037, 4, %reg1036, 0
CMP8ri %reg1039, 6
JB mbb<cond_next170,0x890d970>
Successors according to CFG: 0x890d910 0x890d970
LBB1_8: #cond_next98
        movl $0, (%esp)
        call L_int_cst_value$stub
        movl 0, %ecx
        movl %ecx, 72(%esp)
        movzbl 12(%ecx), %ecx
        movl %ecx, 84(%esp)
        movl L_tree_code_type$non_lazy_ptr, %ecx
        movl %ecx, 88(%esp)
        movb $252, 83(%esp)
        movb 83(%esp), %bl
        movl 84(%esp), %edi
        addb (%ecx,%edi,4), %bl
        movb %bl, 83(%esp)
        cmpb $6, %bl
        jb LBB1_10      #cond_next170
Obviously rematerialization will eliminate the first movb $252, 83(%esp). The second can be fixed with live range splitting.
We should have been smarter about picking the right registers for reg1036 and reg1037. Their live ranges overlap with R8, so we should have picked from the registers that don't conflict with it.
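The register-picking fix can be sketched as a toy allocator that walks the allocation order and skips candidates whose fixed live ranges overlap the virtual register being assigned (bitmask encoding and function name are invented for illustration, not LLVM's):

```c
#include <assert.h>

/* Given the allocation-order candidates and a bitmask of physical registers
   whose live ranges overlap the virtual register being assigned (e.g. the
   registers already claimed by reg1038/reg1039 here), return the first
   candidate with no conflict, or -1 if everything conflicts and we must
   spill or split. */
static int pick_register(const int *order, int n, unsigned conflict_mask)
{
    for (int i = 0; i < n; i++)
        if (!(conflict_mask & (1u << order[i])))
            return order[i];
    return -1;
}
```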
Example 2:
long %test(long %x, short %y) {
entry:
        %tmp = cast short %y to ubyte           ; [#uses=1]
%tmp1 = shr long %x, ubyte %tmp ; [#uses=1]
ret long %tmp1
}
_test:
        subl $8, %esp
        movl %esi, 4(%esp)
        movl %ebx, (%esp)
        movl 12(%esp), %eax
        movb 20(%esp), %bl
        movl 16(%esp), %esi
        movb %bl, %cl
        shrdl %cl, %esi, %eax
        movb %bl, %cl
        movl %esi, %edx
        sarl %cl, %edx
        sarl $31, %esi
        testb $32, %bl
        cmovne %edx, %eax
        cmovne %esi, %edx
        movl (%esp), %ebx
        movl 4(%esp), %esi
        addl $8, %esp
        ret
We are unable to coalesce the result of the load "movb 20(%esp)" into %cl. This is bug 687.
Example 3:
Truncate (as well as anyext) should be treated specially by the register allocator, so that its live range can overlap the source's live range.
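The idea can be sketched with a toy sub-register table (the enum encoding is made up for this sketch; only the EAX/AL-style nesting mirrors x86):

```c
#include <assert.h>

/* Illustrative register numbering.  The low 8 bits of EAX..EBX are
   addressable as AL..BL; ESI/EDI have no 8-bit sub-register in 32-bit
   mode. */
enum reg { EAX, ECX, EDX, EBX, ESI, EDI, AL, CL, DL, BL, NO_REG };

static enum reg low8_subreg(enum reg r32)
{
    switch (r32) {
    case EAX: return AL;
    case ECX: return CL;
    case EDX: return DL;
    case EBX: return BL;
    default:  return NO_REG;
    }
}

/* A truncate's result is just the low bits of its source, so the two live
   ranges may legally overlap: assigning the result to the source's
   sub-register makes the truncate a no-op instead of forcing a copy. */
static int can_share_register(enum reg src32, enum reg dst8)
{
    return low8_subreg(src32) == dst8;
}
```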
Example 4:
float foo(int *x, float *y, unsigned c) {
  float res = 0.0;
  unsigned i;
  for (i = 0; i < c; i++) {
    float xx = (float)x[i];
    xx = xx * y[i];
    xx += res;
    res = xx;
  }
  return res;
}
LBB_foo_3:      # no_exit
        cvtsi2ss %XMM0, DWORD PTR [%EDX + 4*%ESI]
        mulss %XMM0, DWORD PTR [%EAX + 4*%ESI]
        addss %XMM0, %XMM1
        inc %ESI
        cmp %ESI, %ECX
****    movaps %XMM1, %XMM0
        jb LBB_foo_3    # no_exit
We need to teach the coalescer to commute 2-addr instructions, allowing us to eliminate the reg-reg copy (the movaps marked **** above).
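The commute-for-coalescing check might look like this (struct and field names are invented for the sketch, not LLVM's representation):

```c
#include <assert.h>
#include <stdbool.h>

/* A two-address instruction ties its destination to the first source
   operand: "a = b op c" lowers to "copy a <- b; a op= c".  For a
   commutative op it can equally lower to "copy a <- c; a op= b", so if the
   destination is coalescable with c but not with b, commuting the operands
   lets the coalescer delete the copy entirely. */
struct two_addr {
    int dst, src0, src1;
    bool commutative;
};

/* Swap the operands and return true when commuting enables coalescing. */
static bool commute_for_coalescing(struct two_addr *mi,
                                   bool dst_joinable_src0,
                                   bool dst_joinable_src1)
{
    if (!mi->commutative || dst_joinable_src0 || !dst_joinable_src1)
        return false;
    int t = mi->src0;
    mi->src0 = mi->src1;
    mi->src1 = t;
    return true;
}
```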
There is also bug 770. We need live range splitting, or to be smarter about when to join a live range with another one that targets a narrower register class.
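A join heuristic along those lines might look like the following (class sizes and the length threshold are illustrative, not LLVM's actual parameters):

```c
#include <assert.h>
#include <stdbool.h>

/* Coalescing a wide-class virtual register (e.g. GR32, 8 allocatable
   members) with one constrained to a narrower class (e.g. the
   byte-addressable subset, 4 members, because a shift needs %cl) forces the
   whole joined range into the small class.  For a long live range that
   raises register pressure, so a heuristic may refuse the join unless the
   range is short. */
static bool should_join(int wide_class_size, int narrow_class_size,
                        int live_range_length, int max_len_for_narrow)
{
    if (narrow_class_size >= wide_class_size)
        return true;  /* not actually narrowing: always profitable */
    return live_range_length <= max_len_for_narrow;
}
```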