Closed llvmbot closed 16 years ago
Evan, please file relevant subtasks of this bug as their own separate bugs. Thanks!
Thanks. Looks like an interesting paper. I'll definitely read it.
Maybe implement, or use ideas from, this paper to improve the current allocator:
www.usenix.org/events/vee05/full_papers/p132-wimmer.pdf
Another example. See test/Regression/CodeGen/X86/2006-05-11-InstrSched.ll
_foo:
        subl $24, %esp
        movl %esi, 12(%esp)
        movl %edi, 8(%esp)
        movl %ebx, 4(%esp)
        movl %ebp, (%esp)
        movl 52(%esp), %eax
        movl %eax, 20(%esp)   <==== BAD
        movl 56(%esp), %eax
        movl %eax, 16(%esp)   <==== BAD
        movl 44(%esp), %eax
        movl 72(%esp), %ecx
        movl 48(%esp), %edx
        movl 28(%esp), %esi
        cmpl $5, %ecx
        jl LBB1_4       #return
LBB1_1: #cond_true.preheader
        movl $5, %edi
        xorl %ebx, %ebx
LBB1_2: #cond_true
        movdqu (%eax,%ebx), %xmm0
        movl 20(%esp), %ebp
        movdqu (%ebp,%ebx), %xmm1
        paddd (%edx,%ebx), %xmm0
        movl 16(%esp), %ebp
        paddd (%ebp,%ebx), %xmm1
        movaps %xmm1, %xmm2
        pcmpgtd %xmm0, %xmm2
        movaps %xmm2, %xmm3
        pandn %xmm0, %xmm3
        andps %xmm1, %xmm2
        orps %xmm3, %xmm2
        movdqa %xmm2, 4(%esi,%ebx)
        addl $16, %ebx
        addl $4, %edi
        cmpl %ecx, %edi
        jle LBB1_2      #cond_true
LBB1_3: #return.loopexit
LBB1_4: #return
        movl (%esp), %ebp
        movl 4(%esp), %ebx
        movl 8(%esp), %edi
        movl 12(%esp), %esi
        addl $24, %esp
        ret
Obviously we should never spill something that's loaded from the stack frame. This is another opportunity for rematerialization.
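The spill-vs-rematerialize decision can be sketched as a tiny cost model (illustrative C only; the enum, the function, and their names are invented for this sketch and are not LLVM's actual data structures):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of how a virtual register's value is defined. */
enum def_kind {
    DEF_CONST_IMM,   /* materialized constant, e.g. movb $252, ... */
    DEF_STACK_LOAD,  /* load from a fixed incoming-argument stack slot */
    DEF_CALL_RESULT, /* produced by a call; cannot be recomputed */
    DEF_COMPUTED     /* arbitrary computation with live operands */
};

/* Instead of spilling a value to a new stack slot and reloading it, the
   allocator can rematerialize it: re-execute its side-effect-free defining
   instruction at each use.  A constant, or a load from a stack slot that is
   never written, is always cheaper to recompute than to spill and reload. */
static bool should_rematerialize(enum def_kind def)
{
    return def == DEF_CONST_IMM || def == DEF_STACK_LOAD;
}
```

Under this model, both BAD stores in the example above disappear: the reloaded values are already sitting in the incoming-argument area of the frame.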
Extended Description
There are a number of register allocator issues that we will need to deal with eventually.
Example 1:
cond_next98 (0x890d850, LLVM BB @0x8908ff0):
        ADJCALLSTACKDOWN 4
        MOV32mi %ESP, 1, %NOREG, 0, 0
        CALLpcrel32
ADJCALLSTACKUP 4, 0
%reg1027 = MOV32rr %EAX
%reg1028 = MOV32rr %EDX
%reg1029 = MOV32rm %NOREG, 1, %NOREG, 0
%reg1036 = MOVZX32rm8 %reg1029, 1, %NOREG, 12
%reg1037 = MOV32rm %NOREG, 1, %NOREG,
%reg1038 = MOV8ri 252
%reg1039 = ADD8rm %reg1038, %reg1037, 4, %reg1036, 0
CMP8ri %reg1039, 6
JB mbb<cond_next170,0x890d970>
Successors according to CFG: 0x890d910 0x890d970
LBB1_8: #cond_next98
        movl $0, (%esp)
        call L_int_cst_value$stub
        movl 0, %ecx
        movl %ecx, 72(%esp)
        movzbl 12(%ecx), %ecx
        movl %ecx, 84(%esp)
        movl L_tree_code_type$non_lazy_ptr, %ecx
        movl %ecx, 88(%esp)
        movb $252, 83(%esp)
        movb 83(%esp), %bl
        movl 84(%esp), %edi
        addb (%ecx,%edi,4), %bl
        movb %bl, 83(%esp)
        cmpb $6, %bl
        jb LBB1_10      #cond_next170
Obviously rematerialization will eliminate the first movb $252, 83(%esp). The second can be fixed with live range splitting.
We should have been smarter about picking the right registers for reg1036 and reg1037. Their live ranges overlap with R8, so we should have picked from the registers that don't conflict with it.
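The register-picking fix can be sketched as a toy allocator that walks the allocation order and skips candidates whose fixed live ranges overlap the virtual register being assigned (bitmask encoding and function name are invented for illustration, not LLVM's):

```c
#include <assert.h>

/* Given the allocation-order candidates and a bitmask of physical registers
   whose live ranges overlap the virtual register being assigned (e.g. the
   registers already claimed by reg1038/reg1039 here), return the first
   candidate with no conflict, or -1 if everything conflicts and we must
   spill or split. */
static int pick_register(const int *order, int n, unsigned conflict_mask)
{
    for (int i = 0; i < n; i++)
        if (!(conflict_mask & (1u << order[i])))
            return order[i];
    return -1;
}
```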
Example 2:
long %test(long %x, short %y) {
entry:
        %tmp = cast short %y to ubyte           ; [#uses=1]
%tmp1 = shr long %x, ubyte %tmp ; [#uses=1]
ret long %tmp1
}
_test:
        subl $8, %esp
        movl %esi, 4(%esp)
        movl %ebx, (%esp)
        movl 12(%esp), %eax
        movb 20(%esp), %bl
        movl 16(%esp), %esi
        movb %bl, %cl
        shrdl %cl, %esi, %eax
        movb %bl, %cl
        movl %esi, %edx
        sarl %cl, %edx
        sarl $31, %esi
        testb $32, %bl
        cmovne %edx, %eax
        cmovne %esi, %edx
        movl (%esp), %ebx
        movl 4(%esp), %esi
        addl $8, %esp
        ret
We are unable to coalesce the result of the load "movb 20(%esp)" into %cl. This is bug 687.
Example 3:
Truncate (as well as anyext) should be treated specially by the register allocator, so that its live range can overlap the source's live range.
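The idea can be sketched with a toy sub-register table (the enum encoding is made up for this sketch; only the EAX/AL-style nesting mirrors x86):

```c
#include <assert.h>

/* Illustrative register numbering.  The low 8 bits of EAX..EBX are
   addressable as AL..BL; ESI/EDI have no 8-bit sub-register in 32-bit
   mode. */
enum reg { EAX, ECX, EDX, EBX, ESI, EDI, AL, CL, DL, BL, NO_REG };

static enum reg low8_subreg(enum reg r32)
{
    switch (r32) {
    case EAX: return AL;
    case ECX: return CL;
    case EDX: return DL;
    case EBX: return BL;
    default:  return NO_REG;
    }
}

/* A truncate's result is just the low bits of its source, so the two live
   ranges may legally overlap: assigning the result to the source's
   sub-register makes the truncate a no-op instead of forcing a copy. */
static int can_share_register(enum reg src32, enum reg dst8)
{
    return low8_subreg(src32) == dst8;
}
```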
Example 4:
float foo(int *x, float *y, unsigned c) {
  float res = 0.0;
  unsigned i;
  for (i = 0; i < c; i++) {
    float xx = (float)x[i];
    xx = xx * y[i];
    xx += res;
    res = xx;
  }
  return res;
}
LBB_foo_3:      # no_exit
        cvtsi2ss %XMM0, DWORD PTR [%EDX + 4*%ESI]
        mulss %XMM0, DWORD PTR [%EAX + 4*%ESI]
        addss %XMM0, %XMM1
        inc %ESI
        cmp %ESI, %ECX
****    movaps %XMM1, %XMM0
        jb LBB_foo_3    # no_exit
We need to teach the coalescer to commute 2-addr instructions, allowing us to eliminate the reg-reg copy (the movaps marked **** above).
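The commute-for-coalescing check might look like this (struct and field names are invented for the sketch, not LLVM's representation):

```c
#include <assert.h>
#include <stdbool.h>

/* A two-address instruction ties its destination to the first source
   operand: "a = b op c" lowers to "copy a <- b; a op= c".  For a
   commutative op it can equally lower to "copy a <- c; a op= b", so if the
   destination is coalescable with c but not with b, commuting the operands
   lets the coalescer delete the copy entirely. */
struct two_addr {
    int dst, src0, src1;
    bool commutative;
};

/* Swap the operands and return true when commuting enables coalescing. */
static bool commute_for_coalescing(struct two_addr *mi,
                                   bool dst_joinable_src0,
                                   bool dst_joinable_src1)
{
    if (!mi->commutative || dst_joinable_src0 || !dst_joinable_src1)
        return false;
    int t = mi->src0;
    mi->src0 = mi->src1;
    mi->src1 = t;
    return true;
}
```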
There is also bug 770. We need live range splitting, or to be smarter about when to join a live range with another one that targets a narrower register class.
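A join heuristic along those lines might look like the following (class sizes and the length threshold are illustrative, not LLVM's actual parameters):

```c
#include <assert.h>
#include <stdbool.h>

/* Coalescing a wide-class virtual register (e.g. GR32, 8 allocatable
   members) with one constrained to a narrower class (e.g. the
   byte-addressable subset, 4 members, because a shift needs %cl) forces the
   whole joined range into the small class.  For a long live range that
   raises register pressure, so a heuristic may refuse the join unless the
   range is short. */
static bool should_join(int wide_class_size, int narrow_class_size,
                        int live_range_length, int max_len_for_narrow)
{
    if (narrow_class_size >= wide_class_size)
        return true;  /* not actually narrowing: always profitable */
    return live_range_length <= max_len_for_narrow;
}
```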