llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.03k stars 11.97k forks source link

Greedy allocator fails to fold a memory operand when spilling #10438

Open llvmbot opened 13 years ago

llvmbot commented 13 years ago
Bugzilla Link 10066
Version trunk
OS All
Attachments LLVM IR source, Resulting code
Reporter LLVM Bugzilla Contributor
CC @asl,@gandhi56,@stoklund

Extended Description

Hi.

The code produced for the test in attachment is worse with the greedy allocator than it was with the linear scan allocator (note that the test is compiled with -disable-fp-elim option).

The test is a pair of nested loops. The outer loop counter (in vreg17) is spilled on x86 but despite being used only once (for decrementing) stack slot is not folded into the instruction. This happens because vreg17 is first split into two registers (vreg54 and vreg53). Physical register is allocated for the first one and the second one is spilled, so at that point it is not possible to fold the memory operand unless it is somehow known that the two registers originated from a common one.

Seems that greedy allocator is too greedy at splitting the intervals :) This might not be a problem for a RISC architecture but it is not very good for CISC.

gandhi56 commented 3 years ago

The type of %C in the IR appears to be inconsistent in lines 5 and 36.

1ba3d143-a64b-4671-82b2-0b31cfb91709 commented 13 years ago

Note that greedy has one less spill of that variable:

BB#1: # %bb5.preheader.lr.ph.split.us

    movswl  %ax, %eax     # First use

.. .LBB0_2: # %bb1.lr.ph.us

=>This Loop Header: Depth=1

                                    #     Child Loop BB0_3 Depth 2
    movl    %eax, -28(%ebp)         # 4-byte Spill

... .LBB0_6: # %bb6.us

in Loop: Header=BB0_2 Depth=1

    movl    -28(%ebp), %eax         # 4-byte Reload
    decl    %eax           # Second use
    jne     .LBB0_2

... %eax dead after the loop

Greedy writes it N times while linear scan would write it N+1 times. In this case, it would probably be better to spill everywhere, though.

I'll put it on my queue.