llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Missed optimization when arguments are passed in callee-saved registers #13735

Open llvmbot opened 12 years ago

llvmbot commented 12 years ago
Bugzilla Link 13363
Version trunk
OS Linux
Reporter LLVM Bugzilla Contributor

Extended Description

The following missed optimization comes from an out-of-tree target. It happens when some of a function's input arguments are passed in callee-saved registers.

Consider the following piece of code (reg pairs and ints are 16 bits wide):

    extern void foo(int);
    int bar(int a, int b, char c, float d, int e, char f)
    {
      foo(e);
      return e;
    }

Generates:

    push r14
    push r15
    push r28
    push r29
    movw r29:r28, r15:r14   ; make a copy of e
    movw r25:r24, r29:r28   ; move e to call foo
    call foo
    movw r25:r24, r29:r28   ; move copy of e to return
    pop  r29
    pop  r28
    pop  r15
    pop  r14
    ret

Argument "e" comes in r15:r14 (a callee-saved reg) and is moved to r25:r24 to pass it to foo. But notice that it is also copied to another CS reg (r29:r28) so it can be read back after the call and returned by bar. Since r15:r14 is a CS reg, the call is guaranteed not to modify its contents. The copy is therefore unnecessary, and its only effect is that one more CS reg has to be saved and restored.

Final code should look like this:

    push r14
    push r15
    movw r25:r24, r15:r14   ; move e to make the call
    call foo
    movw r25:r24, r15:r14   ; move e to return reg
    pop  r15
    pop  r14
    ret

If I don't pass "e" to foo, the code gets optimized as expected; the extra copy only appears when the argument is passed on to another function. Of course things get worse when more arguments come into play, but this is a simplified test case.
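The reasoning above can be written down as a small predicate. This is only an illustrative model, not code from LLVM or from the out-of-tree backend: the bitmask register encoding, `pair_mask` and `needs_copy_across_call` are hypothetical names, and the callee-saved set is an input because the report only states that r15:r14 and r29:r28 are callee saved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: one bit per 8-bit register r0..r31. */
typedef uint32_t regmask_t;

/* Mask for the 16-bit pair r(lo+1):r(lo), e.g. pair_mask(14) is r15:r14. */
static regmask_t pair_mask(unsigned lo)
{
    assert(lo < 31);
    return (regmask_t)3u << lo;
}

/*
 * A value that must stay available after a call needs to be copied somewhere
 * else only if at least one register holding it may be clobbered by the
 * callee, i.e. is not in the callee-saved set.
 */
static bool needs_copy_across_call(regmask_t value_regs,
                                   regmask_t callee_saved,
                                   bool live_after_call)
{
    bool preserved_by_callee = (value_regs & ~callee_saved) == 0;
    return live_after_call && !preserved_by_callee;
}
```

Under this model, "e" in r15:r14 needs no copy when it is live across the call to foo, whereas a value held only in r25:r24 would, assuming r25:r24 is not callee saved.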

SSA code before regalloc is:

    Function Live Ins: %R15R14 in %vreg5
    Function Live Outs: %R25R24

    BB#0: derived from LLVM BB %entry
        Live Ins: %R15R14
        %vreg5 = COPY %R15R14; DREGS:%vreg5
        ADJCALLSTACKDOWN 0, %SP, %SREG<imp-def,dead>, %SP
        %R25R24 = COPY %vreg5; DREGS:%vreg5
        CALLk ga:@foo, %R25R24, , %SP, %SP
        ADJCALLSTACKUP 0, 0, %SP, %SREG<imp-def,dead>, %SP
        %R25R24 = COPY %vreg5; DREGS:%vreg5
        RET %R25R24<imp-use,kill>

Notice how vreg5 is copied to R25:R24 before the ret. However, after regalloc:

    %R29R28 = MOVWRdRr %R15R14

See that R15:R14 is marked as killed.

One more possible improvement: although r15:r14 is a callee-saved reg, this function never writes to it, so the push/pop pairs are not needed. If foo changes its value, foo is responsible for saving and restoring it, so bar can safely skip the save and restore.

So at the end it should be:

    movw r25:r24, r15:r14   ; move e to make the call
    call foo
    movw r25:r24, r15:r14   ; move e to return reg
    ret
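The push/pop argument amounts to a set intersection: a function only has to save the callee-saved registers that its own body writes. The sketch below is a hypothetical model under that assumption; `regs_to_save` and the bitmask encoding are illustrative names and not part of any LLVM API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: one bit per 8-bit register r0..r31. */
typedef uint32_t regmask_t;

/* Mask for the 16-bit pair r(lo+1):r(lo), e.g. pair_mask(14) is r15:r14. */
static regmask_t pair_mask(unsigned lo)
{
    assert(lo < 31);
    return (regmask_t)3u << lo;
}

/*
 * Registers the prologue has to push (and the epilogue pop): those that are
 * callee saved AND written by this function's own instructions. Registers
 * written only by callees are the callees' responsibility.
 */
static regmask_t regs_to_save(regmask_t callee_saved, regmask_t written_in_body)
{
    return callee_saved & written_in_body;
}
```

In the desired code for bar, the body writes only r25:r24, so the result is empty and no push or pop is emitted. In the code generated today the body also writes r29:r28, which is why that pair has to be saved.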

llvmbot commented 12 years ago

This can be reproduced with an in-tree target, the ARM backend in Thumb mode, by making the following changes:

1) In ARMCallingConv.td, add R3 to the CSR_AAPCS list (to make it callee saved) and remove it in the line "CCIfType<[i32], CCAssignToReg<[R0, R1, R2, R3]>>" inside the RetCC_ARM_APCS snippet to simulate the ABI conditions of my target.

2) In Thumb1FrameLowering.cpp change line 308 to

    bool isKill = MBB.isLiveIn(Reg) ? false : true;

This way we only mark the reg as killed if it's not used to pass an input argument. Otherwise the machine verifier will give an error.
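The intent of the change in step 2 can be modelled outside LLVM as follows. This is a hypothetical sketch, not the Thumb1FrameLowering code: `reg_bit`, `spill_is_kill` and the bitmask of block live-ins are illustrative stand-ins for the `MBB.isLiveIn(Reg)` query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: one bit per register r0..r31. */
typedef uint32_t regmask_t;

static regmask_t reg_bit(unsigned reg)
{
    assert(reg < 32);
    return (regmask_t)1u << reg;
}

/*
 * When the prologue spills a callee-saved register, the spill may mark the
 * register as killed only if the register is not live into the block, i.e.
 * it does not carry an incoming argument that is still read later.
 */
static bool spill_is_kill(regmask_t block_live_ins, unsigned reg)
{
    return (block_live_ins & reg_bit(reg)) == 0;
}
```

With R3 both callee saved and carrying argument d, R3 is live into the entry block, so its spill must not be a kill. R4 carries no argument and can still be marked as killed.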

Compiling the following code with -O3 and -march=thumb:

    extern void bar22(int, int, int, int);
    int foo22(int a, int b, int c, int d)
    {
      bar22(0,0,0,d);
      return d;
    }

generates

    push {r3, r4, lr}
    add  sp, #4
    mov  r4, r3
    movs r2, #0
    mov  r0, r2
    mov  r1, r2
    bl   bar22
    mov  r0, r4
    sub  sp, #4
    pop  {r3, r4, pc}

Notice how R3 is copied to R4.