Open Febbe opened 8 months ago
@llvm/issue-subscribers-backend-risc-v
Author: Fabian Keßler (Febbe)
I suspect the use of uintptr_t, bitcasts, and the UB prevented us from reasoning about the pointer arithmetic in a straightforward way. The 12 in add a2,sp,12 was only known after register allocation and stack frame layout. There are no optimization passes that can remove redundant computation after that point.
Isn't that a thing that should be done at LLVM IR level?
Stack frame layout happens after (most of) instruction selection, LLVM IR is long gone.
(I say "most of" because it's a bit fuzzy; it happens after what LLVM calls ISel, but given pseudos and things we haven't quite decided on the exact instructions yet, so depends what definition you use)
This is what we had immediately after isel. Note this was riscv64 because that's the toolchain I had available.
*** MachineFunction at end of ISel ***
# Machine code for function hack_me: IsSSA, TracksLiveness
Frame Objects:
fi#0: size=8, align=8, at location [SP]
bb.0.entry:
successors: %bb.2(0x30000000), %bb.1(0x50000000); %bb.2(37.50%), %bb.1(62.50%)
%1:gpr = PseudoLLA @_ZL4base
%0:gpr = LD %1:gpr, 0 :: (dereferenceable load (s64) from @_ZL4base, !tbaa !9)
%2:gpr = ADDI %0:gpr, 256
SD killed %2:gpr, %1:gpr, 0 :: (store (s64) into @_ZL4base, !tbaa !9)
%3:gpr = LBU %0:gpr, 0 :: (load (s8) from %ir.1, !tbaa !13, !range !14)
BEQ killed %3:gpr, $x0, %bb.2
PseudoBR %bb.1
bb.1.if.end:
; predecessors: %bb.0
successors: %bb.2(0x80000000); %bb.2(100.00%)
LIFETIME_START %stack.0.local.addr.i
LIFETIME_END %stack.0.local.addr.i
%4:gpr = LD %0:gpr, 8 :: (load (s64) from %ir.4, !tbaa !13)
%5:gpr = ADDI %stack.0.local.addr.i, 0
%6:gpr = ADD killed %5:gpr, killed %4:gpr
%7:gpr = LD %0:gpr, 16 :: (load (s64) from %ir.6, !tbaa !13)
SD killed %7:gpr, killed %6:gpr, -12 :: (store (s64) into %ir.8, !tbaa !16)
bb.2.cleanup:
; predecessors: %bb.0, %bb.1
PseudoRET
# End machine code for function hack_me
The %5:gpr = ADDI %stack.0.local.addr.i, 0 means "replace this with the computation of the address of %stack.0.local.addr.i once we eventually know it". We can't know it until after register allocation, once any spill slots have been created.
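To make the ordering problem concrete, here is a toy model of a frame-index operand. It is only a sketch: `ToyAddr`, `toy_resolve`, and `toy_fold` are hypothetical names invented for illustration and do not correspond to any LLVM API. It shows that a fold such as "slot offset 12 plus store displacement -12" can only be attempted once the slot offset has been assigned.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; none of these names exist in LLVM.
struct ToyAddr {
    bool resolved;           // false while the operand is still a frame index
    int frame_index;         // which stack slot (fi#N) is referenced
    std::int64_t sp_offset;  // valid only when resolved == true
};

// Stand-in for frame layout: turn fi#N into a concrete offset from sp.
inline ToyAddr toy_resolve(ToyAddr a, const std::vector<std::int64_t>& slot_offsets) {
    if (!a.resolved) {
        a.sp_offset = slot_offsets.at(static_cast<std::size_t>(a.frame_index));
        a.resolved = true;
    }
    return a;
}

// Try to fold the slot offset into a later store displacement.
// Returns false while the offset is still unknown.
inline bool toy_fold(const ToyAddr& a, std::int64_t store_disp, std::int64_t& folded) {
    if (!a.resolved) {
        return false;  // nothing to fold against before frame layout
    }
    folded = a.sp_offset + store_disp;
    return true;
}
```

In this model, folding against fi#0 fails before `toy_resolve` runs, and yields a net displacement of 0 once the slot is placed at offset 12 and the store uses displacement -12.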
Very interesting, is there even any benefit of adding an optimization pass after the stack frame layout / register allocation?
Btw. I found a solution I dismissed 6h ago for a now unknown reason, and it does not mess around with register calculation:
static void* asm_builtin_stack_address() {
    void* res;
    asm("mv %0, sp" : "=r"(res));
    return res;
}
I think it was not inlined for some reason and produced the sp of asm_builtin_stack_address (sp-16).
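If the helper really does return the callee's sp whenever it is not inlined, one possible guard is to force inlining. The following is a minimal sketch assuming GCC or Clang; `current_stack_address` is a hypothetical name, and the non-RISC-V branch falls back to `__builtin_frame_address(0)`, which returns the frame address rather than sp and is there only so the snippet builds on other targets.

```cpp
// Sketch only: assumes GCC or Clang (attribute and builtin are compiler extensions).
__attribute__((always_inline)) static inline void* current_stack_address() {
#if defined(__riscv)
    void* res;
    // volatile keeps the compiler from hoisting or merging the read,
    // since the asm has no inputs that would otherwise pin it in place.
    asm volatile("mv %0, sp" : "=r"(res));
    return res;
#else
    // Fallback for other targets: frame address, not the stack pointer.
    return __builtin_frame_address(0);
#endif
}
```

Whether forced inlining avoids the extra address computation discussed in this issue has not been checked here.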
Having the code:
produces
At address 0x170, 12 is added to sp and stored in a2. It is subtracted again at 0x178. A whole instruction could be saved here:
170: 00c10613 add a2,sp,12
I assume this can happen more often in many other cases.
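As a cross-check, the encoding 00c10613 quoted above can be decoded by hand using the standard RISC-V I-type field layout. The sketch below uses hypothetical names (`ITypeFields`, `decode_itype`). For this word it gives opcode 0x13 (OP-IMM), funct3 0 (ADDI), rd = x12 (a2), rs1 = x2 (sp), and immediate 12, which matches the add a2,sp,12 mentioned earlier in the thread.

```cpp
#include <cstdint>

// Fields of a RISC-V I-type instruction word.
struct ITypeFields {
    unsigned opcode;   // bits [6:0]
    unsigned rd;       // bits [11:7]
    unsigned funct3;   // bits [14:12]
    unsigned rs1;      // bits [19:15]
    std::int32_t imm;  // bits [31:20], sign-extended
};

inline ITypeFields decode_itype(std::uint32_t insn) {
    ITypeFields f;
    f.opcode = insn & 0x7fu;
    f.rd = (insn >> 7) & 0x1fu;
    f.funct3 = (insn >> 12) & 0x7u;
    f.rs1 = (insn >> 15) & 0x1fu;
    // Extract the 12-bit immediate, then sign-extend it manually.
    std::int32_t imm = static_cast<std::int32_t>((insn >> 20) & 0xfffu);
    if (imm & 0x800) {
        imm -= 0x1000;
    }
    f.imm = imm;
    return f;
}
```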