Open jeremy-rifkin opened 2 years ago
@llvm/issue-subscribers-backend-x86
Looks like X86 backend specific.
I've been poking around in X86FrameLowering and I've found out the following:
The MIR for foo
is as follows (from main@375f747993d0d5c21b783eb651aefef2935d1fe1
printed after prologepilog
):
bb.0.entry:
successors: %bb.1(0x00106035), %bb.2(0x7fef9fcb); %bb.1(0.05%), %bb.2(99.95%)
liveins: $edi, $esi
$rsp = frame-setup SUB64ri8 $rsp(tied-def 0), 40, implicit-def dead $eflags
frame-setup CFI_INSTRUCTION def_cfa_offset 48
MOV32mr $rsp, 1, $noreg, 32, $noreg, renamable $edi :: (store (s32) into %ir.p, align 8, !tbaa !5)
MOV32mr $rsp, 1, $noreg, 36, $noreg, renamable $esi :: (store (s32) into %ir.b2, !tbaa !10)
CMP32rr killed renamable $edi, killed renamable $esi, implicit-def $eflags
JCC_1 %bb.2, 13, implicit killed $eflags
JMP_1 %bb.1
bb.1.if.then:
; predecessors: %bb.0
successors: %bb.2(0x80000000); %bb.2(100.00%)
MOV32mi $rsp, 1, $noreg, 28, $noreg, 0 :: (store (s32) into %ir.ref.tmp, !tbaa !12)
MOV32mi $rsp, 1, $noreg, 24, $noreg, 0 :: (store (s32) into %ir.ref.tmp3, !tbaa !12)
MOV32mi $rsp, 1, $noreg, 20, $noreg, 0 :: (store (s32) into %ir.ref.tmp4, !tbaa !12)
MOV32mi $rsp, 1, $noreg, 16, $noreg, 0 :: (store (s32) into %ir.ref.tmp5, !tbaa !12)
MOV32mi $rsp, 1, $noreg, 12, $noreg, 0 :: (store (s32) into %ir.ref.tmp6, !tbaa !12)
MOV32mi $rsp, 1, $noreg, 8, $noreg, 0 :: (store (s32) into %ir.ref.tmp7, !tbaa !12)
MOV32mi $rsp, 1, $noreg, 4, $noreg, 0 :: (store (s32) into %ir.ref.tmp8, !tbaa !12)
renamable $rax = LEA64r $rsp, 1, $noreg, 4, $noreg
renamable $r10 = LEA64r $rsp, 1, $noreg, 8, $noreg
renamable $rdi = LEA64r $rsp, 1, $noreg, 32, $noreg
renamable $rsi = LEA64r $rsp, 1, $noreg, 28, $noreg
renamable $rdx = LEA64r $rsp, 1, $noreg, 24, $noreg
renamable $rcx = LEA64r $rsp, 1, $noreg, 20, $noreg
renamable $r8 = LEA64r $rsp, 1, $noreg, 16, $noreg
renamable $r9 = LEA64r $rsp, 1, $noreg, 12, $noreg
PUSH64r killed renamable $rax, implicit-def $rsp, implicit $rsp :: (store (s64) into stack + 8)
CFI_INSTRUCTION adjust_cfa_offset 8
PUSH64r killed renamable $r10, implicit-def $rsp, implicit $rsp :: (store (s64) into stack)
CFI_INSTRUCTION adjust_cfa_offset 8
CALL64pcrel32 target-flags(x86-plt) @_Z3barIJiiiiiiiEEvR4pairDpOT_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit $rcx, implicit $r8, implicit $r9, implicit-def $rsp, implicit-def $ssp
$rsp = ADD64ri8 $rsp(tied-def 0), 16, implicit-def dead $eflags -----------------------------------------------------> (1)
CFI_INSTRUCTION adjust_cfa_offset -16
bb.2.if.end:
; predecessors: %bb.0, %bb.1
$rsp = frame-destroy ADD64ri8 $rsp(tied-def 0), 40, implicit-def dead $eflags -----------------------------------------------------> (2)
frame-destroy CFI_INSTRUCTION def_cfa_offset 8
RET 0
Note that (1) is in a different basic block than (2).
In X86FrameLowering::emitEpilogue
there's a call to X86FrameLowering::mergeSPUpdates
which is supposed to combine consecutive updates to the stack pointer but X86FrameLowering::mergeSPUpdates
only looks at SP updates in bb.2
(because that's where the epilogue is supposed to be inserted) and since (1) is not in bb.2
it isn't merged with (2).
Moreover, bb.2
has two predecessors so I'm not sure whether it's possible to merge the add
s at all.
To my inexperienced self this appears to be as designed.
Thanks for sharing what you've found! I agree it's probably not possible to handle during the epilogue codegen.
To my also inexperienced self I would expect this to be resolved when the epilogue is eventually merged into its predecessors, which it looks like happens here:
# *** IR Dump After Branch Probability Basic Block Placement (block-placement) ***:
# Machine code for function _Z3fooii: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten
...
bb.2 (%ir-block.23):
; predecessors: %bb.0
$rsp = frame-destroy ADD64ri8 $rsp(tied-def 0), 40, implicit-def dead $eflags
CFI_INSTRUCTION def_cfa_offset 8
RETQ
bb.1 (%ir-block.15):
; predecessors: %bb.0
...
CALL64pcrel32 @_Z3barIJiiiiiiiEEvR4pairDpOT_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit $rcx, implicit $r8, implicit $r9, implicit-def $rsp, implicit-def $ssp
$rsp = ADD64ri8 $rsp(tied-def 0), 16, implicit-def dead $eflags
CFI_INSTRUCTION adjust_cfa_offset -16
$rsp = frame-destroy ADD64ri8 $rsp(tied-def 0), 40, implicit-def dead $eflags
CFI_INSTRUCTION def_cfa_offset 8
RETQ
I agree, this should be handled during basic block placement but that's a bit tricky since basic block placement is target independent. I'm looking into it further.
This code below generates assembly ending in redundant add instructions
https://godbolt.org/z/93desf8Wh