llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.12k stars 12.01k forks source link

[X86] Redundant adds in -O3 #53556

Open jeremy-rifkin opened 2 years ago

jeremy-rifkin commented 2 years ago

This code below generates assembly ending in redundant add instructions

struct pair {
    int a;
    int b;
};

template<typename... Args>
void bar(pair&, Args&&...);

void foo(int a, int b) {
    pair p{a, b};
    if(a < b) [[unlikely]] {
        bar(p, 0, 0, 0, 0, 0, 0, 0);
    }
}
...
call    ...
add     rsp, 16
add     rsp, 40
ret

https://godbolt.org/z/93desf8Wh

llvmbot commented 2 years ago

@llvm/issue-subscribers-backend-x86

fhahn commented 2 years ago

Looks like X86 backend specific.

omern1 commented 2 years ago

I've been poking around in X86FrameLowering and I've found out the following:

The MIR for foo is as follows (from main@375f747993d0d5c21b783eb651aefef2935d1fe1 printed after prologepilog):

bb.0.entry:
  successors: %bb.1(0x00106035), %bb.2(0x7fef9fcb); %bb.1(0.05%), %bb.2(99.95%)
  liveins: $edi, $esi
  $rsp = frame-setup SUB64ri8 $rsp(tied-def 0), 40, implicit-def dead $eflags
  frame-setup CFI_INSTRUCTION def_cfa_offset 48
  MOV32mr $rsp, 1, $noreg, 32, $noreg, renamable $edi :: (store (s32) into %ir.p, align 8, !tbaa !5)
  MOV32mr $rsp, 1, $noreg, 36, $noreg, renamable $esi :: (store (s32) into %ir.b2, !tbaa !10)
  CMP32rr killed renamable $edi, killed renamable $esi, implicit-def $eflags
  JCC_1 %bb.2, 13, implicit killed $eflags
  JMP_1 %bb.1

bb.1.if.then:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  MOV32mi $rsp, 1, $noreg, 28, $noreg, 0 :: (store (s32) into %ir.ref.tmp, !tbaa !12)
  MOV32mi $rsp, 1, $noreg, 24, $noreg, 0 :: (store (s32) into %ir.ref.tmp3, !tbaa !12)
  MOV32mi $rsp, 1, $noreg, 20, $noreg, 0 :: (store (s32) into %ir.ref.tmp4, !tbaa !12)
  MOV32mi $rsp, 1, $noreg, 16, $noreg, 0 :: (store (s32) into %ir.ref.tmp5, !tbaa !12)
  MOV32mi $rsp, 1, $noreg, 12, $noreg, 0 :: (store (s32) into %ir.ref.tmp6, !tbaa !12)
  MOV32mi $rsp, 1, $noreg, 8, $noreg, 0 :: (store (s32) into %ir.ref.tmp7, !tbaa !12)
  MOV32mi $rsp, 1, $noreg, 4, $noreg, 0 :: (store (s32) into %ir.ref.tmp8, !tbaa !12)
  renamable $rax = LEA64r $rsp, 1, $noreg, 4, $noreg
  renamable $r10 = LEA64r $rsp, 1, $noreg, 8, $noreg
  renamable $rdi = LEA64r $rsp, 1, $noreg, 32, $noreg
  renamable $rsi = LEA64r $rsp, 1, $noreg, 28, $noreg
  renamable $rdx = LEA64r $rsp, 1, $noreg, 24, $noreg
  renamable $rcx = LEA64r $rsp, 1, $noreg, 20, $noreg
  renamable $r8 = LEA64r $rsp, 1, $noreg, 16, $noreg
  renamable $r9 = LEA64r $rsp, 1, $noreg, 12, $noreg
  PUSH64r killed renamable $rax, implicit-def $rsp, implicit $rsp :: (store (s64) into stack + 8)
  CFI_INSTRUCTION adjust_cfa_offset 8
  PUSH64r killed renamable $r10, implicit-def $rsp, implicit $rsp :: (store (s64) into stack)
  CFI_INSTRUCTION adjust_cfa_offset 8
  CALL64pcrel32 target-flags(x86-plt) @_Z3barIJiiiiiiiEEvR4pairDpOT_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit $rcx, implicit $r8, implicit $r9, implicit-def $rsp, implicit-def $ssp
  $rsp = ADD64ri8 $rsp(tied-def 0), 16, implicit-def dead $eflags -----------------------------------------------------> (1)
  CFI_INSTRUCTION adjust_cfa_offset -16

bb.2.if.end:
; predecessors: %bb.0, %bb.1

  $rsp = frame-destroy ADD64ri8 $rsp(tied-def 0), 40, implicit-def dead $eflags -----------------------------------------------------> (2)
  frame-destroy CFI_INSTRUCTION def_cfa_offset 8
  RET 0

Note that (1) is in a different basic block than (2).

In X86FrameLowering::emitEpilogue there's a call to X86FrameLowering::mergeSPUpdates which is supposed to combine consecutive updates to the stack pointer but X86FrameLowering::mergeSPUpdates only looks at SP updates in bb.2 (because that's where the epilogue is supposed to be inserted) and since (1) is not in bb.2 it isn't merged with (2). Moreover, bb.2 has two predecessors so I'm not sure whether it's possible to merge the adds at all.

To my inexperienced self this appears to be as designed.

jeremy-rifkin commented 2 years ago

Thanks for sharing what you've found! I agree it's probably not possible to handle during the epilogue codegen.

To my also inexperienced self I would expect this to be resolved when the epilogue is eventually merged into its predecessors, which it looks like happens here:

# *** IR Dump After Branch Probability Basic Block Placement (block-placement) ***:
# Machine code for function _Z3fooii: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten
...
bb.2 (%ir-block.23):
; predecessors: %bb.0
  $rsp = frame-destroy ADD64ri8 $rsp(tied-def 0), 40, implicit-def dead $eflags
  CFI_INSTRUCTION def_cfa_offset 8
  RETQ
bb.1 (%ir-block.15):
; predecessors: %bb.0
  ...
  CALL64pcrel32 @_Z3barIJiiiiiiiEEvR4pairDpOT_, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit $rcx, implicit $r8, implicit $r9, implicit-def $rsp, implicit-def $ssp
  $rsp = ADD64ri8 $rsp(tied-def 0), 16, implicit-def dead $eflags
  CFI_INSTRUCTION adjust_cfa_offset -16
  $rsp = frame-destroy ADD64ri8 $rsp(tied-def 0), 40, implicit-def dead $eflags
  CFI_INSTRUCTION def_cfa_offset 8
  RETQ
omern1 commented 2 years ago

I agree, this should be handled during basic block placement but that's a bit tricky since basic block placement is target independent. I'm looking into it further.