Bad Codegen with PUSHF and POPF

bwendling commented 1 year ago

Note that replicating this issue is best done at SHA1 6eb0f8e28598658b4f5df27b35a5ea98bad68049.

There seems to be bad codegen in some instances when reading EFLAGS. The attached issue has this code:

94:                                               ; preds = %90, %87
  %95 = call i64 @llvm.x86.flags.read.u64() #5
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %12, ptr nonnull align 16 undef, i64 %89, i1 false) #5
  call void @llvm.x86.flags.write.u64(i64 %95) #5
  br label %96

The generated code is:

$ llc -O2 -o - bugpoint-reduced-simplified.bc
    .text
    .file   "alternative.c"
    .globl  apply_alternatives              # -- Begin function apply_alternatives
    .p2align    4, 0x90
    .type   apply_alternatives,@function
apply_alternatives:                     # @apply_alternatives
  ... ### snip ###
.LBB0_31:                               #   in Loop: Header=BB0_2 Depth=1
    movslq  %r13d, %rdx
    pushfq
    popq    %rbp
    movq    %r12, %rdi
    callq   memcpy@PLT
    pushq   %rbp
    movq    8(%rsp), %rbp                   # 8-byte Reload
    popfq
.LBB0_32:                               #   in Loop: Header=BB0_2 Depth=1

The issue is that between pushq and popfq the %rbp register is being reloaded. However, it's reloaded from the wrong slot. pushq changes the %rsp register, but the reload is still using the stack slot from before pushfq.

bugpoint-reduced-simplified.bc.txt

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-x86

llvmbot commented 1 year ago

@llvm/issue-subscribers-bug

bwendling commented 1 year ago

@qcolombet @topperc @nickdesaulniers

topperc commented 1 year ago

I'm having trouble producing the cited codegen on trunk. I'm seeing

.LBB0_30:                               #   in Loop: Header=BB0_2 Depth=1        
        movslq  %ebp, %rdx                                                       
        pushfq                                                                   
        popq    %r13                                                             
        movq    %r12, %rdi                                                       
        callq   memcpy@PLT                                                       
        pushq   %r13                                                             
        popfq

bwendling commented 1 year ago

It repos with https://github.com/llvm/llvm-project/commit/6eb0f8e28598658b4f5df27b35a5ea98bad68049. I just want to make sure that it's either fixed or needs a change.

What I think's happening is that the register allocator sees the live range of the stuff in %rbp ends after the pushq, so it inserts a reload right afterwards. This seems to be a bit overzealous even apart from this bug. Wouldn't it be better to reload the value closer to where it's needed?

topperc commented 1 year ago

I wonder if we should move expansion of RDFLAGS32/64 from EmitInstrWithCustomInserter to sometime after regalloc so that it can't be broken up.

bwendling commented 1 year ago

That would solve this issue, but I'm still concerned that a push in the middle of a function doesn't result in a stack adjustment for a reload. That's assuming my assumption is correct...

nickdesaulniers commented 1 year ago

cc @phoebewang

bwendling commented 1 year ago

The attached file shows the issue. Use this command:

$ llc ../prolog-epilog-sp-adjust.mir -mtriple=x86_64-linux-gnu -run-pass=prologepilog -o -

The MOV64rm in bb.43 gets the same offset whether the stack pointer was modified or not.

prolog-epilog-sp-adjust.mir.txt

bwendling commented 1 year ago

@topperc This looks like a real bug to me (see my last comment). I'm going to come up with a hopefully not horrible hack to see what you think.

phoebewang commented 1 year ago

Sorry, just took a look at this. I have a question about the MIR:

    PUSH64r killed renamable $rbp, implicit-def $rsp, implicit $rsp
    renamable $rbp = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
    POPF64 implicit-def $rsp, implicit-def dead $eflags, implicit-def dead $df, implicit $rsp
    renamable $rbp = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)

Is the first MOV64rm a factitious one? It doesn't make sense to me in the context. What's more, I doubt if it can happen in reality because compiler shouldn't schedule stack accessing instructions across stack register live range. In this case, no MOV64rm can be scheduled across PUSH64r and POPF64 given both defs $rsp. If it does happen in reality. I'd think it is a bug in the pass where inserting MOV64rm between PUSH64r and POPF64 rather than prologepilog. I think the only way prologepilog can help with is to force using frame pointer in such case. This might need to revert https://github.com/llvm/llvm-project/commit/f3481f43bbe2c8a24e74210d310cf3be291bf52d which was to fix another issue @nickdesaulniers talked before. If this is just one concern of D140045, maybe we can try to bundle PUSH64r and POPF64 together?

qcolombet commented 1 year ago

I wonder if we should move expansion of RDFLAGS32/64 from EmitInstrWithCustomInserter to sometime after regalloc so that it can't be broken up.

That would be the right thing to do.

What I think's happening is that the register allocator sees the live range of the stuff in %rbp ends after the pushq, so it inserts a reload right afterward

I had a quick look at 6eb0f8e and the problematic movq is indeed inserted for spilling by the register allocator. We are unlucky enough that the insertion happens right in the middle where the stack is being modified and we read at the wrong address. For the regalloc's defense, we're not supposed to have the stack changing under our feet.

The fact that is doesn't reproduce on ToT is just luck.

I'm still concerned that a push in the middle of a function doesn't result in a stack adjustment for a reload.

That's very true but IIRC that's how the frame indices assignment works in the PrologEpilogInserter. I.e., the frame indices are chosen once for each stack object and assumed they apply for the whole function (after the prologue and before the epilogue). If the stack changes within the function, then a frame pointer is supposed to be used instead. In other words, the base address is supposed to be stable.

Put differently, your concern is real, but that limitation has always been there and @topperc's fix is the way to go here (unless we do a big overhaul of the stack lowering).

qcolombet commented 1 year ago

@bwendling, I'm attaching a quick patch for what @topperc and I were discussing. tentative_x86_read_write_flags.patch

@topperc or someone else feel free to finish and commit this. I don't have time right now to add a proper test case and whatnot.

topperc commented 1 year ago

I’m confused. I thought we already had a patch. https://reviews.llvm.org/D140045

qcolombet commented 1 year ago

I’m confused. I thought we already had a patch. https://reviews.llvm.org/D140045

Ah you’re right! I missed it. Looks like the patch didn’t land and @bwendling didn’t list the link in the previous comments.

nickdesaulniers commented 1 year ago

oh, I forgot about that patch, too.

Once @phoebewang is back from holiday (please take time off to enjoy your holiday @phoebewang ), I think Phoebe can rereview https://reviews.llvm.org/D140045 then we can land.

bwendling commented 1 year ago

@phoebewang The second MOV64rm is superfluous in the example. I added it to show that the indices will be "correct" outside of the PUSHF/POPF sequences.

@topperc @qcolombet @nickdesaulniers I also forgot about that patch. :-) One difference between it and the patch @qcolombet has is where the expansion code is placed. I can move the code in the patch over if that's a better place.

bwendling commented 1 year ago

To https://github.com/llvm/llvm-project.git 053479118f18..7d626e7cbb3a HEAD -> main

llvm / llvm-project

Bad Codegen with PUSHF and POPF #59102