Closed bwendling closed 1 year ago
@qcolombet @topperc @nickdesaulniers
I'm having trouble producing the cited codegen on trunk. I'm seeing
.LBB0_30: # in Loop: Header=BB0_2 Depth=1
movslq %ebp, %rdx
pushfq
popq %r13
movq %r12, %rdi
callq memcpy@PLT
pushq %r13
popfq
It reproduces with https://github.com/llvm/llvm-project/commit/6eb0f8e28598658b4f5df27b35a5ea98bad68049. I just want to make sure that it's either fixed or needs a change.
What I think's happening is that the register allocator sees the live range of the stuff in %rbp ends after the pushq, so it inserts a reload right afterwards. This seems to be a bit overzealous even apart from this bug. Wouldn't it be better to reload the value closer to where it's needed?
I wonder if we should move expansion of RDFLAGS32/64 from EmitInstrWithCustomInserter to sometime after regalloc so that it can't be broken up.
That would solve this issue, but I'm still concerned that a push in the middle of a function doesn't result in a stack adjustment for a reload. That's assuming my assumption is correct...
cc @phoebewang
The attached file shows the issue. Use this command:
$ llc ../prolog-epilog-sp-adjust.mir -mtriple=x86_64-linux-gnu -run-pass=prologepilog -o -
The MOV64rm in bb.43 gets the same offset whether the stack pointer was modified or not.
@topperc This looks like a real bug to me (see my last comment). I'm going to come up with a hopefully not horrible hack to see what you think.
Sorry, just took a look at this. I have a question about the MIR:
PUSH64r killed renamable $rbp, implicit-def $rsp, implicit $rsp
renamable $rbp = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
POPF64 implicit-def $rsp, implicit-def dead $eflags, implicit-def dead $df, implicit $rsp
renamable $rbp = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
Is the first MOV64rm an artificial one? It doesn't make sense to me in this context.
What's more, I doubt it can happen in reality, because the compiler shouldn't schedule stack-accessing instructions across a stack register live range. In this case, no MOV64rm can be scheduled across PUSH64r and POPF64, given that both define $rsp.
If it does happen in reality, I'd think it is a bug in the pass that inserts the MOV64rm between PUSH64r and POPF64 rather than in prologepilog.
I think the only way prologepilog can help is to force the use of a frame pointer in such a case. This might require reverting https://github.com/llvm/llvm-project/commit/f3481f43bbe2c8a24e74210d310cf3be291bf52d, which was to fix another issue @nickdesaulniers talked about before.
If this is just one concern of D140045, maybe we can try to bundle PUSH64r and POPF64 together?
I wonder if we should move expansion of RDFLAGS32/64 from EmitInstrWithCustomInserter to sometime after regalloc so that it can't be broken up.
That would be the right thing to do.
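To illustrate why the ordering matters, here is a toy sketch, not LLVM code. It assumes a hypothetical pseudo instruction named WRITE_FLAGS that expands to push/popf, and a simplified "regalloc" that places a reload immediately after the instruction that last uses the spilled register. All function and opcode names are made up for illustration.

```python
# Toy model, not LLVM code. Instructions are (opcode, operand) tuples.

def expand_write_flags(instrs):
    """Expand the hypothetical pseudo 'WRITE_FLAGS reg' into push + popf."""
    out = []
    for op, arg in instrs:
        if op == "WRITE_FLAGS":
            out += [("push", arg), ("popf", None)]
        else:
            out.append((op, arg))
    return out

def allocate(instrs, spilled):
    """Toy 'regalloc': reload `spilled` right after the instruction that uses it."""
    out = []
    for op, arg in instrs:
        out.append((op, arg))
        if arg == spilled and op in ("push", "WRITE_FLAGS"):
            out.append(("reload", spilled))
    return out

def reload_inside_push_popf(instrs):
    """True if a reload sits between the push and the popf."""
    ops = [op for op, _ in instrs]
    start, end = ops.index("push"), ops.index("popf")
    return "reload" in ops[start:end]

prog = [("WRITE_FLAGS", "rbp")]

# Expansion before allocation: the reload can land between push and popf.
early = allocate(expand_write_flags(prog), "rbp")

# Expansion after allocation: the pseudo is a single unit, so the reload
# can only be placed after it.
late = expand_write_flags(allocate(prog, "rbp"))
```

In this model `early` is push, reload, popf while `late` is push, popf, reload, which matches the intent of keeping the pair unbreakable until after register allocation.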
What I think's happening is that the register allocator sees the live range of the stuff in %rbp ends after the pushq, so it inserts a reload right afterward
I had a quick look at 6eb0f8e and the problematic movq is indeed inserted for spilling by the register allocator.
We are unlucky enough that the insertion happens right in the middle where the stack is being modified and we read at the wrong address.
In the regalloc's defense, we're not supposed to have the stack changing under our feet.
The fact that it doesn't reproduce on ToT is just luck.
I'm still concerned that a push in the middle of a function doesn't result in a stack adjustment for a reload.
That's very true, but IIRC that's how frame index assignment works in the PrologEpilogInserter. I.e., the frame indices are chosen once for each stack object and are assumed to apply for the whole function (after the prologue and before the epilogue). If the stack changes within the function, then a frame pointer is supposed to be used instead. In other words, the base address is supposed to be stable.
Put differently, your concern is real, but that limitation has always been there and @topperc's fix is the way to go here (unless we do a big overhaul of the stack lowering).
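A rough way to picture the "stable base address" point is the toy model below. It is only a sketch with made-up addresses and names (`simulate`, `sp`, `fp` are hypothetical), not how PrologEpilogInserter is implemented: an offset is computed once for the spill slot, then a push moves the stack pointer by 8 bytes before the reload.

```python
# Toy model of a downward-growing stack. Addresses and names are made up.

def simulate(use_frame_pointer):
    sp = 0x1000          # stack pointer right after the prologue
    fp = sp + 32         # frame base, fixed for the whole function
    spill = sp + 16      # address of the spill slot

    # Offsets are chosen once, relative to whichever base is used.
    sp_offset = spill - sp
    fp_offset = spill - fp

    sp -= 8              # a push in the middle of the function

    # The reload reuses the offset that was chosen before the push.
    if use_frame_pointer:
        reload = fp + fp_offset
    else:
        reload = sp + sp_offset
    return spill, reload

spill_fp, reload_fp = simulate(use_frame_pointer=True)
spill_sp, reload_sp = simulate(use_frame_pointer=False)
```

With the frame base the reload address still equals the slot address. With the stack pointer as base it is off by the 8 bytes that the push subtracted, which is the wrong-slot reload described in this issue.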
@bwendling, I'm attaching a quick patch for what @topperc and I were discussing. tentative_x86_read_write_flags.patch
@topperc or someone else feel free to finish and commit this. I don't have time right now to add a proper test case and whatnot.
I’m confused. I thought we already had a patch. https://reviews.llvm.org/D140045
I’m confused. I thought we already had a patch. https://reviews.llvm.org/D140045
Ah you’re right! I missed it. Looks like the patch didn’t land and @bwendling didn’t list the link in the previous comments.
oh, I forgot about that patch, too.
Once @phoebewang is back from holiday (please take time off to enjoy your holiday @phoebewang ), I think Phoebe can rereview https://reviews.llvm.org/D140045 then we can land.
@phoebewang The second MOV64rm is superfluous in the example. I added it to show that the indices will be "correct" outside of the PUSHF/POPF sequences.
@topperc @qcolombet @nickdesaulniers I also forgot about that patch. :-) One difference between it and the patch @qcolombet has is where the expansion code is placed. I can move the code in the patch over if that's a better place.
To https://github.com/llvm/llvm-project.git 053479118f18..7d626e7cbb3a HEAD -> main
Note that replicating this issue is best done at SHA1 6eb0f8e28598658b4f5df27b35a5ea98bad68049.
There seems to be bad codegen in some instances when reading EFLAGS. The attached issue has this code:
The generated code is:
The issue is that between pushq and popfq the %rbp register is being reloaded. However, it's reloaded from the wrong slot. pushq changes the %rsp register, but the reload is still using the stack slot from before pushfq.
bugpoint-reduced-simplified.bc.txt