Closed by hjl-tools 8 years ago
Just for the record, this code:
    case X86::MOV32mr:
    case X86::MOV64mr:
      unsigned int Reg = PushOp.getReg();

      // If storing a 32-bit vreg on 64-bit targets, extend to a 64-bit vreg
      // in preparation for the PUSH64. The upper 32 bits can be undef.
      if (Is64Bit && MOV->getOpcode() == X86::MOV32mr) {
        unsigned UndefReg = MRI->createVirtualRegister(&X86::GR64RegClass);
        Reg = MRI->createVirtualRegister(&X86::GR64RegClass);
        BuildMI(MBB, Context.Call, DL, TII->get(X86::IMPLICIT_DEF), UndefReg);
        BuildMI(MBB, Context.Call, DL, TII->get(X86::INSERT_SUBREG), Reg)
            .addReg(UndefReg)
            .addOperand(PushOp)
            .addImm(X86::sub_32bit);
      }
may not be safe. There are only these checks guarding it:
    if (!I->getOperand(X86::AddrBaseReg).isReg() ||
        (I->getOperand(X86::AddrBaseReg).getReg() != StackPtr) ||
        !I->getOperand(X86::AddrScaleAmt).isImm() ||
        (I->getOperand(X86::AddrScaleAmt).getImm() != 1) ||
        (I->getOperand(X86::AddrIndexReg).getReg() != X86::NoRegister) ||
        (I->getOperand(X86::AddrSegmentReg).getReg() != X86::NoRegister) ||
        !I->getOperand(X86::AddrDisp).isImm())
      return;

    int64_t StackDisp = I->getOperand(X86::AddrDisp).getImm();
    assert(StackDisp >= 0 &&
           "Negative stack displacement when passing parameters");

    // We really don't want to consider the unaligned case.
    if (StackDisp & (SlotSize - 1))
      return;
Is it possible that an 8-byte stack slot is used to spill two 4-byte registers, where one 4-byte register is spilled relative to SP and the other relative to FP?
I think so. HJ, if you can provide a test case that fails, please reopen this or submit a new report.
Should we close this bug?
I don't have a small testcase. I will keep an eye on it.
I'll look into the extra 16-byte stack issue, but that doesn't look incorrect - just inefficient.
I don't see anything incorrect with the dumps you show in comment 11. The transformation looks valid, though it is hard to tell for sure.
To make further progress, I think you need to provide a failing test case.
> You're not counting the sub $8, %esp that I see in the "bad" MIR. Hmmm, is that valid for x32, or does it need to be sub $8, %rsp?

> Where does "sub $8, %rsp" come from?
From this MIR instruction:

    %ESP<def,tied1> = SUB32ri8 %ESP<tied0>, 8, %EFLAGS<imp-def,dead>
For this test case:

    extern void foo (void *, void *, void *, void *, void *, void *, int, int, void *);
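(The body of bar is not included in the report. A plausible reconstruction, inferred from the assembly dumps below, is the following sketch; the variable names are invented and this is an assumption, not the original source:)

    /* Hypothetical reconstruction of bar: six pointer arguments built
       with leaq, the constants 7 and 8, and a ninth pointer argument. */
    void bar (void)
    {
      int a, b, c, d, e, f, g;
      foo (&a, &b, &c, &d, &e, &f, 7, 8, &g);
    }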
With -m64 -S -O2, without r268227:
    bar:                                    # @bar
            .cfi_startproc
            subq    $56, %rsp
    .Ltmp0:
            .cfi_def_cfa_offset 64
            leaq    28(%rsp), %rax
            movq    %rax, 16(%rsp)
            movl    $8, 8(%rsp)
            movl    $7, (%rsp)
            leaq    52(%rsp), %rdi
            leaq    48(%rsp), %rsi
            leaq    44(%rsp), %rdx
            leaq    40(%rsp), %rcx
            leaq    36(%rsp), %r8
            leaq    32(%rsp), %r9
            callq   foo
            addq    $56, %rsp
            retq
With r268227:
    bar:                                    # @bar
            .cfi_startproc
            subq    $40, %rsp
    .Ltmp0:
            .cfi_def_cfa_offset 48
            subq    $8, %rsp
    .Ltmp1:
            .cfi_adjust_cfa_offset 8
            leaq    20(%rsp), %rax
            leaq    44(%rsp), %rdi
            leaq    40(%rsp), %rsi
            leaq    36(%rsp), %rdx
            leaq    32(%rsp), %rcx
            leaq    28(%rsp), %r8
            leaq    24(%rsp), %r9
            pushq   %rax
    .Ltmp2:
            .cfi_adjust_cfa_offset 8
            pushq   $8
    .Ltmp3:
            .cfi_adjust_cfa_offset 8
            pushq   $7
    .Ltmp4:
            .cfi_adjust_cfa_offset 8
            callq   foo
            addq    $72, %rsp               <<<<<<<<<<<<< Why do we need extra 16 byte stack?
    .Ltmp5:
            .cfi_adjust_cfa_offset -32
            retq
> You're not counting the sub $8, %esp that I see in the "bad" MIR. Hmmm, is that valid for x32, or does it need to be sub $8, %rsp?

> Where does "sub $8, %rsp" come from?
Good:

    BB#82: derived from LLVM BB %566
        Predecessors according to CFG: BB#81
        %vreg492

Bad:

    BB#82: derived from LLVM BB %566
        Predecessors according to CFG: BB#81
        %vreg492
> You're not counting the sub $8, %esp that I see in the "bad" MIR. Hmmm, is that valid for x32, or does it need to be sub $8, %rsp?
Good:

    0x00463646 <+2438>:  lea    0x98(%rsp),%edx
    0x0046364d <+2445>:  lea    0xf0(%rsp),%ecx
    0x00463654 <+2452>:  lea    0x140(%rsp),%r8d
    0x0046365c <+2460>:  lea    0x108(%rsp),%r9d

Bad:

    0x00463673 <+2499>:  lea    0x78(%rsp),%edx
    0x00463677 <+2503>:  lea    0xd0(%rsp),%ecx
    0x0046367e <+2510>:  lea    0x120(%rsp),%r8d
    0x00463686 <+2518>:  lea    0xe8(%rsp),%r9d
0x78 + 3 * 8 == 0x90, but the good code computes 0x98(%rsp). This is off by 8 bytes.
Can you show me an earlier MIR dump - one from before the X86CallFrameOptimization, e.g. "IR Dump After Peephole Optimizations"?
It isn't obvious from your dumps that there is a problem with the store-to-push optimization.
Variables may be OK. But local variables accessed from an inlined function aren't.
> What do you mean exactly by "X86::MOV32mr references stack"? Do you mean that the register being stored is literally the stack pointer?!? That seems odd for a number of reasons. Are you able to show me the MIR prior to the X86CallFrameOptimization? And are you targeting X32 perhaps?
It is x32. Good one:

    BB#3: derived from LLVM BB %39
        Live Ins: %RDI %RDX %RSI %R8 %R9 %R12 %R13 %R14 %R15 %R10D
        Predecessors according to CFG: BB#0 BB#2
        %EBP

Bad one:

    BB#3: derived from LLVM BB %39
        Live Ins: %RDI %RDX %RSI %R8 %R9 %R12 %R13 %R14 %R15 %R10D
        Predecessors according to CFG: BB#0 BB#2
        %EBP
> We can't convert X86::MOV32mr to X86::PUSH64r if the operand of X86::MOV32mr references the stack.
What do you mean exactly by "X86::MOV32mr references stack"? Do you mean that the register being stored is literally the stack pointer?!? That seems odd for a number of reasons. Are you able to show me the MIR prior to the X86CallFrameOptimization? And are you targeting X32 perhaps?
> Can you be more specific about the symptoms, HJ? Did you mean "It miscompiles functions with variable argument lists"? At least in a very simple varargs case like this, the generated code looks fine.
I took it back. vararg is OK. What happens is:
- Function foo calls a function bar which takes 9 parameters of integers and pointers. A local pointer variable is passed as the 9th (pointer) argument.
- Function bar calls function foobar with the same 9 parameters.
- Somehow along the way, when bar is inlined into foo, the 9th argument points to the wrong location on the stack. foobar gets garbage as the 9th argument (see the sketch below).
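A minimal sketch of this shape, assuming hypothetical names and signatures (the actual failing code is not in the report):

    /* Hypothetical sketch of the pattern described above; foo, bar,
       foobar, the argument types, and the inlining are assumptions. */
    extern void foobar (void *, void *, void *, void *, void *, void *,
                        int, int, void *);

    /* bar forwards all nine parameters to foobar. */
    static inline void
    bar (void *a, void *b, void *c, void *d, void *e, void *f,
         int g, int h, void *p)
    {
      foobar (a, b, c, d, e, f, g, h, p);
    }

    void
    foo (void)
    {
      int local;
      /* The address of a local is the 9th, stack-passed argument.
         After bar is inlined, this pointer reportedly ends up wrong. */
      bar (0, 0, 0, 0, 0, 0, 7, 8, &local);
    }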
We can't convert X86::MOV32mr to X86::PUSH64r if the operand of X86::MOV32mr references the stack.
Can you be more specific about the symptoms, HJ? Did you mean "It miscompiles functions with variable argument lists"? At least in a very simple varargs case like this, the generated code looks fine.
    void f1(int, ...);
    int puts(const char *);

    void f2(int x, int y) {
      f1(1, 2, 3, 4, 5, 6, x + y, x - y);
      puts("hello");
    }
    f2:
            pushq   %rax
            movl    %edi, %r10d
            leal    (%rsi,%r10), %r11d
            subl    %esi, %r10d
            movl    $1, %edi
            movl    $2, %esi
            movl    $3, %edx
            movl    $4, %ecx
            movl    $5, %r8d
            movl    $6, %r9d
            movl    $0, %eax
            pushq   %r10
            pushq   %r11
            callq   f1
            addq    $16, %rsp
            movl    $.L.str, %edi
            popq    %rax
            jmp     puts
Extended Description

X86::MOV32mr to X86::PUSH64r conversion in r268227 leads to incorrect code. It compiles functions with variable argument lists. I am trying to find a small testcase.