hjl-tools commented 8 years ago


Bugzilla Link	27623
Resolution	WORKSFORME
Resolved on	Aug 04, 2016 15:14
Version	trunk
OS	Linux
CC	@zmodem

Extended Description

X86::MOV32mr to X86::PUSH64r conversion in r268227 leads incorrect code. I am trying to find a small testcase.

hjl-tools commented 8 years ago

Just for the record,

case X86::MOV32mr:
case X86::MOV64mr:
  unsigned int Reg = PushOp.getReg();

  // If storing a 32-bit vreg on 64-bit targets, extend to a 64-bit vreg
  // in preparation for the PUSH64. The upper 32 bits can be undef.
  if (Is64Bit && MOV->getOpcode() == X86::MOV32mr) {
    unsigned UndefReg = MRI->createVirtualRegister(&X86::GR64RegClass);
    Reg = MRI->createVirtualRegister(&X86::GR64RegClass);
    BuildMI(MBB, Context.Call, DL, TII->get(X86::IMPLICIT_DEF), UndefReg);
    BuildMI(MBB, Context.Call, DL, TII->get(X86::INSERT_SUBREG), Reg)
      .addReg(UndefReg)
      .addOperand(PushOp)
      .addImm(X86::sub_32bit);
  }

may not be safe. There are

if (!I->getOperand(X86::AddrBaseReg).isReg() ||
    (I->getOperand(X86::AddrBaseReg).getReg() != StackPtr) ||
    !I->getOperand(X86::AddrScaleAmt).isImm() ||
    (I->getOperand(X86::AddrScaleAmt).getImm() != 1) ||
    (I->getOperand(X86::AddrIndexReg).getReg() != X86::NoRegister) ||
    (I->getOperand(X86::AddrSegmentReg).getReg() != X86::NoRegister) ||
    !I->getOperand(X86::AddrDisp).isImm())
  return;

int64_t StackDisp = I->getOperand(X86::AddrDisp).getImm();
assert(StackDisp >= 0 &&
       "Negative stack displacement when passing parameters");

// We really don't want to consider the unaligned case.
if (StackDisp & (SlotSize - 1))
  return;

Is this possible that a 8 byte stack slot is used to spill 2 4-byte registers? One 4-byte register is spilled with SP and the other with FP.

llvmbot commented 8 years ago

I think so. HJ, if can provide a test case that fails, please reopen this or submit a new report.

zmodem commented 8 years ago

Should we close this bug?

hjl-tools commented 8 years ago

I don't have a small testcase. I will keep an eye on it.

llvmbot commented 8 years ago

I'll look into the extra 16-byte stack issue, but that doesn't look incorrect - just inefficient.

I don't see anything incorrect with the dumps you show in comment 11. The transformation looks valid, though it is hard to tell for sure.

To make further progress, I think you need to provide a failing test case.

llvmbot commented 8 years ago

You're not counting the sub $8, %esp that I see in the "bad" MIR. Hmmm, is that valid for x32, or does it need to be sub $8, %rsp?

Where does "sub $8, %rsp" come from?

From this MIR instruction:

    %ESP<def,tied1> = SUB32ri8 %ESP<tied0>, 8, %EFLAGS<imp-def,dead>

hjl-tools commented 8 years ago

For

extern void foo (void , void , void , void , void , void , int, int, void *);

void bar (void) { int i1, i2, i3, i4, i5, i6; int i; foo (&i1, &i2, &i3, &i4, &i5, &i6, 7, 8, &i); }

With -m64 -S -O2, without r268227:

bar: # @bar .cfi_startproc

BB#0:

subq    $56, %rsp

.Ltmp0: .cfi_def_cfa_offset 64 leaq 28(%rsp), %rax movq %rax, 16(%rsp) movl $8, 8(%rsp) movl $7, (%rsp) leaq 52(%rsp), %rdi leaq 48(%rsp), %rsi leaq 44(%rsp), %rdx leaq 40(%rsp), %rcx leaq 36(%rsp), %r8 leaq 32(%rsp), %r9 callq foo addq $56, %rsp retq

With r268227:

bar: # @bar .cfi_startproc

BB#0:

subq    $40, %rsp

.Ltmp0: .cfi_def_cfa_offset 48 subq $8, %rsp .Ltmp1: .cfi_adjust_cfa_offset 8 leaq 20(%rsp), %rax leaq 44(%rsp), %rdi leaq 40(%rsp), %rsi leaq 36(%rsp), %rdx leaq 32(%rsp), %rcx leaq 28(%rsp), %r8 leaq 24(%rsp), %r9 pushq %rax .Ltmp2: .cfi_adjust_cfa_offset 8 pushq $8 .Ltmp3: .cfi_adjust_cfa_offset 8 pushq $7 .Ltmp4: .cfi_adjust_cfa_offset 8 callq foo addq $72, %rsp <<<<<<<<<<<<< Why do we need extra 16 byte stack? .Ltmp5: .cfi_adjust_cfa_offset -32 retq

hjl-tools commented 8 years ago

You're not counting the sub $8, %esp that I see in the "bad" MIR. Hmmm, is that valid for x32, or does it need to be sub $8, %rsp?

Where does "sub $8, %rsp" come from?

hjl-tools commented 8 years ago

Good

BB#82: derived from LLVM BB %566 Predecessors according to CFG: BB#81 %vreg492 = MOV32rm <fi#9>, 1, %noreg, 0, %noreg; mem:LD4%12 GR32:%vreg492 %vreg493 = MOV32rm <fi#20>, 1, %noreg, 0, %noreg; mem:LD4%23 GR32:%vreg493 %vreg494 = MOV32rm <fi#21>, 1, %noreg, 0, %noreg; mem:LD4%24 GR32:%vreg494 %vreg495 = MOV32rm <fi#15>, 1, %noreg, 0, %noreg; mem:LD4%18 GR32:%vreg495 %vreg496 = SUBREG_TO_REG 0, %vreg495, 4; GR64:%vreg496 GR32:%vreg495 ADJCALLSTACKDOWN32 24, 0, %ESP<imp-def,dead>, %EFLAGS<imp-def,dead>, %ESP %vreg497 = COPY %ESP; GR32:%vreg497 MOV32mr %vreg497, 1, %noreg, 16, %noreg, %vreg494; mem:ST4[Stack+16] GR32:%vreg497,%vreg494 MOV32mr %vreg497, 1, %noreg, 8, %noreg, %vreg493; mem:ST4[Stack+8] GR32:%vreg497,%vreg493 MOV32mr %vreg497, 1, %noreg, 0, %noreg, %vreg492; mem:ST4[Stack] GR32:%vreg497,%vreg492 %vreg498 = LEA64_32r <fi#11>, 1, %noreg, 0, %noreg; GR32:%vreg498 %vreg499 = SUBREG_TO_REG 0, %vreg498, 4; GR64:%vreg499 GR32:%vreg498 %vreg500 = LEA64_32r <fi#16>, 1, %noreg, 0, %noreg; GR32:%vreg500 %vreg501 = SUBREG_TO_REG 0, %vreg500, 4; GR64:%vreg501 GR32:%vreg500 %vreg502 = LEA64_32r <fi#17>, 1, %noreg, 0, %noreg; GR32:%vreg502 %vreg503 = SUBREG_TO_REG 0, %vreg502, 4; GR64:%vreg503 GR32:%vreg502 %vreg504 = LEA64_32r <fi#51>, 1, %noreg, 0, %noreg; GR32:%vreg504 %vreg505 = SUBREG_TO_REG 0, %vreg504, 4; GR64:%vreg505 GR32:%vreg504 %vreg506 = MOV32rr %vreg53; GR32:%vreg506,%vreg53 %vreg507 = SUBREG_TO_REG 0, %vreg506, 4; GR64:%vreg507 GR32:%vreg506 %RDI = COPY %vreg507; GR64:%vreg507 %RSI = COPY %vreg496; GR64:%vreg496 %RDX = COPY %vreg499; GR64:%vreg499 %RCX = COPY %vreg501; GR64:%vreg501 %R8 = COPY %vreg503; GR64:%vreg503 %R9 = COPY %vreg505; GR64:%vreg505 CALLpcrel32 ga:@_ZN4llvm14CGIOperandList11OperandInfoC2EPNS_6RecordERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESB_SB_SB_jjPNS_7DagInitE, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %ESP, %RDI, %RSI, %RDX, %RCX, %R8, %R9, %ESP ADJCALLSTACKUP32 24, 0, %ESP<imp-def,dead>, %EFLAGS<imp-def,dead>, %ESP ADD32mi %vreg75, 1, %noreg, 12, %noreg, 144, %EFLAGS<imp-def,dead>; mem:ST4%573 LD4%573 GR32:%vreg75 JMP_1 <BB#84> Successors according to CFG: BB#84(?%)

Bad

BB#82: derived from LLVM BB %566 Predecessors according to CFG: BB#81 %vreg492 = MOV32rm <fi#9>, 1, %noreg, 0, %noreg; mem:LD4%12 GR32:%vreg492 %vreg493 = MOV32rm <fi#20>, 1, %noreg, 0, %noreg; mem:LD4%23 GR32:%vreg493 %vreg494 = MOV32rm <fi#21>, 1, %noreg, 0, %noreg; mem:LD4%24 GR32:%vreg494 %vreg495 = MOV32rm <fi#15>, 1, %noreg, 0, %noreg; mem:LD4%18 GR32:%vreg495 %vreg496 = SUBREG_TO_REG 0, %vreg495, 4; GR64:%vreg496 GR32:%vreg495 ADJCALLSTACKDOWN32 24, 24, %ESP<imp-def,dead>, %EFLAGS<imp-def,dead>, %ESP %vreg498 = LEA64_32r <fi#11>, 1, %noreg, 0, %noreg; GR32:%vreg498 %vreg499 = SUBREG_TO_REG 0, %vreg498, 4; GR64:%vreg499 GR32:%vreg498 %vreg500 = LEA64_32r <fi#16>, 1, %noreg, 0, %noreg; GR32:%vreg500 %vreg501 = SUBREG_TO_REG 0, %vreg500, 4; GR64:%vreg501 GR32:%vreg500 %vreg502 = LEA64_32r <fi#17>, 1, %noreg, 0, %noreg; GR32:%vreg502 %vreg503 = SUBREG_TO_REG 0, %vreg502, 4; GR64:%vreg503 GR32:%vreg502 %vreg504 = LEA64_32r <fi#51>, 1, %noreg, 0, %noreg; GR32:%vreg504 %vreg505 = SUBREG_TO_REG 0, %vreg504, 4; GR64:%vreg505 GR32:%vreg504 %vreg506 = MOV32rr %vreg53; GR32:%vreg506,%vreg53 %vreg507 = SUBREG_TO_REG 0, %vreg506, 4; GR64:%vreg507 GR32:%vreg506 %RDI = COPY %vreg507; GR64:%vreg507 %RSI = COPY %vreg496; GR64:%vreg496 %RDX = COPY %vreg499; GR64:%vreg499 %RCX = COPY %vreg501; GR64:%vreg501 %R8 = COPY %vreg503; GR64:%vreg503 %R9 = COPY %vreg505; GR64:%vreg505 %vreg645 = IMPLICIT_DEF; GR64:%vreg645 %vreg646<def,tied1> = INSERT_SUBREG %vreg645, %vreg494, sub_32bit; GR64:%vreg646,%vreg645 GR32:%vreg494 PUSH64r %vreg646, %RSP, %RSP; GR64:%vreg646 CFI_INSTRUCTION %vreg647 = IMPLICIT_DEF; GR64:%vreg647 %vreg648<def,tied1> = INSERT_SUBREG %vreg647, %vreg493, sub_32bit; GR64:%vreg648,%vreg647 GR32:%vreg493 PUSH64r %vreg648, %RSP, %RSP; GR64:%vreg648 CFI_INSTRUCTION %vreg649 = IMPLICIT_DEF; GR64:%vreg649 %vreg650<def,tied1> = INSERT_SUBREG %vreg649, %vreg492, sub_32bit; GR64:%vreg650,%vreg649 GR32:%vreg492 PUSH64r %vreg650, %RSP, %RSP; GR64:%vreg650 CFI_INSTRUCTION CALLpcrel32 ga:@_ZN4llvm14CGIOperandList11OperandInfoC2EPNS_6RecordERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESB_SB_SB_jjPNS_7DagInitE, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %ESP, %RDI, %RSI, %RDX, %RCX, %R8, %R9, %ESP ADJCALLSTACKUP32 24, 0, %ESP<imp-def,dead>, %EFLAGS<imp-def,dead>, %ESP ADD32mi %vreg75, 1, %noreg, 12, %noreg, 144, %EFLAGS<imp-def,dead>; mem:ST4%573 LD4%573 GR32:%vreg75 JMP_1 <BB#84> Successors according to CFG: BB#84(?%)

llvmbot commented 8 years ago

You're not counting the sub $8, %esp that I see in the "bad" MIR. Hmmm, is that valid for x32, or does it need to be sub $8, %rsp?

hjl-tools commented 8 years ago

Good:

0x00463646 <+2438>: lea 0x98(%rsp),%edx 0x0046364d <+2445>: lea 0xf0(%rsp),%ecx 0x00463654 <+2452>: lea 0x140(%rsp),%r8d 0x0046365c <+2460>: lea 0x108(%rsp),%r9d

Bad:

0x00463673 <+2499>: lea 0x78(%rsp),%edx 0x00463677 <+2503>: lea 0xd0(%rsp),%ecx 0x0046367e <+2510>: lea 0x120(%rsp),%r8d 0x00463686 <+2518>: lea 0xe8(%rsp),%r9d

0x78 + 3 * 8 == 0x90. This is off by 8 bytes.

llvmbot commented 8 years ago

Can you show me an earlier MIR dump - one from before the X86CallFrameOptimization, e.g. "IR Dump After Peephole Optimizations"?

It isn't obvious from your dumps that there is a problem with the store-to-push optimization.

hjl-tools commented 8 years ago

Variables may be OK. But local variables accessed from inlined funnction aren't.

hjl-tools commented 8 years ago

that the register being stored is literally the stack pointer?!? That seems odd for a number of reasons. Are you able to show me the MIR prior to the X86CallFrameOptimization? And are you targeting X32 perhaps?

It is x32. Good one:

BB#3: derived from LLVM BB %39 Live Ins: %RDI %RDX %RSI %R8 %R9 %R12 %R13 %R14 %R15 %R10D Predecessors according to CFG: BB#0 BB#2 %EBP = MOV32rm %ESP, 1, %noreg, 112, %noreg; mem:LD4FixedStack-3 %EBX = MOV32rm %ESP, 1, %noreg, 104, %noreg; mem:LD4FixedStack-2 %EAX = MOV32rm %ESP, 1, %noreg, 96, %noreg; mem:LD4FixedStack-1 %EDI<def,tied1> = SUB32rr %EDI, %R10D, %EFLAGS<imp-def,dead>, %RDI<imp-use,kill>, %RDI %EDI<def,tied1> = ADD32rr %EDI, %R14D, %EFLAGS<imp-def,dead>, %RDI<imp-use,kill>, %RDI %EAX = MOV32rm %EAX, 1, %noreg, 0, %noreg; mem:LD4%6 %EBX = MOV32rm %EBX, 1, %noreg, 0, %noreg; mem:LD4%7 %EBP = MOV32rm %EBP, 1, %noreg, 0, %noreg; mem:LD4%8 %ESI = MOV32rm %ESI, 1, %noreg, 0, %noreg, %RSI<imp-use,kill>, %RSI; mem:LD4%1 MOV32mr %ESP, 1, %noreg, 16, %noreg, %EBP; mem:ST4[Stack+16] MOV32mr %ESP, 1, %noreg, 8, %noreg, %EBX; mem:ST4[Stack+8] MOV32mr %ESP, 1, %noreg, 0, %noreg, %EAX; mem:ST4[Stack] %EDX = MOV32rr %EDX, %RDX<imp-use,kill>, %RDX %ECX = MOV32rr %R15D, %R15<imp-use,kill>, %RCX %R8D = MOV32rr %R8D, %R8<imp-use,kill>, %R8 %R9D = MOV32rr %R9D, %R9<imp-use,kill>, %R9 CALLpcrel32 ga:@_ZN4llvm14CGIOperandList11OperandInfoC2EPNS_6RecordERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESB_SB_SB_jjPNS_7DagInitE, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %ESP, %RDI, %RSI, %RDX, %RCX, %R8, %R9, %ESP

Bad one:

BB#3: derived from LLVM BB %39 Live Ins: %RDI %RDX %RSI %R8 %R9 %R12 %R13 %R14 %R15 %R10D Predecessors according to CFG: BB#0 BB#2 %EBP = MOV32rm %ESP, 1, %noreg, 80, %noreg; mem:LD4FixedStack-3 %EBX = MOV32rm %ESP, 1, %noreg, 72, %noreg; mem:LD4FixedStack-2 %EAX = MOV32rm %ESP, 1, %noreg, 64, %noreg; mem:LD4FixedStack-1 %EDI<def,tied1> = SUB32rr %EDI, %R10D, %EFLAGS<imp-def,dead>, %RDI<imp-use,kill>, %RDI %EDI<def,tied1> = ADD32rr %EDI, %R14D, %EFLAGS<imp-def,dead>, %RDI<imp-use,kill>, %RDI %EAX = MOV32rm %EAX, 1, %noreg, 0, %noreg, %RAX; mem:LD4%6 %EBX = MOV32rm %EBX, 1, %noreg, 0, %noreg, %RBX; mem:LD4%7 %EBP = MOV32rm %EBP, 1, %noreg, 0, %noreg, %RBP; mem:LD4%8 %ESI = MOV32rm %ESI, 1, %noreg, 0, %noreg, %RSI<imp-use,kill>, %RSI; mem:LD4%1 %ESP<def,tied1> = SUB32ri8 %ESP, 8, %EFLAGS<imp-def,dead> CFI_INSTRUCTION %EDX = MOV32rr %EDX, %RDX<imp-use,kill>, %RDX %ECX = MOV32rr %R15D, %R15<imp-use,kill>, %RCX %R8D = MOV32rr %R8D, %R8<imp-use,kill>, %R8 %R9D = MOV32rr %R9D, %R9<imp-use,kill>, %R9 PUSH64r %RBP, %RSP, %RSP CFI_INSTRUCTION PUSH64r %RBX, %RSP, %RSP CFI_INSTRUCTION PUSH64r %RAX, %RSP, %RSP CFI_INSTRUCTION CALLpcrel32 ga:@_ZN4llvm14CGIOperandList11OperandInfoC2EPNS_6RecordERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESB_SB_SB_jjPNS_7DagInitE, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %ESP, %RDI, %RSI, %RDX, %RCX, %R8, %R9, %ESP

llvmbot commented 8 years ago

We can't convert X86::MOV32mr to X86::PUSH64r if operand of X86::MOV32mr references stack.

What do you mean exactly by "X86::MOV32mr references stack"? Do you mean that the register being stored is literally the stack pointer?!? That seems odd for a number of reasons. Are you able to show me the MIR prior to the X86CallFrameOptimization? And are you targeting X32 perhaps?

hjl-tools commented 8 years ago

Can you be more specific about the symptoms, HJ? Did you mean "It miscompiles functions with variable argument lists"? At least in a very simple varargs case like this, the generated code looks fine.

I took it back. vararg is OK. What happens are

Function foo, calls a function bar which takes 9 parameters of integers and pointers. A local pointer variable was passed as the 9th argument of pointer.

Function bar calls function foobar, with the same 9 parameters.

Somehow along the way, the 9th argument points to the wrong location on stack in function foo when bar is inlined. foobar gets garbage as the 9th argument.

We can't convert X86::MOV32mr to X86::PUSH64r if operand of X86::MOV32mr references stack.

hjl-tools commented 8 years ago

Can you be more specific about the symptoms, HJ? Did you mean "It miscompiles functions with variable argument lists"? At least in a very simple varargs case like this, the generated code looks fine.

I took it back. vararg is OK. What happens are

Function foo, calls a function bar which takes 9 parameters of integers and pointers. A local pointer variable was passed as the 9th argument of pointer.
Function bar calls function foobar, with the same 9 parameters.
Somehow along the way, the 9th argument points to the wrong location on stack in function foo when bar is inlined. foobar gets garbage as the 9th argument.

llvmbot commented 8 years ago

Can you be more specific about the symptoms, HJ? Did you mean "It miscompiles functions with variable argument lists"? At least in a very simple varargs case like this, the generated code looks fine.

void f1(int, ...); void f2(int x, int y) { f1(1, 2, 3, 4, 5, 6, x+y, x-y); puts("hello"); }

f2: pushq %rax movl %edi, %r10d leal (%rsi,%r10), %r11d subl %esi, %r10d movl $1, %edi movl $2, %esi movl $3, %edx movl $4, %ecx movl $5, %r8d movl $6, %r9d movl $0, %eax pushq %r10 pushq %r11 callq f1 addq $16, %rsp movl $.L.str, %edi popq %rax jmp puts

hjl-tools commented 8 years ago

It compiles functions with variable argument lists.

llvm / llvm-project

r268227 X86::MOV32mr to X86::PUSH64r conversion leads incorrect code #27997

Extended Description

void bar (void) { int i1, i2, i3, i4, i5, i6; int i; foo (&i1, &i2, &i3, &i4, &i5, &i6, 7, 8, &i); }

BB#0:

BB#0: