systems-nuts / unifico

Compiler and build harness for heterogeneous-ISA binaries with the same stack layout.

mg::interp Different register allocation #194

Open blackgeorge-boom opened 2 years ago

blackgeorge-boom commented 2 years ago
void simple(int n)
{
    return;
}

static void interp(void *oz, int mm1, int mm2, int mm3,
                   void *ou, int n1, int n2, int n3);

int main()
{
    double u[10];

    interp(&u[0], 1, 1, 1, &u[0], 1, 1, 1);
}

static void interp(void *oz, int mm1, int mm2, int mm3,
                   void *ou, int n1, int n2, int n3)
{
  int (*z)[mm2][mm1] = (int (*)[mm2][mm1])oz;
  int (*u)[n2][n1] = (int (*)[n2][n1])ou;

  int i3 = 3, i2 = 2, i1 = 1;

  simple(1);
  for (i1 = 0; i1 < mm1-1; i1++) {
    u[i3][i2][i1] = z[i3][i2][i1];
  }
}
blackgeorge-boom commented 2 years ago

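The two architectures assign callee-saved registers (CSRs) to different values. In the excerpts below, AArch64 keeps mm1 and n1 in CSRs (w19, w20), while X86 keeps n2 and n1 in CSRs (rbx, r15) and only reloads mm1 into rbx after the call: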

AArch64:

  5010e0: f3 03 01 2a                   mov w19, w1                 ; mm1
;   int (*u)[n2][n1] = (int (*)[n2][n1])ou;
  5010e4: f4 03 05 2a                   mov w20, w5                 ; n1

X86:

  5010c0:   8b 5d 10                mov    ebx,DWORD PTR [rbp+0x10] ; n2
  5010c3:   45 89 cf                mov    r15d,r9d                 ; n1
  5010c6:   4c 89 45 b0             mov    QWORD PTR [rbp-0x50],r8
...
  5010eb:   e8 30 ff ff ff          call   501020 <simple>
  5010f0:   49 89 d8                mov    r8,rbx                   ; n2
  5010f3:   48 8b 5d 90             mov    rbx,QWORD PTR [rbp-0x70] ; mm1
blackgeorge-boom commented 2 years ago

I think the problem is that X86 uses two-address instructions, where the destination register is also one of the source operands (the tied-def operands in the listing below):

288B      undef %31.sub_32bit:gr64_with_sub_8bit = MOV32rm %fixed-stack.1, 1, $noreg, 0, $noreg :: (dereferenceable load 4 from %ir.n2.addr, align 16)
320B      undef %3.sub_32bit:gr64_with_sub_8bit = MOV32rr %9:gr32
...
608B      %22:gr64_with_sub_8bit = nuw IMUL64rr %22:gr64_with_sub_8bit(tied-def 0), %1:gr64_with_sub_8bit, implicit-def dead $eflags
640B      %31:gr64_with_sub_8bit = nuw IMUL64rr %31:gr64_with_sub_8bit(tied-def 0), %3:gr64_with_sub_8bit, implicit-def dead $eflags
...
1008B     %33:gr64 = nsw IMUL64rr %33:gr64(tied-def 0), %31:gr64_with_sub_8bit, implicit-def dead $eflags

So the X86 register allocator assigns a CSR to n2 (%31 in the listing above) first, so that the value survives its whole live range.
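
For context, here is a minimal sketch (an illustration only, not taken from the Unifico sources) of the index arithmetic behind `u[i3][i2][i1]` when `u` is a pointer to a variable-length array, as in `interp`. The indices are multiplied by the runtime dimensions, which is presumably where multiplications such as the `IMUL64rr` instructions in the listing come from, and why `n1` and `n2` have to stay available after the call to `simple`. The names `flat_index` and `check_vla_layout` are hypothetical helpers introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element offset of u[i3][i2][i1] for a pointer u of
 * type int (*)[n2][n1], spelled out as explicit multiplications. */
size_t flat_index(size_t n2, size_t n1, size_t i3, size_t i2, size_t i1)
{
    return i3 * (n2 * n1) + i2 * n1 + i1;
}

/* Checks, for every element of an n3 x n2 x n1 block, that the compiler's
 * VLA indexing agrees with flat_index. */
void check_vla_layout(int n3, int n2, int n1)
{
    int storage[n3 * n2 * n1];                    /* backing memory (VLA) */
    int (*u)[n2][n1] = (int (*)[n2][n1])storage;  /* same cast as in interp */

    for (int i3 = 0; i3 < n3; i3++)
        for (int i2 = 0; i2 < n2; i2++)
            for (int i1 = 0; i1 < n1; i1++)
                assert(&u[i3][i2][i1] ==
                       storage + flat_index(n2, n1, i3, i2, i1));
}
```

Calling `check_vla_layout(5, 3, 4)` completes without tripping an assertion, and `flat_index(3, 4, 3, 2, 1)` evaluates to 45. The product `n2 * n1` appears in every access, which matches a multiply of the n2 register by the n1 register like the one at 640B.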