systems-nuts / unifico

Compiler and build harness for heterogeneous-ISA binaries with the same stack layout.

`INC32r`/`DEC32r` are converted to `LEA64_32r` instead of `LEA32r` #259

Closed. blackgeorge-boom closed this issue 1 year ago

blackgeorge-boom commented 1 year ago
void swap(int v[], int k)
{
    int temp;
    temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

void sort(int v[], int n)
{
    int i, j;
    for (i = 0; i < n; i += 1) {
        j = i - 1;   // <---
        swap(v, j);
    }
}

int main()
{
    int v[5] = {4, 1, 3, 2, 1};
    sort(v, 5);

    return 0;
}

The x86-64 build uses the LEA64_32r instruction inside sort:


0000000000501050 <sort>:
...
  50105b:   85 f6                   test   esi,esi
  50105d:   0f 8e 36 00 00 00       jle    501099 <sort+0x49>
  501063:   48 89 fb                mov    rbx,rdi
  501066:   31 c0                   xor    eax,eax <---
  501068:   89 75 dc                mov    DWORD PTR [rbp-0x24],esi
  50106b:   48 89 45 e0             mov    QWORD PTR [rbp-0x20],rax <---
  50106f:   44 8d 78 ff             lea    r15d,[rax-0x1] <---
...

but it then has to spill rax as an 8-byte value, whereas AArch64:


0000000000501050 sort:
...
  501060: 3f 04 00 71                   cmp w1, #0x1
  501064: ab 01 00 54                   b.lt    #0x34 <sort+0x48>
  501068: f3 03 00 aa                   mov x19, x0
  50106c: e8 03 1f 2a                   mov w8, wzr <---
  501070: e1 03 00 b9                   str w1, [sp]
  501074: 14 05 00 51                   sub w20, w8, #0x1 <---
  501078: e0 03 13 aa                   mov x0, x19
  50107c: e1 03 14 2a                   mov w1, w20
  501080: e8 07 00 b9                   str w8, [sp, #0x4] <---
...

spills a 4-byte value.

blackgeorge-boom commented 1 year ago

Here is how INC32r/DEC32r are converted into LEA instructions in X86InstrInfo::convertToThreeAddress:

case X86::INC32r: {
...
    unsigned Opc = MIOpc == X86::INC64r ? X86::LEA64r :
        (Is64Bit ? X86::LEA64_32r : X86::LEA32r);

On 64-bit subtargets we go directly to LEA64_32r instead of LEA32r. The former takes 64-bit registers as address operands, even though our INC32r instruction needs only 32 bits. Register usage is essentially the same, but if these registers get spilled, we end up with 8-byte spills with LEA64_32r instead of 4-byte spills with LEA32r.
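To make the spill-size difference concrete, here is a minimal sketch of the reasoning. It is a toy model and not LLVM's API: the enum, the function names and the `kAArch64SpillBytes` constant are all hypothetical, and it assumes that a spilled register takes a stack slot as wide as the register itself.

```cpp
#include <cassert>

// Hypothetical model (not LLVM's API) of the two LEA forms discussed here.
enum class LeaOpcode { LEA32r, LEA64_32r };

// Width in bytes of the address register each form takes:
// 64-bit for LEA64_32r, 32-bit for LEA32r.
unsigned addressRegBytes(LeaOpcode opc) {
    return opc == LeaOpcode::LEA64_32r ? 8u : 4u;
}

// Assumption: a spilled register occupies a slot as wide as the register.
unsigned spillSlotBytes(LeaOpcode opc) {
    return addressRegBytes(opc);
}

// The AArch64 build in this issue spills w8, i.e. a 4-byte value.
const unsigned kAArch64SpillBytes = 4;

// True if the x86 spill slot has the same size as the AArch64 one.
bool matchesAArch64Spill(LeaOpcode opc) {
    return spillSlotBytes(opc) == kAArch64SpillBytes;
}
```

Under these assumptions, only `LeaOpcode::LEA32r` gives a spill slot of the same size as the AArch64 one, which is what matters if both ISAs are meant to share a stack layout.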

LEA32r is avoided because, in 64-bit mode, using 32-bit address registers requires the extra 0x67 address-size override prefix to encode the instruction: https://stackoverflow.com/questions/59153772/address-size-override-prefix-in-64-bit-or-using-64-bit-registers.
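As a rough illustration of the encoding cost, the sketch below prepends the 0x67 prefix to an instruction's bytes. The helper name is hypothetical and this is not how LLVM's encoder is structured; it only shows that the 32-bit-address form is one byte longer. The input bytes `44 8d 78 ff` are the `lea r15d,[rax-0x1]` from the disassembly above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Address-size override prefix on x86.
const std::uint8_t kAddressSizeOverride = 0x67;

// Hypothetical helper: return the encoding with the 0x67 prefix placed first.
// Legacy prefixes must come before any REX byte, so the front is a valid spot.
std::vector<std::uint8_t>
withAddressSizeOverride(const std::vector<std::uint8_t>& encoding) {
    std::vector<std::uint8_t> out;
    out.reserve(encoding.size() + 1);
    out.push_back(kAddressSizeOverride);
    out.insert(out.end(), encoding.begin(), encoding.end());
    return out;
}
```

Calling `withAddressSizeOverride({0x44, 0x8d, 0x78, 0xff})` yields `67 44 8d 78 ff`, which in 64-bit mode addresses through `eax` instead of `rax`: 5 bytes instead of 4.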

For now, we can enforce the use of LEA32r instead.
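One possible shape for that change, sketched outside of LLVM so it can be compiled on its own. The `Opcode` enum stands in for the `X86::` opcodes, and the `forceLEA32` switch is a hypothetical name, not an existing LLVM option. The real change in convertToThreeAddress may also need to adjust how the source operand is built, which is not modeled here.

```cpp
#include <cassert>

// Stand-in for the X86:: opcodes involved (hypothetical, not LLVM's enum).
enum class Opcode { INC32r, INC64r, LEA32r, LEA64r, LEA64_32r };

// Mirrors the ternary quoted above, plus a hypothetical forceLEA32 switch
// that keeps 32-bit increments on LEA32r even on 64-bit subtargets.
Opcode selectLeaOpcode(Opcode miOpc, bool is64Bit, bool forceLEA32) {
    if (miOpc == Opcode::INC64r)
        return Opcode::LEA64r;      // 64-bit increment: unchanged
    if (is64Bit && !forceLEA32)
        return Opcode::LEA64_32r;   // current behaviour on 64-bit subtargets
    return Opcode::LEA32r;          // 32-bit subtarget, or forced
}
```

With `forceLEA32 == false` this matches the existing selection, so the switch only changes the `INC32r` case on 64-bit subtargets.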