systems-nuts / unifico

Compiler and build harness for heterogeneous-ISA binaries with the same stack layout.

`fp-loop`: GR32temp registers are inflated to GR32 #281

Closed by blackgeorge-boom 1 year ago

blackgeorge-boom commented 1 year ago
#include <stdio.h>

double fmul(double *x, double *y)
{
    static double r = 0.5;
    r += 0.5;
    return (*x) * (*y) * r;
}

int main()
{
    double x = 3.5;
    double y = 4.5;
    double sum = 0;
    for (int i = 0; i < 16; i++) {
        sum = fmul(&x, &y);
        sum += fmul(&x, &y);
    }
    printf("%f", sum);
    return 0;
}

X86:


0000000000501060 <main>:
...
  501085:       mov    DWORD PTR [rbp-0x28],0x0
...
  501097:   |   call   501020 <fmul>
  50109c:   |   mov    eax,DWORD PTR [rbp-0x28]
  50109f:   |   mov    ebx,eax
...
  5010b9:   |   inc    ebx
  5010bb:   |   mov    eax,ebx
  5010bd:   |   mov    DWORD PTR [rbp-0x28],ebx
  5010c0:   |   cmp    ebx,0x10
  5010c3:   \-- jne    50108c <main+0x2c>
...
  5010e8:       ret    
    ...

The value reloaded from [rbp-0x28] (the loop counter i) is copied to a temp register (eax), since we are loading the zero constant (see https://github.com/blackgeorge-boom/llvm-project/pull/33). However, this value is later copied to a callee-saved register (ebx, the low half of rbx), which should not happen (see https://github.com/blackgeorge-boom/llvm-project/pull/51).

blackgeorge-boom commented 1 year ago

The regalloc debug output for x86 shows that the virtual registers in the GR32temp class are inflated to GR32 after splitting:

selectOrSplit GR32temp:%18 [80r,176B:0)[480B,816r:2)[816r,896B:1)  0@80r 1@816r 2@480B-phi weight:3.163462e-01 w=3.163462e-01
...
  rewr %bb.0    80r:1   %20:gr32temp = MOV32r0 implicit-def dead $eflags
  rewr %bb.2    816r:2  %21:gr32temp = nuw nsw INC32r %18:gr32temp(tied-def 0), implicit-def dead $eflags
  rewr %bb.2    608B:2  PCN_STACKMAP 1, 0, %21:gr32temp, 0, %stack.0.x, 0, 0, %stack.1.y, 0, 0, %stack.0.x, 0, 0, %stack.1.y, 0, implicit-def dead early-clobber $r11 :: (load 8 from %stack.0.x), (load 8 from %stack.1.y), (load 8 from %stack.0.x), (load 8 from %stack.1.y)
  rewr %bb.2    752B:2  PCN_STACKMAP 2, 0, %10:fr64, %21:gr32temp, 0, %stack.0.x, 0, 0, %stack.1.y, 0, 0, %stack.0.x, 0, 0, %stack.1.y, 0, implicit-def dead early-clobber $r11 :: (load 8 from %stack.0.x), (load 8 from %stack.1.y), (load 8 from %stack.0.x), (load 8 from %stack.1.y)
  rewr %bb.2    816B:2  %21:gr32temp = nuw nsw INC32r %21:gr32temp(tied-def 0), implicit-def dead $eflags
  rewr %bb.2    832B:2  CMP32ri8 %21:gr32temp, 16, implicit-def $eflags
  rewr %bb.2    600B:0  %21:gr32temp = COPY %19:gr32temp
  rewr %bb.2    824B:2  %19:gr32temp = COPY %21:gr32temp
All defs dead: dead %20:gr32temp = MOV32r0 implicit-def dead $eflags
Deleting dead def 80r   dead %20:gr32temp = MOV32r0 implicit-def dead $eflags
Inflated %20 to GR32
Inflated %21 to GR32
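
To make the "Inflated ... to GR32" lines easier to reason about, here is a toy model of class inflation in C. It is only a sketch under assumptions: register classes are modelled as plain bitmasks (LLVM works with named classes, not arbitrary sets), the membership of GR32temp is hypothetical since its definition is not shown in this issue, and `inflate_class` and `may_use_register` are illustrative names, not LLVM functions. As far as I can tell, upstream LLVM does something similar when it recomputes a virtual register's class after splitting: it starts from the largest legal superclass and narrows it by what each using instruction accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t RegMask;

/* One bit per 32-bit physical register (toy subset of x86-64). */
enum {
    EAX = 1u << 0,
    ECX = 1u << 1,
    EDX = 1u << 2,
    EBX = 1u << 3, /* callee-saved in the System V ABI */
    ESI = 1u << 4,
    EDI = 1u << 5
};

/* HYPOTHETICAL memberships: we only assume that GR32temp excludes
 * callee-saved registers such as EBX and that GR32 is a superset of it. */
enum {
    GR32TEMP = EAX | ECX | EDX | ESI | EDI,
    GR32 = GR32TEMP | EBX
};

/* Recompute the class of a virtual register: start from the largest
 * superclass and intersect with the set each operand accepts.
 * Returns the old class when nothing strictly larger is legal. */
RegMask inflate_class(RegMask old_class, RegMask largest_super,
                      const RegMask *operand_constraints, size_t n)
{
    assert(operand_constraints != NULL || n == 0);

    RegMask new_class = largest_super;
    for (size_t i = 0; i < n; i++)
        new_class &= operand_constraints[i];

    /* Accept the result only if it is a strict superset of the old class. */
    if ((new_class & old_class) != old_class || new_class == old_class)
        return old_class;
    return new_class;
}

/* True if the class allows allocating the given physical register. */
bool may_use_register(RegMask cls, RegMask reg)
{
    return (cls & reg) != 0;
}
```

In this model, if every operand that touches the register (for example INC32r, CMP32ri8 and COPY) accepts any GR32 register, the class widens from GR32temp to GR32 and EBX becomes a legal assignment. A single operand constrained to GR32temp is enough to block the inflation.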