systems-nuts / unifico

Compiler and build harness for heterogeneous-ISA binaries with the same stack layout.

X86 uses vectors for a floating-point comparison in condition #284

Closed blackgeorge-boom closed 1 year ago

blackgeorge-boom commented 1 year ago
```c
#include <stdio.h>

#include "npbparams.h"

double timer_read(int n) { return 0.0; }

int main(int argc, char **argv)
{
    int timer_on;
    double timecounter = 0.0;

    if (timer_on) {
        double t_percent;

        timecounter = timer_read(1);
        t_percent = timecounter * 100.;
        printf(" Initialization : %8.3f (%5.2f%%)\n", timecounter, t_percent);
        timecounter = timer_read(2);
        t_percent = timecounter * 100.;
        printf(" Sorting        : %8.3f (%5.2f%%)\n", timecounter, t_percent);
    }

    return 0;
}
```
```sh
make clean; make stackmaps-check -j10 OBJDUMP_FLAGS= OPT_LEVEL=-O1
```

```
WARNING: main: callsite 1, value locations 0/0 have different location offset or  different constant (-16 vs. -32)
WARNING: main: callsite 2, value locations 0/0 have different location offset or  different constant (-16 vs. -32)
WARNING: main: callsite 3, value locations 0/0 have different location offset or  different constant (-24 vs. -40)
WARNING: main: callsite 3, value locations 1/1 have different location offset or  different constant (-16 vs. -32)
ERROR: stackmaps in 'main_aarch64_aligned.out' & 'main_x86_64_aligned.out' differ - different stack layout!
make: *** [../../common/common.mk:241: stackmaps-check] Error 1
```
blackgeorge-boom commented 1 year ago

X86 instruction selection (isel) output:

```
  %8:fr64 = CMPSDrr %3:fr64(tied-def 0), killed %7:fr64, 0, debug-location !28; main.c:18:6
  %9:vr128 = COPY %8:fr64, debug-location !28; main.c:18:6
  %10:vr128 = MOVSDrm $rip, 1, $noreg, %const.0, $noreg, debug-location !28 :: (load 8 from constant-pool); main.c:18:6
  %11:vr128 = PANDrr %9:vr128(tied-def 0), killed %10:vr128, debug-location !28; main.c:18:6
  %12:vr128 = COPY %3:fr64, debug-location !28; main.c:18:6
  %13:vr128 = PANDNrr %9:vr128(tied-def 0), killed %12:vr128, debug-location !28; main.c:18:6
  %14:vr128 = PORrr %13:vr128(tied-def 0), killed %11:vr128, debug-location !28; main.c:18:6
```

By using vector types that hold two double-precision FP values, X86 ends up in the 128-bit SSE xmm registers, which are spilled to the stack in 16-byte slots:

```
...
  501062:   andpd  xmm0,xmm1
  501066:   andnpd xmm1,xmm2
  50106a:   orpd   xmm1,xmm0
  50106e:   movapd XMMWORD PTR [rbp-0x20],xmm1
...
```

AArch64, by contrast, uses scalar 64-bit FP operands, so the spill takes only 8 bytes:

```
...
  501060:       fcsel   d0, d0, d1, eq
  501064:       str d0, [sp, #0x10]
...
```