systems-nuts / unifico

Compiler and build harness for heterogeneous-ISA binaries with the same stack layout.

Constant multiplication optimization in `y_solve` #277

Closed: blackgeorge-boom closed this issue 1 year ago

blackgeorge-boom commented 1 year ago
#include "header.h"

#define BLOCK_SIZE 5

int grid_points[3];
double rhs2[10][10 + 1][10 + 1][5];
double lhs[1][3][5][5];

void y_solve()
{
    int i, j, k, m, n, jsize;

    jsize = grid_points[1];

    for (k = 1; k <= grid_points[2] - 2; k++) {
        for (i = 1; i <= grid_points[0] - 2; i++) {
            lhsinit(lhs, jsize);
            for (j = jsize - 1; j >= 0; j--) {
                for (m = 0; m < BLOCK_SIZE; m++) {
                    for (n = 0; n < BLOCK_SIZE; n++) {
                        rhs2[k][j][i][m] =
                                lhs[0][0][n][m] * rhs2[k][j + 1][i][n];
                    }
                }
            }
        }
    }
}

int main(int argc, char *argv[])
{
    y_solve();

    return 0;
}
make clean; make stackmaps-check -j10 OPT_LEVEL=-O0

WARNING: y_solve: callsite 0 has different number of architecture specific live locations (2 vs 1)
WARNING: y_solve, callsite 0: arch-specific live value 0/0 has different location operand offset or constant (40 vs. 440)
ERROR: stackmaps in 'main_aarch64_aligned.out' & 'main_x86_64_aligned.out' differ - different stack layout!
blackgeorge-boom commented 1 year ago
make stackmaps-dump

 [STACKMAP DUMP] main_x86_64_aligned.out
Reading section .llvm_pcn_stackmaps: Found 1 stackmaps
Stackmap v3: 2 functions, 0 constants, 2 call sites
  Function y_solve: address=501020, stack size=72, number of unwinding entries: 4, offset into unwinding section: 0
  Function main: address=501180, stack size=40, number of unwinding entries: 2, offset into unwinding section: 4
  Call site 0: function 0, offset @ a8, address 501228, 7 locations, 0 live-outs, 1 arch-specific locations
    Location: at register 6 + -28, is an alloca of size 4 byte(s)
    Location: at register 6 + -32, is an alloca of size 4 byte(s)
    Location: at register 6 + -48, is an alloca of size 4 byte(s)
    Location: at register 6 + -36, is an alloca of size 4 byte(s)
    Location: at register 6 + -40, is an alloca of size 4 byte(s)
    Location: at register 6 + -44, is an alloca of size 4 byte(s)
    Location: at pointer generated by register 6 + -52
    Arch-specific location: in register 15, Set value = 440 / 0x1b8
  Call site 0: function 1, offset @ 1c, address 50119c, 0 locations, 0 live-outs, 0 arch-specific locations
 [STACKMAP DUMP] main_aarch64_aligned.out
Reading section .llvm_pcn_stackmaps: Found 1 stackmaps
Stackmap v3: 2 functions, 0 constants, 2 call sites
  Function y_solve: address=501020, stack size=80, number of unwinding entries: 4, offset into unwinding section: 0
  Function main: address=501180, stack size=48, number of unwinding entries: 2, offset into unwinding section: 4
  Call site 0: function 0, offset @ a8, address 501228, 7 locations, 0 live-outs, 2 arch-specific locations
    Location: at register 29 + -28, is an alloca of size 4 byte(s)
    Location: at register 29 + -32, is an alloca of size 4 byte(s)
    Location: at register 29 + -48, is an alloca of size 4 byte(s)
    Location: at register 29 + -36, is an alloca of size 4 byte(s)
    Location: at register 29 + -40, is an alloca of size 4 byte(s)
    Location: at register 29 + -44, is an alloca of size 4 byte(s)
    Location: at pointer generated by register 29 + -52
    Arch-specific location: in register 19, Set value = 40 / 0x28
    Arch-specific location: in register 20, Set value = 440 / 0x1b8
  Call site 0: function 1, offset @ 1c, address 50119c, 0 locations, 0 live-outs, 0 arch-specific locations
blackgeorge-boom commented 1 year ago

The X86 backend probably rewrites this multiplication by a constant:

t8: i64 = mul t67, Constant:i64<40>

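into a multiply by 5 followed by a left shift by 3 (5 * 8 = 40), which would explain why the constant 40 shows up as an arch-specific live value only on AArch64: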

t76: i64 = X86ISD::MUL_IMM t67, Constant:i64<5>
t77: i64 = shl t76, Constant:i8<3>
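As a rough illustration of the rewrite above, here is a minimal C sketch. It assumes that the constants 40 and 440 from the dumps are the byte strides of the `i` and `j` indices of `rhs2` (with an 8-byte `double`), which matches the array shape in the reproducer but is not confirmed by the stackmap output itself. The helper names (`stride_i`, `stride_j`, `mul40_plain`, `mul40_decomposed`, `check_equivalence`) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as rhs2 in the reproducer. */
double rhs2[10][10 + 1][10 + 1][5];

/* Byte stride of one step in the i index: sizeof(double[5]). */
size_t stride_i(void) { return sizeof(rhs2[0][0][0]); }

/* Byte stride of one step in the j index: sizeof(double[11][5]). */
size_t stride_j(void) { return sizeof(rhs2[0][0]); }

/* Generic form: a single multiply by the constant 40. */
int64_t mul40_plain(int64_t x) { return x * 40; }

/* Decomposed form: multiply by 5, then shift left by 3 (5 * 8 == 40).
 * The shift is done on an unsigned value to keep it well defined in C. */
int64_t mul40_decomposed(int64_t x)
{
    return (int64_t)((uint64_t)(x * 5) << 3);
}

/* Checks that both forms agree for the small index range used here. */
void check_equivalence(void)
{
    for (int64_t x = 0; x <= 10; x++)
        assert(mul40_plain(x) == mul40_decomposed(x));
}
```

Both forms compute the same offset, but only the plain form needs the constant 40 to be available as an operand. By contrast, 440 = 8 * 55 has no equally cheap decomposition, which may be why it stays a plain constant on both architectures.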
blackgeorge-boom commented 1 year ago

The simplest fix for now is to disable this kind of optimization with `-mul-constant-optimization=false` (the performance impact still needs to be benchmarked).