systems-nuts / unifico

Compiler and build harness for heterogeneous-ISA binaries with the same stack layout.

Different inlining of `sqrt` #312

Closed blackgeorge-boom closed 9 months ago

blackgeorge-boom commented 9 months ago
int main(int argc, char *argv[])
{
    int i, j, it;

    double rnorm;
    double norm_temp2;

    lastrow  = NA-1;
    lastcol  = NA-1;

    naa = NA;
    nzz = NZ;

    makea(naa, nzz, a, colidx, rowstr,
          firstrow, lastrow, firstcol, lastcol,
          arow,
          (int (*)[NONZER+1])(void*)acol,
          (double (*)[NONZER+1])(void*)aelt,
          iv);

    for (i = 0; i < NA+1; i++) {
        x[i] = 1.0;
    }

    for (it = 1; it <= 1; it++) {
        conj_grad(colidx, rowstr, x, z, a, p, q, r, &rnorm);

        norm_temp2 = 1.0 / sqrt(norm_temp2);

        for (j = 0; j < lastcol - firstcol + 1; j++) {
            x[j] = norm_temp2 * z[j];
        }
    }

    return 0;
}
blackgeorge-boom commented 9 months ago

AArch64 after partially inlining library calls:

for.body3:                                        ; preds = %for.body
...
  %call = tail call double @sqrt(double undef) #8, !dbg !110
  %2 = fcmp ord double %call, %call, !dbg !111
  br i1 %2, label %for.body3.split, label %call.sqrt, !dbg !111

call.sqrt:                                        ; preds = %for.body3
  %3 = tail call double @sqrt(double undef) #9, !dbg !111
  br label %for.body3.split, !dbg !111

X86:

for.body3:                                        ; preds = %for.body
...
  %call = tail call double @sqrt(double undef) #8, !dbg !110
  br i1 false, label %for.body3.split, label %call.sqrt, !dbg !111

call.sqrt:                                        ; preds = %for.body3
  %2 = tail call double @sqrt(double undef) #9, !dbg !111
  br label %for.body3.split, !dbg !111

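On X86 the branch condition is the constant `false`, so the branch to `call.sqrt` is effectively unconditional. `call.sqrt` is then merged back into `for.body3`, which creates differences in liveness compared to AArch64, where the conditional branch and both blocks survive.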
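
To make the difference concrete, here is a rough C model of the two control-flow shapes in the dumps above. It is only a sketch: the function names are hypothetical, and `fast` and `slow` are stand-ins for the results of the first and second `@sqrt` call. The only parts taken from the IR are the AArch64 guard (`fcmp ord %call, %call`, which is true exactly when the value is not NaN) and the X86 constant-`false` condition.

```c
#include <stdbool.h>

/* Guard from the AArch64 dump: `fcmp ord double %call, %call`.
 * An ordered self-comparison is true exactly when the value is not NaN. */
static bool result_is_ordered(double fast_result) {
    return fast_result == fast_result;
}

/* Shape shared by both dumps: keep the first sqrt result when the guard
 * holds, otherwise go through `call.sqrt` and use the second result.
 * `fast` and `slow` are hypothetical stand-ins for those two values. */
static double pick_sqrt_result(bool guard, double fast, double slow) {
    return guard ? fast : slow;
}

/* AArch64: the guard is evaluated at run time, so both paths stay in
 * the CFG as separate blocks. */
static double aarch64_shape(double fast, double slow) {
    return pick_sqrt_result(result_is_ordered(fast), fast, slow);
}

/* X86: the guard is the constant `false`, so only the `call.sqrt` path
 * is reachable and that block can be merged into its predecessor. */
static double x86_shape(double fast, double slow) {
    return pick_sqrt_result(false, fast, slow);
}
```

With a non-NaN `fast` value, `aarch64_shape` returns `fast` while `x86_shape` always returns `slow`. That is the sense in which the X86 CFG collapses to a single path, while AArch64 keeps two blocks with different live values.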