inducer / loopy

A code generator for array-based code on CPUs and GPUs
http://mathema.tician.de/software/loopy
MIT License
588 stars 73 forks source link

lp.precompute does not emit the correct `precompute_inames` #567

Open kaushikcfd opened 2 years ago

kaushikcfd commented 2 years ago

MWE:

import loopy as lp

knl = lp.make_kernel(
    [
        "{[iel, iface, idof, ifacedof]:"
        " 0<=iface<4 and 0 <=iel<N_e and 0<=idof<20 and 0<=ifacedof<10}"],
    """
    subst(f, e, j) := J[f, e] * u[e, j]
    out[iel, idof] = sum([iface, ifacedof],\
                         subst(iface, iel, ifacedof) * R[iface, idof, ifacedof])
    """)

knl = lp.split_iname(knl, "iel", 8, inner_tag="g.0", outer_tag="l.1")

knl = lp.precompute(knl, "subst",
                    sweep_inames=["ifacedof", "iel_inner"],
                    precompute_inames=["i0", "i1"],
                    precompute_outer_inames=frozenset(["iel_outer", "iface"]),
                    default_tag=None)
print(knl)

generates the following kernel:

---------------------------------------------------------------------------
INSTRUCTIONS:
  for iel_outer, iface, i1, j
↱         subst_0[i1, j] = J[iface, i1 + 8*iel_outer]*u[i1 + 8*iel_outer, j]  {id=subst}
│   end iface, i1, j
│   for idof, iel_inner
└       out[iel_inner + iel_outer*8, idof] = reduce(sum, [iface, ifacedof], subst_0[iel_inner, ifacedof]*R[iface, idof, ifacedof])  {id=insn}
  end iel_outer, idof, iel_inner
---------------------------------------------------------------------------

Notice how it is subst_0[i1, j] and not subst_0[i0, i1].

kaushikcfd commented 2 years ago

I was able to do

knl = lp.precompute(knl, "subst",
                    sweep_inames=["ifacedof", "iel_inner"],
                    storage_axes=["e", "j"],  # Had to add this argument
                    precompute_inames=["i0", "i1"],
                    precompute_outer_inames=frozenset(["iel_outer", "iface"]),
                    default_tag=None)

and get the desired precompute_inames. I think this behavior is unclear (and undocumented) i.e. should the precompute_inames be always accompanies with storage_axes, if so my earlier approach should have seen an exception/warning.