inducer / loopy

A code generator for array-based code on CPUs and GPUs
http://mathema.tician.de/software/loopy
MIT License

Dropping arguments in inlining #648

Open sv2518 opened 2 years ago

sv2518 commented 2 years ago

Hi. I noticed an issue with the argument passing that has been introduced in https://github.com/inducer/loopy/pull/631 (I believe).

When we inline an inner kernel into a wrapper kernel, where the inner kernel takes many arguments but its code only depends on a subset of them, the inlined kernel in the generated C code only takes that subset of the arguments as parameters. A minimal example can be found here: https://gist.github.com/sv2518/d6690f63b6827750fdaed9b01b4d9cea. I noticed that this only happens with CTarget, not with whatever Loo.py's default target is.

This is problematic for the vectorisation in Firedrake, where we generate a loopy kernel first, apply the inlining and vectorisation transformations, run the C codegen after that, and then pass the arguments dynamically. This means the arguments of the kernel before the transformations must be the same as after them. The MFE I linked above comes from our test suite; the corresponding test is tests/regression/test_par_loops.py::test_dict_order_parallel.

Also, sorry for the noise recently; it's mostly because I am still trying to get the Firedrake vectorisation PR landed.

kaushikcfd commented 2 years ago

Thanks for the report! That's a concerning bug. Arguments must not be dropped in entrypoint kernels. I think this should be an easy fix. Here's a smaller reproducer:

import loopy as lp

# Five arguments are declared, but the single instruction only touches `a`.
knl = lp.make_kernel(
    "{ : }",
    """
    a[0] = 1
    """,
    [lp.GlobalArg("a,b,c,d,e",
                  shape=(10,),
                  dtype="float64")])
print(lp.generate_code_v2(knl).device_code())

which generates the kernel:

__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1))) loopy_kernel(__global double *__restrict__ a)
{
  a[0] = 1.0;
}

Notice how the arguments b, c, d, and e were removed.
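
In case it helps with a regression test, here is a minimal sketch of how one could check a generated kernel for dropped arguments. It does not use loopy at all: `kernel_param_names` and `dropped_arguments` are hypothetical helpers written for this comment, and the regex only copes with simple signatures like the one above (no function-pointer or array-style parameters). The `generated` string is the device code pasted from the reproducer.

```python
import re


def kernel_param_names(c_source, kernel_name):
    """Return the parameter names in the signature of `kernel_name`.

    Hypothetical helper, not part of loopy. Assumes a simple signature
    where each parameter declaration ends in its identifier.
    """
    match = re.search(re.escape(kernel_name) + r"\s*\(([^)]*)\)", c_source)
    if match is None:
        raise ValueError(f"no signature found for {kernel_name!r}")
    params = [p.strip() for p in match.group(1).split(",") if p.strip()]
    if params == ["void"]:
        return []
    # The parameter name is the trailing identifier of each declaration.
    return [re.search(r"([A-Za-z_]\w*)\s*$", p).group(1) for p in params]


def dropped_arguments(expected, c_source, kernel_name):
    """Return the expected argument names missing from the signature."""
    present = set(kernel_param_names(c_source, kernel_name))
    return [name for name in expected if name not in present]


# Device code from the reproducer above.
generated = """\
__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1))) loopy_kernel(__global double *__restrict__ a)
{
  a[0] = 1.0;
}
"""

print(dropped_arguments(["a", "b", "c", "d", "e"], generated, "loopy_kernel"))
# prints ['b', 'c', 'd', 'e']
```

A test along these lines would fail for the output above and pass once all five arguments appear in the entrypoint signature.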