inducer / loopy

A code generator for array-based code on CPUs and GPUs
http://mathema.tician.de/software/loopy
MIT License
573 stars 70 forks source link

Complex function names are no longer being transformed to the expected c names. #851

Closed dham closed 1 month ago

dham commented 1 month ago

This commit: https://github.com/inducer/loopy/commit/64a8a8ccbe74e01836e497a5c49e04b2d2b07c1d makes the following change:

            if dtype.kind == "c" or name in ["real", "imag", "abs"]:

becomes:

            if dtype.kind == "c":

The effect of this is that real(foo) is no longer transformed to creal(foo) in the generated code, which breaks Firedrake.

kaushikcfd commented 1 month ago

Hi David! I just checked the generated code for loopy@main and it looks like it does transform it to creal, etc.

(py311_env) $ pip freeze | grep loopy
-e git+https://github.com/inducer/loopy.git@1506d5476f08a5e8ff0375c60f0e2f463e627bb0#egg=loopy
(py311_env) $ cat hello.py
import loopy as lp
import numpy as np

knl = lp.make_kernel(
        "{[i]: 0<=i<10}",
        """
        y1[i] = abs(x32[i])
        y2[i] = real(x32[i])
        y3[i] = imag(x32[i])
        y4[i] = conj(x32[i])

        y5[i] = abs(x64[i])
        y6[i] = real(x64[i])
        y7[i] = imag(x64[i])
        y8[i] = conj(x64[i])
        """,
        target=lp.CTarget(),
        seq_dependencies=True)

knl = lp.add_dtypes(knl, {"x32": np.complex64, "x64": np.complex128})

print(lp.generate_code_v2(knl).device_code())
(py311_env) $ python hello.py
#include <complex.h>
#include <stdint.h>
#include <stdbool.h>

void loopy_kernel(float complex const *__restrict__ x32, double complex const *__restrict__ x64, float complex *__restrict__ y1, float complex *__restrict__ y2, float complex *__restrict__ y3, float complex *__restrict__ y4, double *__restrict__ y5, double *__restrict__ y6, double *__restrict__ y7, double complex *__restrict__ y8)
{
  for (int32_t i = 0; i <= 9; ++i)
  {
    y1[i] = cabsf(x32[i]);
    y2[i] = crealf(x32[i]);
    y3[i] = cimagf(x32[i]);
    y4[i] = cconjf(x32[i]);
    y5[i] = cabs(x64[i]);
    y6[i] = creal(x64[i]);
    y7[i] = cimag(x64[i]);
    y8[i] = conj(x64[i]);
  }
}
dham commented 1 month ago

OK so somehow in Firedrake that transformation isn't happening and we get linker fails because real is an unknown system. Where in loopy does the c now get added? (I will have to go through and try to work out why it's not happening for us).

kaushikcfd commented 1 month ago

The "c" gets prefixed in the code you pointed at: https://github.com/inducer/loopy/blob/1506d5476f08a5e8ff0375c60f0e2f463e627bb0/loopy/target/c/__init__.py#L550-L552

dham commented 1 month ago

Actually, I can reproduce the issue. If you change the types line to:

knl = lp.add_dtypes(knl, {"x32": np.float32, "x64": np.float64})

then the c is not added. This is a corner case where we're generating code which applies those functions to real values. That's why that special case was added.

kaushikcfd commented 1 month ago

Hi David,

Even in the case of real-valued arguments, we don't need "cabs", we just need call to fabs and cast to double complex. But the generated code doesn't include "math.h" which causes a bunch of compiler warnings. (seeing these compiler warnings is definitely a loopy-bug and I can patch this so that loopy emits include <math.h> for these cases)

A reproducer to show that except the includes, everything looks generally correct to me:

(py311_env) $ cat hello.py
import loopy as lp
import numpy as np

knl = lp.make_kernel(
        "{[i]: 0<=i<5}",
        """
        y1[i] = abs(x64[i])
        """,
        target=lp.ExecutableCTarget(),
        lang_version=(2018, 2))

knl = lp.add_dtypes(knl, {"x64": np.float64,
                          "y1": np.complex128,})

print(lp.generate_code_v2(knl).device_code())

x_ary = np.random.rand(5)

_, (y1_ary,) = knl(x64=x_ary)
print(x_ary)
print(y1_ary)
(py311_env) $ python hello.py
#include <complex.h>
#include <stdint.h>
#include <stdbool.h>

void loopy_kernel(double const *__restrict__ x64, double complex *__restrict__ y1)
{
  for (int32_t i = 0; i <= 4; ++i)
    y1[i] = (double complex) (fabs(x64[i]));
}

/tmp/tmp_loopyx04yoznw/code.c: In function ‘loopy_kernel’:
/tmp/tmp_loopyx04yoznw/code.c:8:31: warning: implicit declaration of function ‘fabs’ [-Wimplicit-function-declaration]
    8 |     y1[i] = (double complex) (fabs(x64[i]));
      |                               ^~~~
/tmp/tmp_loopyx04yoznw/code.c:4:1: note: include ‘<math.h>’ or provide a declaration of ‘fabs’
    3 | #include <stdbool.h>
  +++ |+#include <math.h>
    4 |
/tmp/tmp_loopyx04yoznw/code.c:8:31: warning: incompatible implicit declaration of built-in function ‘fabs’ [-Wbuiltin-declaration-mismatch]
    8 |     y1[i] = (double complex) (fabs(x64[i]));
      |                               ^~~~
/tmp/tmp_loopyx04yoznw/code.c:8:31: note: include ‘<math.h>’ or provide a declaration of ‘fabs’
[0.40470538 0.73333579 0.25243711 0.94475841 0.78098459]
[0.40470538+0.j 0.73333579+0.j 0.25243711+0.j 0.94475841+0.j
 0.78098459+0.j]
dham commented 1 month ago

abs might be a different case because fabs exists. The one that causes issues for us is real, because there is no freal. I accept that it's odd to call real on a real value, but we do it...

kaushikcfd commented 1 month ago

Aah yep! Just tried the earlier reproducer with real (and imag), it fails. Found the MWE! Thanks!

dham commented 1 month ago

Thanks for this, both!