pyccel / pyccel-cuda

Cuda extension to pyccel
MIT License
1 stars 0 forks source link

Wrong handling of imports #48

Closed EmilyBourne closed 6 months ago

EmilyBourne commented 7 months ago

Describe the bug

The main branch imports the header of the generated source file inside a extern 'C' block. This works for functions which don't use cuda functionalities, but cuda objects such as kernels cannot be expressed in C so they shouldn't be declared inside an extern 'C' block.

To Reproduce

Provide code to reproduce the behaviour:

def is_false(a : 'bool'):
    c = False
    if a is False:
        c = True
    return c

Error details

Provide the generated code

#include <stdlib.h>
#include <stdbool.h>
extern "C"{
    #include "tmp_cuda_1.h"
}

/*........................................*/
bool is_false(bool a)
{
    bool c;
    c = 0;
    if (a == 0)
    {
        c = 1;
    }
    return c;
}
/*........................................*/

Expected behaviour

There are two possible ways of handling this.

1. Treat the generated code as pure Cuda code and add a wrapper. The generated cuda code should then be:

tmp.cu:

#include <stdlib.h>
#include <stdbool.h>
#include "tmp_cuda_1.h"

/*........................................*/
bool is_false(bool a)
{
    bool c;
    c = 0;
    if (a == 0)
    {
        c = 1;
    }
    return c;
}
/*........................................*/

tmp.h

#ifndef TMP_CUDA_1_H
#define TMP_CUDA_1_H

#include <stdlib.h>
#include <stdbool.h>

bool is_false(bool a);
#endif // tmp_cuda_1_H

but an additional cuda to c wrapper is required.

2. Mark the C-compatible Cuda code in the header. The generated cuda code should then be:

#include <stdlib.h>
#include <stdbool.h>
#include "tmp_cuda_1.h"

/*........................................*/
bool is_false(bool a)
{
    bool c;
    c = 0;
    if (a == 0)
    {
        c = 1;
    }
    return c;
}
/*........................................*/

tmp.h

#ifndef TMP_CUDA_1_H
#define TMP_CUDA_1_H

#include <stdlib.h>
#include <stdbool.h>

extern "C" {
bool is_false(bool a);
}

#endif // tmp_cuda_1_H

In the C-Python API this .h file shouldn't be imported. Instead (as for Fortran) the declarations should also be printed in the header of the wrapper. I.e.:

#ifndef TMP_CUDA_1_WRAPPER_H
#define TMP_CUDA_1_WRAPPER_H

#include "numpy_version.h"
#include "numpy/arrayobject.h"
#include "cwrapper.h"

#ifdef TMP_CUDA_1_WRAPPER

bool bind_c_is_false(bool);

#else

static void** Pytmp_cuda_1_API;

/*........................................*/
static int tmp_cuda_1_import(void)
{
    PyObject* current_path;
    PyObject* stash_path;
    current_path = PySys_GetObject("path");
    stash_path = PyList_GetItem(current_path, 0);
    Py_INCREF(stash_path);
    PyList_SetItem(current_path, 0, PyUnicode_FromString("/home/emily/Code/test/tmp"));
    Pytmp_cuda_1_API = (void**)PyCapsule_Import("tmp_cuda_1._C_API", 0);
    PyList_SetItem(current_path, 0, stash_path);
    return Pytmp_cuda_1_API != NULL ? 0 : -1;
}
/*........................................*/

#endif
#endif // TMP_CUDA_1_WRAPPER_H

Language

Cuda

Additional context

Found by @smazouz42 in #42

EmilyBourne commented 7 months ago

I prefer option (2) for a few reasons:

  1. It should be faster (no additional function calls)
  2. It avoids an unnecessary wrapper
  3. It is similar to the approach that would be taken in the wrapper anyway
  4. Calling cuda from C is a common use case so this also makes sense for the user code

However, we will have to take care to ensure that the prototypes are fully compatible. A wrapper generation stage may still be necessary to translate e.g. between array representations.

@bauom what do you think?