pyccel / pyccel-cuda

Cuda extension to pyccel
MIT License

Basic support for Custom Device function. #41

Closed by bauom 3 months ago

bauom commented 8 months ago

Description:

Implement support for defining and using custom device functions in Pyccel, analogous to Numba's device functions. A device function runs on the GPU and can be called from kernels or from other device functions, but not from host code.

Implementation Steps:

  1. Introduce a @device decorator for defining device functions.
  2. Modify the compiler to handle device function calls in the AST and translate them to CUDA code.
  3. Ensure device functions compile to CUDA __device__ functions with the correct call semantics.
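
The call restriction behind these steps (device functions may be called from kernels or other device functions, but not from host code) can be illustrated with a small check built on Python's standard `ast` module. This is only a sketch under stated assumptions, not Pyccel's actual implementation: the helper names `decorator_names` and `find_illegal_device_calls` are hypothetical, and the `host_code` function was added to the sample source purely to trigger the check.

```python
import ast

# Sample source to analyse. It is only parsed, never executed, so the
# decorators do not need to be importable here.
SOURCE = '''
@device
def add_one(x):
    return x + 1

@kernel
def increment_array(an_array):
    an_array[0] = add_one(an_array[0])

def host_code(an_array):
    return add_one(an_array[0])
'''

def decorator_names(func):
    """Return the names of plain decorators (e.g. @device) on a function."""
    return {d.id for d in func.decorator_list if isinstance(d, ast.Name)}

def find_illegal_device_calls(source):
    """List (caller, callee) pairs where host code calls a @device function."""
    tree = ast.parse(source)
    funcs = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    device_funcs = {f.name for f in funcs if 'device' in decorator_names(f)}
    illegal = []
    for f in funcs:
        # Kernels and device functions run on the GPU, so they may call
        # device functions; only undecorated (host) functions are checked.
        if decorator_names(f) & {'device', 'kernel'}:
            continue
        for node in ast.walk(f):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in device_funcs):
                illegal.append((f.name, node.func.id))
    return illegal

illegal_calls = find_illegal_device_calls(SOURCE)
print(illegal_calls)  # [('host_code', 'add_one')]
```

A real compiler pass would also need to handle dotted decorators, nested functions, and imported names; the sketch deliberately ignores those cases.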

Read more: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html?highlight=__device__#device

Example Usage:

Pyccel Code:

from pyccel.decorators import kernel, device
from pyccel.internals import cuda

@device
def add_one(x):
    return x + 1

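@kernel
def increment_array(an_array):
    tx = cuda.threadIdx(0)  # thread index within the block
    ty = cuda.blockIdx(0)   # block index within the grid (despite the name)
    pos = tx + ty * cuda.blockDim(0)  # global 1D index of this thread
    # Guard against threads that fall past the end of the array
    if pos < an_array.size:
        an_array[pos] = add_one(an_array[pos])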

Expected CUDA Code:

__device__ int add_one(int x) {
    return x + 1;
}

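__global__ void increment_array(int* an_array, int array_size) {
    int tx = threadIdx.x;  // thread index within the block
    int ty = blockIdx.x;   // block index within the grid (despite the name)
    int pos = tx + ty * blockDim.x;  // global 1D index of this thread
    // Guard against threads that fall past the end of the array
    if (pos < array_size) {
        an_array[pos] = add_one(an_array[pos]);
    }
}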
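
To make the indexing in the example concrete, the kernel can be emulated on the CPU in plain Python. This is a rough sketch, not how Pyccel or CUDA execute the code: `emulate_launch` is a hypothetical helper that runs the "threads" sequentially, and the thread and block indices are passed as explicit arguments instead of being read from `cuda.threadIdx` and `cuda.blockIdx`.

```python
def add_one(x):
    # Stand-in for the @device function from the example.
    return x + 1

def increment_array(an_array, thread_idx, block_idx, block_dim):
    # Same index computation as the kernel: thread offset plus block offset.
    pos = thread_idx + block_idx * block_dim
    # Threads past the end of the array do nothing.
    if pos < len(an_array):
        an_array[pos] = add_one(an_array[pos])

def emulate_launch(kernel, num_blocks, threads_per_block, an_array):
    # Hypothetical helper: call the kernel once per (block, thread) pair,
    # sequentially, in place of a parallel GPU launch.
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            kernel(an_array, thread_idx, block_idx, threads_per_block)

data = list(range(10))
threads_per_block = 4
# Round up so every element is covered: 10 elements need 3 blocks of 4.
num_blocks = (len(data) + threads_per_block - 1) // threads_per_block
emulate_launch(increment_array, num_blocks, threads_per_block, data)
print(data)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

With 3 blocks of 4 threads, 12 threads are launched for 10 elements; the bounds check is what keeps the last two from writing out of range.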