Closed bauom closed 5 months ago
from pyccel.decorators import kernel

@kernel
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx(0)
    # Block id in a 1D grid
    ty = cuda.blockIdx(0)
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim(0)
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1
Your code looks a little problematic to me: `cuda` is not defined anywhere. Did you want to treat it as a built-in?
I would have expected code such as:
from numba import cuda
from pyccel.decorators import kernel

@kernel
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx.x
    # Block id in a 1D grid
    ty = cuda.blockIdx.x
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim.x
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1
Would the latter run in pure Python at all?
from pyccel import cuda
from pyccel.decorators import kernel

@kernel
def increment_by_one(an_array):
    # Thread id in a 1D block
    tx = cuda.threadIdx(0)
    # Block id in a 1D grid
    ty = cuda.blockIdx(0)
    # Block width, i.e. number of threads per block
    bw = cuda.blockDim(0)
    # Compute flattened index inside the array
    pos = tx + ty * bw
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1
Sorry, I missed a `from pyccel import cuda`. The aim in this first step is just to have a `cuda` Pyccel internal library that is not specific to any one external library, so that we can later use it with all of them.
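As a side note, the proposed interface can be exercised without a GPU. Below is a minimal pure-Python sketch of such a `cuda` stub together with a sequential launcher. The `_CudaStub` class and the `launch` helper (and its signature) are hypothetical illustrations, not part of Pyccel; only the `threadIdx(0)` / `blockIdx(0)` / `blockDim(0)` calls come from the snippet above.

```python
import numpy as np

class _CudaStub:
    """Hypothetical pure-Python stand-in for the proposed pyccel cuda module."""
    def __init__(self):
        self._thread = 0
        self._block = 0
        self._block_dim = 1
    def threadIdx(self, dim):
        # Only 1D grids are emulated here, so `dim` is ignored.
        return self._thread
    def blockIdx(self, dim):
        return self._block
    def blockDim(self, dim):
        return self._block_dim

cuda = _CudaStub()

def launch(kernel_func, num_blocks, threads_per_block, *args):
    """Emulate kernel_func[num_blocks, threads_per_block](*args) by
    iterating over every (block, thread) pair sequentially."""
    cuda._block_dim = threads_per_block
    for b in range(num_blocks):
        for t in range(threads_per_block):
            cuda._block, cuda._thread = b, t
            kernel_func(*args)

def increment_by_one(an_array):
    # Compute flattened index inside the array
    pos = cuda.threadIdx(0) + cuda.blockIdx(0) * cuda.blockDim(0)
    if pos < an_array.size:  # Check array boundaries
        an_array[pos] += 1

arr = np.zeros(10)
launch(increment_by_one, 4, 3, arr)  # 4 blocks * 3 threads cover all 10 elements
```

With this stand-in, each of the 12 emulated threads computes a distinct `pos`, and the boundary check discards the two that fall past the end of the array.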
as the aim in this first step is just to have a cuda Pyccel internal library that is not specific to a library
In that case shouldn't it be `from pyccel.internals import cuda` to match what is done for the other internal libraries?
Yeah, that would be better 👍 I will change it.
This issue aims to add the feature of creating custom kernels in the Numba style. Below you can find an example of a `kernel` definition which can be called in the code in the following format: `increment_by_one[BN, TPB](args)`, where:

- `BN` is the number of blocks to be dispatched on the GPU.
- `TPB` is the number of threads on each block.

This can be implemented by checking whether an `IndexedElement` in the semantic stage is a `FunctionCall`, and replacing it in the AST with a `KernelCall` node. A `KernelCall` can be detected if an `IndexedElement` contains a `FunctionCall` which is decorated by the `kernel` decorator.

Numba code:

Pyccel code:
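The replacement step described above can be sketched in plain Python. The node classes below only mirror the names used in this issue (`IndexedElement`, `FunctionCall`, `KernelCall`); their attributes are assumptions for illustration, not Pyccel's actual AST API:

```python
class FunctionCall:
    """Hypothetical stand-in for a semantic-stage FunctionCall node."""
    def __init__(self, funcdef, args, decorators=()):
        self.funcdef = funcdef
        self.args = args
        self.decorators = tuple(decorators)

class IndexedElement:
    """Hypothetical stand-in, e.g. increment_by_one[BN, TPB](args)."""
    def __init__(self, base, indices):
        self.base = base
        self.indices = indices

class KernelCall:
    """Node that would replace the IndexedElement in the AST."""
    def __init__(self, funcdef, num_blocks, threads_per_block, args):
        self.funcdef = funcdef
        self.num_blocks = num_blocks
        self.threads_per_block = threads_per_block
        self.args = args

def visit_indexed_element(node):
    """Detect a kernel launch during the semantic stage and rewrite it.

    An IndexedElement that contains a FunctionCall decorated with
    'kernel' becomes a KernelCall; anything else is left unchanged.
    """
    if isinstance(node.base, FunctionCall) and 'kernel' in node.base.decorators:
        num_blocks, threads_per_block = node.indices
        return KernelCall(node.base.funcdef, num_blocks,
                          threads_per_block, node.base.args)
    return node

# increment_by_one[4, 3](an_array) as a (hypothetical) AST fragment:
call = FunctionCall('increment_by_one', args=('an_array',), decorators=('kernel',))
rewritten = visit_indexed_element(IndexedElement(call, (4, 3)))
```

A non-kernel `IndexedElement` (for example, ordinary array indexing) would fail the `isinstance`/decorator check and pass through unchanged.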