Closed by mesonepigreco 5 years ago
The syntax did change a bit, but the concepts should be the same. Here's a 2D example and its translation, generated with
./bin/occa translate --mode CUDA test.okl
@kernel void test(float *array2d) {
  for (int yBlock = 0; yBlock < Y; yBlock += 16; @outer) {
    for (int xBlock = 0; xBlock < X; xBlock += 16; @outer) {
      for (int j = yBlock; j < (yBlock + 16); ++j; @inner) {
        for (int i = xBlock; i < (xBlock + 16); ++i; @inner) {
          array2d[j*X + i] = "bla bla bla";
        }
      }
    }
  }
}
↓
extern "C" __global__ void _occa_test_0(float *array2d) {
  {
    int yBlock = 0 + (16 * blockIdx.y);
    {
      int xBlock = 0 + (16 * blockIdx.x);
      {
        int j = yBlock + threadIdx.y;
        {
          int i = xBlock + threadIdx.x;
          array2d[j * X + i] = "bla bla bla";
        }
      }
    }
  }
}
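To make the mapping concrete, here is a small CPU-side sketch in Python (not OKL, and not something OCCA generates) that emulates both forms and checks that they visit the same `(j, i)` pairs. The function names `okl_loops` and `cuda_launch` are made up for illustration, and the sketch assumes `X` and `Y` are multiples of 16, since the example kernel has no bounds check.

```python
from itertools import product

TILE = 16  # block edge length used in the example above


def okl_loops(X, Y):
    """Index pairs visited by the OKL @outer/@inner loop nest."""
    visited = []
    for y_block in range(0, Y, TILE):                     # @outer
        for x_block in range(0, X, TILE):                 # @outer
            for j in range(y_block, y_block + TILE):      # @inner
                for i in range(x_block, x_block + TILE):  # @inner
                    visited.append((j, i))
    return visited


def cuda_launch(X, Y):
    """Index pairs computed from emulated blockIdx/threadIdx values."""
    # Assumption: X and Y are multiples of TILE, so the grid covers the array exactly.
    grid_x, grid_y = X // TILE, Y // TILE
    visited = []
    for by, bx, ty, tx in product(range(grid_y), range(grid_x),
                                  range(TILE), range(TILE)):
        y_block = 0 + TILE * by   # int yBlock = 0 + (16 * blockIdx.y);
        x_block = 0 + TILE * bx   # int xBlock = 0 + (16 * blockIdx.x);
        j = y_block + ty          # int j = yBlock + threadIdx.y;
        i = x_block + tx          # int i = xBlock + threadIdx.x;
        visited.append((j, i))
    return visited
```

Each `@outer` loop becomes one block index and each `@inner` loop becomes one thread index, so the two functions enumerate the same elements. On a GPU the order of execution is of course not sequential; only the set of indices matters.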
The @outer and @inner attributes can take an optional argument that specifies which dimension the for-loop traverses. By default, the dimensions are x, y, and z, assigned from the inner-most to the outer-most @outer for-loop (the @inner loops in the translation above follow the same order).
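The default rule can be written down as a tiny Python helper. This is only an illustration of the rule as described here, not OCCA code; `default_dims` is a hypothetical name, and the limit of three nested loops per kind is an assumption based on the x, y, z dimensions listed above.

```python
DIM_NAMES = ("x", "y", "z")


def default_dims(loop_vars):
    """Map loop variables (listed outer-most first) to their default dimension."""
    if len(loop_vars) > len(DIM_NAMES):
        # Assumption: no more than three nested loops of the same kind.
        raise ValueError("at most three nested loops of one kind")
    # The inner-most loop gets x, the next one out gets y, then z.
    return {var: DIM_NAMES[depth]
            for depth, var in enumerate(reversed(loop_vars))}
```

For the first kernel, `default_dims(["yBlock", "xBlock"])` gives `xBlock` the x dimension and `yBlock` the y dimension, which matches the `blockIdx.x` and `blockIdx.y` in the generated CUDA.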
Here's an example switching the x and y blocks:
@kernel void test(float *array2d) {
  for (int yBlock = 0; yBlock < Y; yBlock += 16; @outer(0)) {
    for (int xBlock = 0; xBlock < X; xBlock += 16; @outer(1)) {
      for (int j = yBlock; j < (yBlock + 16); ++j; @inner(0)) {
        for (int i = xBlock; i < (xBlock + 16); ++i; @inner(1)) {
          array2d[j*X + i] = "bla bla bla";
        }
      }
    }
  }
}
↓
extern "C" __global__ void _occa_test_0(float *array2d) {
  {
    int yBlock = 0 + (16 * blockIdx.x);
    {
      int xBlock = 0 + (16 * blockIdx.y);
      {
        int j = yBlock + threadIdx.x;
        {
          int i = xBlock + threadIdx.y;
          array2d[j * X + i] = "bla bla bla";
        }
      }
    }
  }
}
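One practical consequence of swapping the dimensions is the memory stride between neighbouring threads. The Python sketch below (hypothetical helper names, emulating only block (0, 0)) computes the flat offset `j * X + i` touched by a thread under both mappings. It is meant as a rough illustration of the indexing, not a performance measurement.

```python
def flat_index(thread_x, thread_y, X, swapped):
    """Flat array offset touched by one thread of block (0, 0)."""
    if swapped:
        # @inner(0) on the j loop, @inner(1) on the i loop
        j, i = thread_x, thread_y
    else:
        # default: the inner-most loop (i) maps to x
        j, i = thread_y, thread_x
    return j * X + i


def stride_along_x(X, swapped):
    """Offset difference between threads with threadIdx.x = 0 and 1."""
    return flat_index(1, 0, X, swapped) - flat_index(0, 0, X, swapped)
```

With the default mapping, threads that are adjacent in `threadIdx.x` touch adjacent array elements (stride 1). With the swapped mapping they are `X` elements apart, which on CUDA hardware usually means less coalesced memory access, so the choice of dimension is worth keeping in mind when the array is stored row by row as it is here.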
Although the documentation is a bit lacking, here are 2 places to check out
Thanks a lot, this is exactly what I was looking for.
@mesonepigreco Awesome, glad it helped! Feel free to ask any questions, it'll help target what documentation to update :)
Dear developers, I would like to write a kernel function that uses blocks and threads as dim3 vectors (as is possible in CUDA or OpenCL), that is, using multiple inner and outer loops. The CUDA syntax is something like:
I saw an old example in these slides where this is used, but the OKL syntax there differs from the current one, so I wonder whether it has changed. https://www.caam.rice.edu/~mk51/presentations/SIAMPP2016_6.pdf
I could not find any example of it among those provided; it would be great to add one that uses this feature. Can this be done in the current OKL? If so, how?