Closed khuck closed 2 weeks ago
This doesn't get caught unless -DKokkos_ENABLE_COMPILER_WARNINGS=ON is set at configure time.
What do compiler warnings have to do with the issue? I would rather have expected that you get a Cuda
error.
Oops! I meant -DKokkos_ENABLE_DEBUG_BOUNDS_CHECK. There is a CUDA error, but it doesn't get caught unless this setting is enabled. The cudaGetLastError() call doesn't happen otherwise.
A colleague of mine ran into a similar issue last week.
@cedricchevalier19 Was the colleague using tuning, or was this just running a CUDA back end? Thanks!
@vlkale BTW, a quick and dirty fix for this is to change this line: https://github.com/kokkos/kokkos/blob/2d7715239700f50169bc50a96a234b05c28c9a2e/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp#L114 to:
const dim3 block(m_rp.m_tile[0], m_rp.m_tile[1],
((m_rp.m_tile[2] > 64) ? 64 : m_rp.m_tile[2]));
...and maybe there's a better way to interrogate that 64 value, rather than hard-coding it. You can also change the assertions: https://github.com/kokkos/kokkos/blob/2d7715239700f50169bc50a96a234b05c28c9a2e/core/src/Cuda/Kokkos_Cuda_Parallel_MDRange.hpp#L115-L117 to be:
KOKKOS_ASSERT(block.x > 0 && block.x < 1025);
KOKKOS_ASSERT(block.y > 0 && block.y < 1025);
KOKKOS_ASSERT(block.z > 0 && block.z < 65);
...again, with interrogated values not hard-coded limits.
...and while we're adding assertions, you can also add:
KOKKOS_ASSERT(grid.x > 0);
KOKKOS_ASSERT(grid.y > 0 && grid.y < 65536);
KOKKOS_ASSERT(grid.z > 0 && grid.z < 65536);
...before the CudaParallelLaunch() call, probably with interrogated limits (not hard-coded).
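For illustration, here is a minimal host-only sketch of the "interrogated limits" idea from the snippets above. It is not the Kokkos implementation. Dim3, LaunchLimits, documented_limits, clamp_block and launch_is_valid are hypothetical names. The numbers are the per-dimension limits listed in the CUDA programming guide, and the sketch assumes a real build would instead read them from cudaDeviceProp::maxThreadsDim and cudaDeviceProp::maxGridSize.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, so the sketch builds without CUDA.
struct Dim3 {
  unsigned x, y, z;
};

// Hypothetical container for per-dimension launch limits. In a CUDA build
// these would be filled from cudaDeviceProp::maxThreadsDim and
// cudaDeviceProp::maxGridSize rather than hard-coded.
struct LaunchLimits {
  Dim3 max_block;
  Dim3 max_grid;
};

// Limits as listed in the CUDA programming guide (assumption: the target
// device matches that table).
inline LaunchLimits documented_limits() {
  return LaunchLimits{Dim3{1024u, 1024u, 64u},
                      Dim3{2147483647u, 65535u, 65535u}};
}

// Clamp each tile extent to the block limit for that dimension.
inline Dim3 clamp_block(const Dim3& tile, const LaunchLimits& lim) {
  return Dim3{std::min(tile.x, lim.max_block.x),
              std::min(tile.y, lim.max_block.y),
              std::min(tile.z, lim.max_block.z)};
}

// Same checks as the assertions above, expressed against the queried limits.
inline bool launch_is_valid(const Dim3& block, const Dim3& grid,
                            const LaunchLimits& lim) {
  auto in_range = [](unsigned v, unsigned max) { return v > 0 && v <= max; };
  return in_range(block.x, lim.max_block.x) &&
         in_range(block.y, lim.max_block.y) &&
         in_range(block.z, lim.max_block.z) &&
         in_range(grid.x, lim.max_grid.x) &&
         in_range(grid.y, lim.max_grid.y) &&
         in_range(grid.z, lim.max_grid.z);
}
```

With these limits, a tile of (16, 16, 128) is clamped to a block of (16, 16, 64), and a grid with y = 65536 is rejected. The sketch does not check the separate limit on the total number of threads per block.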
I am good with this quick fix. Yes, we definitely want the interrogated values.
@khuck - might you be able to raise a PR addressing this issue?
@ajpowelsnl is there a kokkos call equivalent to const auto maxblocks = m_rp.space().cuda_device_prop().maxBlockSize;? I cannot find in the kokkos source code where m_rp.space().cuda_device_prop().maxGridSize is defined ...
Hi @khuck - in Kokkos develop, is this definition helpful?
never mind, I found https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaDeviceProp.html#structcudaDeviceProp_192d195493a9d36b2d827aaf3ffd89f1e
Have a look here 😄
@ajpowelsnl OK, I put together a commit here - is this more or less what you have in mind? I added a lot of assertions, and I have tested it with 2-6 dimensions. https://github.com/kokkos/kokkos/compare/master...khuck:kokkos:master
If that looks OK, I'll make a PR
Describe the bug
The Kokkos internals tuning for the Cuda back-end has a constructor method for declaring the ranges of potential values for the block.x,y,z dimensions of deep copies. The sets of values are [1,2,4,8,16,32,64,128,256,512,1024] for all three dimensions. They are set here:
https://github.com/kokkos/kokkos/blob/2d7715239700f50169bc50a96a234b05c28c9a2e/core/src/Kokkos_Tuners.hpp#L565-L567
Unfortunately, the Z range should only go up to 64, see: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications-technical-specifications-per-compute-capability
This gets exposed by the deep copy of a 3D view.
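To make the range problem concrete, here is a minimal sketch of building the candidate lists per dimension instead of sharing one list. This is not the code in Kokkos_Tuners.hpp. power_of_two_candidates is a hypothetical helper, and the limits 1024 (x, y) and 64 (z) are taken from the CUDA programming guide table linked above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: powers of two from 1 up to and including `limit`.
// Assumes limit >= 1; returns an empty list otherwise.
inline std::vector<int> power_of_two_candidates(int limit) {
  std::vector<int> values;
  for (int v = 1; v <= limit; v *= 2) {
    values.push_back(v);
  }
  return values;
}
```

Calling power_of_two_candidates(1024) gives the eleven values listed above for block.x and block.y, while power_of_two_candidates(64) stops at 64 for block.z, so a tuner searching these lists cannot pick 128-1024 for the Z dimension.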
Please include the following for a minimal reproducer
...or any other tuner that will search into the range where 128-1024 could be used for the block Z dimension.
KokkosCore_config.h header file (generated during the build)
The error is only caught when -DKokkos_ENABLE_DEBUG_BOUNDS_CHECK=ON is set at configure time.