When Grid is compiled and run on a single GPU, running a benchmark such as Benchmark_ITT fails with the error:
accelerator_barrier(): Cuda error invalid configuration argument
Digging into this, the error comes from line 137 of Grid/threads/Accelerator.h:
dim3 cu_blocks ((num1+nt-1)/nt,num2,1); \
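For reasons I haven't dug into deeply enough to understand, when running with 1 GPU the expression (num1+nt-1)/nt (or, in the specific case that fails, (sz+nt-1)/nt, called from WilsonKernelsImplementation.h) evaluates to zero, which is not a valid block count. Since nt is positive, this ceiling division is zero only when num1 (here sz) is zero, assuming it is non-negative, so the kernel is apparently being launched over an empty range.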
As a workaround, changing line 137 to
allows the code to run correctly.
Code example:
Target platform:
Tested on Grace Hopper (Arm + H100), on Arm + A100 in Leicester, and on AMD Rome + A100 in Swansea.
Configure options: