Closed by jcosborn 11 months ago
@hummingtree can you take a look?
@jcosborn This happens because SM 86, 87 and 89 allow a maximum of only 1536 threads per SM (as opposed to 2048). I will open a PR to fix this. In the meantime, you can disable this part of the code by adding -D QUDA_MDW_FUSED_LS_LIST="" to your cmake parameters, which I expect would also decrease your compile time by quite a bit.
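To make the arithmetic behind the limit concrete, here is a minimal sketch of the check that seems to trip ptxas: the threads per block multiplied by the requested minimum blocks per SM must fit within the SM's resident-thread limit. The helper names and the 512 x 4 launch configuration are hypothetical illustrations, not values taken from QUDA; the per-architecture limits are the ones listed in the CUDA C++ Programming Guide.

```cpp
// Sketch only: helper names and example launch bounds are hypothetical,
// they are not taken from the QUDA source.

// Maximum resident threads per SM for a few compute capabilities
// (per the CUDA C++ Programming Guide). Returns -1 if not covered here.
constexpr int max_threads_per_sm(int sm)
{
  switch (sm) {
  case 75: return 1024;
  case 86:
  case 87:
  case 89: return 1536;
  case 70:
  case 80:
  case 90: return 2048;
  default: return -1;
  }
}

// A __launch_bounds__(max_threads_per_block, min_blocks_per_sm) request can
// only be honored if that many blocks are able to reside on one SM at once.
constexpr bool launch_bounds_fit(int sm, int max_threads_per_block, int min_blocks_per_sm)
{
  return max_threads_per_block * min_blocks_per_sm <= max_threads_per_sm(sm);
}

// 512 threads x 4 blocks = 2048 threads: fine on sm_80, too many for sm_86.
static_assert(launch_bounds_fit(80, 512, 4), "2048 threads fit on sm_80");
static_assert(!launch_bounds_fit(86, 512, 4), "2048 threads exceed the sm_86 limit");
// Requesting one block fewer (1536 threads) would fit on sm_86.
static_assert(launch_bounds_fit(86, 512, 3), "1536 threads fit on sm_86");
```

Under this assumption, a kernel tuned for 2048 resident threads per SM needs either a smaller block or a lower minimum-blocks request when targeting sm_86, sm_87 or sm_89.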
A STRICT build using sm_86 with MULTIGRID on fails with:
Building CUDA object lib/CMakeFiles/quda.dir/dslash_mdw_fused_ls20.cu.o
ptxas error : Value of threads per SM for entry _ZN4quda10raw_kernelINS_18mobius_tensor_core17FusedMobiusDslashENS1_14FusedDslashArgIsLi3EL21QudaReconstructType_s8ELi20ELNS19MdwfFusedDslashTypeE4ELi32ELi3ELb0EEELb0EEEvT0 is out of range. .minnctapersm will be ignored