jxy opened this issue 2 years ago
Just noting that CUDA 11.7u1 has now been released. This should fix this issue: https://developer.nvidia.com/cuda-downloads
I still plan to look into whether I can add a workaround for 11.6.
BTW, I'm getting the same error with the default environment on Perlmutter, which uses CUDA 11.7.64 and gcc 11.2.
11.7.64 is the original release of 11.7, not the updated release. You will need to either switch to 11.8 or update to 11.7u1.
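To tell which side of that line a given toolkit falls on, the version printed by `nvcc --version` can be checked directly. The following is a minimal Python sketch, not part of the project: the helper names `parse_nvcc_version` and `needs_workaround` are hypothetical, it assumes that CUDA 11.7 Update 1 reports itself as nvcc 11.7.99, and it only flags the versions reported in this thread (11.6.x and the original 11.7.64 release).

```python
# Hypothetical helper, not part of the project: classify an nvcc version
# string (e.g. "V11.7.64" from `nvcc --version`) against the versions
# discussed in this thread.
# Assumption: CUDA 11.7 Update 1 ships nvcc 11.7.99, so any 11.7 build
# below 99 is the original 11.7 release.


def parse_nvcc_version(text: str) -> tuple:
    """Parse 'V11.7.64' or '11.7.64' into (major, minor, build)."""
    major, minor, build = (int(p) for p in text.strip().lstrip("Vv").split("."))
    return major, minor, build


def needs_workaround(version: str) -> bool:
    """True only for the toolkits reported as failing in this thread."""
    major, minor, build = parse_nvcc_version(version)
    if major != 11:
        return False  # no reports here for other major versions
    if minor == 6:
        return True  # reported failing with 11.6.2
    if minor == 7:
        return build < 99  # 11.7.64 is the original, pre-update release
    return False  # e.g. 11.4 reported fine, 11.8 expected to be fixed


if __name__ == "__main__":
    for v in ("V11.6.124", "V11.7.64", "V11.7.99", "V11.8.89", "V11.4.48"):
        print(v, needs_workaround(v))
```

A build system could run such a check at configure time and either enable a workaround or stop with a message asking for 11.7u1 or 11.8.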
Is there any chance this can be worked around without having to ask centers to upgrade to 11.7u1, 11.8 or newer? EasyBuild has not updated the OpenMPI / UCX-CUDA / CUDA / GCC combination regularly, and as a consequence, on many machines that I run on, one is left with basically three choices:
CUDA 11.3 works, and it is what I use most (except for 11.5 in one place), but the rest of the software stack that 11.3 is made available with dates from early 2020, is no longer maintained, and is beginning to break.
Going through EasyBuild, the only upgrade path I can see for centers is CUDA 12, and I suspect many are hesitant to take that step, even if asked nicely by some LQCD users.
Using commit dd6207e6e, CUDA 11.6.2 and gcc 12.1.0, running on
with command
gives error
It works fine when changing the precision from `half` to `single` in the above command. Recompiling the code with CUDA 11.4.0 and gcc 9.2.0 also runs fine.