Closed dutta-alankar closed 1 year ago
Hi @dutta-alankar, welcome to Idefix! Can you specify the machine you are trying to compile on? It is possible that Kokkos isn't properly detecting your target architecture, in which case the solution would be to enable it explicitly at configuration (using ccmake).
Hi @dutta-alankar, it's usually not a good idea to specify a compiler when using cuda since Kokkos internally assumes nvcc. I would therefore suggest removing the -DCMAKE_CXX_COMPILER option.
If you do want to use g++ as the host compiler, then you should set the CXX environment variable to g++ so that nvcc knows which host compiler it should use.
@glesur and @neutrinoceros
Thanks for all the suggestions!
I tried initially with just `cmake $IDEFIX_DIR -DKokkos_ENABLE_CUDA=ON` and ran into similar problems. So that is the reason I was trying to set a compiler. I also tried `nvcc` and `kokkos/bin/nvcc_wrapper` but no luck there.
I was initially using `cuda 12.0` and got the problem that I described earlier. Now I switched to `cuda 11.6` and, using just `cmake $IDEFIX_DIR -DKokkos_ENABLE_CUDA=ON`, I get a different error during `make`, as follows:
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
make[2]: *** [build/kokkos/core/src/CMakeFiles/kokkoscore.dir/build.make:76: build/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1171: build/kokkos/core/src/CMakeFiles/kokkoscore.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
My system:
- CXX compiler: `GCC 11.3.0`
- Cuda: `11.6` or `12.0`
- GPU: `RTX 3090` or `AMPERE_86`

Also, the line `default_arch="sm_35"` didn't require any change this time for `cmake` to succeed using `cuda 11.6`, unlike with `cuda 12.0`.
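For readers following along, the two suggestions made earlier in the thread (naming the host compiler through `CXX` and enabling the target architecture explicitly) could be combined roughly as below. This is only a sketch: it prints the command instead of running it, `configure_cmd` is a hypothetical helper, `/path/to/idefix` is a placeholder, and `Kokkos_ARCH_AMPERE86` is assumed to be the Kokkos option that matches the RTX 3090 (compute capability 8.6) listed above.

```shell
#!/usr/bin/env bash
# Sketch only: assemble a configure command with an explicit host compiler
# and GPU architecture. Nothing is executed; the command is printed for review.
# Assumptions (not from the thread): the helper name configure_cmd, the
# placeholder /path/to/idefix, and Kokkos_ARCH_AMPERE86 as the option for an
# RTX 3090.

configure_cmd() {
  local host_cxx="$1" arch="$2" src="$3"
  printf 'CXX=%s cmake %s -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_%s=ON\n' \
    "$host_cxx" "$src" "$arch"
}

# Print the command for g++ as host compiler and an Ampere 8.6 GPU.
configure_cmd g++ AMPERE86 "${IDEFIX_DIR:-/path/to/idefix}"
```

Printing first makes it easy to check the host compiler and architecture flag before committing to a full configure and build.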
I'm summarizing my findings here. In all cases I'm using `cmake $IDEFIX_DIR -DKokkos_ENABLE_CUDA=ON` and, if it succeeds, I'm using `make -j$(nproc)`.
- Using `cuda 12.0`, `cmake` fails with the following error and cannot create the Makefile:
-- The C compiler identification is GNU 11.3.0
-- The CXX compiler identification is GNU 11.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting default Kokkos CXX standard to 17
-- Setting policy CMP0074 to use <Package>_ROOT variables
-- The project name is: Kokkos
nvcc fatal : Value 'sm_35' is not defined for option 'gpu-architecture'
CMake Error at src/kokkos/cmake/kokkos_compiler_id.cmake:12 (STRING):
STRING sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
src/kokkos/cmake/kokkos_compiler_id.cmake:45 (kokkos_internal_have_compiler_nvcc)
src/kokkos/cmake/kokkos_tribits.cmake:204 (INCLUDE)
src/kokkos/CMakeLists.txt:170 (KOKKOS_SETUP_BUILD_ENVIRONMENT)
-- SERIAL backend is being turned on to ensure there is at least one Host space. To change this, you must enable another host execution space and configure with -DKokkos_ENABLE_SERIAL=OFF or change CMakeCache.txt
-- Using -std=c++17 for C++17 standard as feature
CMake Error at src/kokkos/cmake/kokkos_test_cxx_std.cmake:132 (MESSAGE):
Invalid compiler for CUDA. The compiler must be nvcc_wrapper or Clang or use kokkos_launch_compiler, but compiler ID was GNU
Call Stack (most recent call first):
src/kokkos/cmake/kokkos_tribits.cmake:231 (INCLUDE)
src/kokkos/CMakeLists.txt:170 (KOKKOS_SETUP_BUILD_ENVIRONMENT)
-- Configuring incomplete, errors occurred!
- I changed line 15 in `src/kokkos/bin/nvcc_wrapper` from `default_arch="sm_35"` to `default_arch="compute_86"`. After this, `cmake` succeeds in creating the Makefile, but the subsequent `make` command gives the following error:
[ 1%] Building CXX object build/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o
[ 2%] Building CXX object build/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o
/Data/alankar/work/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Half.hpp(936): error: no suitable user-defined conversion from "Kokkos::Experimental::half_t" to "__half" exists
/Data/alankar/work/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Half.hpp(941): error: identifier "__half2double" is undefined
/Data/alankar/work/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Half.hpp(946): error: no suitable user-defined conversion from "Kokkos::Experimental::half_t" to "__half" exists
/Data/alankar/work/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Half.hpp(952): error: no suitable user-defined conversion from "Kokkos::Experimental::half_t" to "__half" exists
/Data/alankar/work/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Half.hpp(957): error: no suitable user-defined conversion from "Kokkos::Experimental::half_t" to "__half" exists
/Data/alankar/work/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Half.hpp(962): error: no suitable user-defined conversion from "Kokkos::Experimental::half_t" to "__half" exists
/Data/alankar/work/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Half.hpp(967): error: no suitable user-defined conversion from "Kokkos::Experimental::half_t" to "__half" exists
/Data/alankar/work/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Half.hpp(973): error: no suitable user-defined conversion from "Kokkos::Experimental::half_t" to "__half" exists
/Data/alankar/work/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Half.hpp(76): warning #821-D: extern inline function "Kokkos::Experimental::cast_to_half(bool)" was referenced but not defined
Remark: The warnings can be suppressed with "-diag-suppress
/Data/alankar/work/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Half.hpp(101): warning #821-D: extern inline function "Kokkos::Experimental::cast_from_half
8 errors detected in the compilation of "/Data/alankar/work/idefix/src/kokkos/core/src/impl/Kokkos_Core.cpp".
make[2]: *** [build/kokkos/core/src/CMakeFiles/kokkoscore.dir/build.make:90: build/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o] Error 2
make[1]: *** [CMakeFiles/Makefile2:1171: build/kokkos/core/src/CMakeFiles/kokkoscore.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
- Using `cuda 11.6`, `cmake` can create `Makefile` with no modifications to `nvcc_wrapper`. However `make` fails with a different error message as follows:
[ 1%] Building CXX object build/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
make[2]: *** [build/kokkos/core/src/CMakeFiles/kokkoscore.dir/build.make:76: build/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_CPUDiscovery.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1171: build/kokkos/core/src/CMakeFiles/kokkoscore.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
Thanks for your feedback @dutta-alankar. For cuda 12: it's a known incompatibility with the Kokkos version that is pinned by Idefix. You can try to force an update by going to `src/kokkos` and running `git pull origin master`. You should then get Kokkos 4.0, which is cuda 12 compatible.
For cuda 11.6: I still have to reproduce it. Do you have the NVIDIA HPC compiler suite installed?
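The Kokkos update suggested above might look like the following sketch. It assumes `IDEFIX_DIR` points at the Idefix source tree and that `src/kokkos` is a git checkout; `update_kokkos` is a hypothetical helper name. It prints the currently checked-out commit first so that the pinned version can be restored later with `git checkout <hash>`.

```shell
#!/usr/bin/env bash
# Sketch: move the Kokkos checkout bundled with Idefix to the tip of master.
# Assumptions: IDEFIX_DIR points at the Idefix source tree and src/kokkos is
# a git checkout. update_kokkos is a hypothetical helper name.

update_kokkos() {
  local kokkos_dir="$1/src/kokkos"
  if [ ! -d "$kokkos_dir" ]; then
    echo "no Kokkos checkout at $kokkos_dir" >&2
    return 1
  fi
  # Show the commit currently checked out, so it can be restored later.
  git -C "$kokkos_dir" rev-parse HEAD
  # Fetch and merge the upstream master branch.
  git -C "$kokkos_dir" pull origin master
}

# Only act when IDEFIX_DIR is set; otherwise just report.
if [ -n "${IDEFIX_DIR:-}" ]; then
  update_kokkos "$IDEFIX_DIR"
else
  echo "IDEFIX_DIR is not set; nothing to do"
fi
```

After updating, it is probably safest to configure in a fresh build directory so that no settings cached for the older Kokkos are reused.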
I have now tested `cuda 11.6`, and it seems that `GNU Compiler 11.3.0` is incompatible and gives the error I mentioned earlier. I locally compiled `GNU Compiler 9.5.0`, installed it at a non-standard location, and updated my `PATH` and `LD_LIBRARY_PATH` in `.bashrc`. `which g++` seems to give the `9.5.0` version, but `cmake` was still detecting `11.3.0`. So I ran `cmake` using the following:
CXX=g++ cmake $IDEFIX_DIR -DKokkos_ENABLE_CUDA=ON
After this, I found during Makefile generation that the compiler `9.5.0` was getting detected. Now `make` went smoothly up to `100%`, and then I ran into another error, as follows:
nvlink fatal : Could not open input file '/usr/lib/x86_64-linux-gnu/libdl.a'
make[2]: *** [CMakeFiles/idefix.dir/build.make:790: idefix] Error 1
make[1]: *** [CMakeFiles/Makefile2:417: CMakeFiles/idefix.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
After searching a bit, this now seems to be an issue with `cuda 11.6` that was fixed in later releases of Cuda, although I couldn't figure out the reason!
See here:
After this, I changed to `cuda 11.7`, and only then was I able to compile `idefix`. Therefore the takeaway is that `GCC 9.5.0` and `Cuda 11.7` are a working combination.
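A note on the compiler detection problem described above: the configure log earlier in the thread shows CMake checking `/bin/c++`, so it is likely that CMake was picking up `c++` and not the `g++` found on `PATH`. CMake also stores the chosen compiler in `CMakeCache.txt` and only reads the `CXX` environment variable on the first configure of a build directory. The sketch below (with a hypothetical helper `cached_cxx`) reads the cached entry so the choice can be checked before running `make`.

```shell
#!/usr/bin/env bash
# Sketch: print the C++ compiler recorded in a CMake cache file.
# cached_cxx is a hypothetical helper; the default path assumes the script
# is run from the build directory.

cached_cxx() {
  local cache="${1:-CMakeCache.txt}"
  if [ ! -f "$cache" ]; then
    echo "no cache at $cache" >&2
    return 1
  fi
  # Entries look like CMAKE_CXX_COMPILER:FILEPATH=/path/to/compiler
  sed -n 's/^CMAKE_CXX_COMPILER:[A-Z]*=//p' "$cache"
}

# When a cache exists here, show the compiler and its reported version.
if [ -f CMakeCache.txt ]; then
  cxx="$(cached_cxx)"
  echo "cached compiler: $cxx"
  if [ -n "$cxx" ]; then
    "$cxx" --version | head -n 1
  fi
fi
```

If the cached compiler is not the intended one, removing `CMakeCache.txt` and the `CMakeFiles` directory (or starting from an empty build directory) before re-running `CXX=g++ cmake ...` makes CMake detect the compiler again.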
@glesur Let me know if you can reproduce any of these issues or if you know the valid combination of cuda and compiler versions that are known to work.
I know that cuda 11.6 and gcc 9 work, but I've never tried with gcc 11. I'll double check this, thanks.
@glesur Thanks for your input. I updated Kokkos using `git pull origin master` inside `src/kokkos`, and now `idefix` compiles with `Cuda 12.0` and `GCC 11.3.0`.
UPDATE: Up to `GCC 12.2.0` works with `Cuda 12.0` and the updated `Kokkos`, and `idefix` compiles. `GCC 13.x` isn't working.
To conclude on using cuda 11.6, the issue you encountered is a known incompatibility between gcc11 and cuda 11.6, which was fixed in cuda 11.6.2. Possible workarounds are using gcc12 or cuda 11.6.2.
Not sure if `gcc12` and `cuda 11.6.2` are a compatible combination.
https://gist.github.com/ax3l/9489132
Anyway, I have been able to get this working with `Kokkos 4`, `Cuda 12.0` and `GCC 12.2.0`, and also using the detached head of kokkos @ `a1d045d` with `Cuda 11.8` and `GCC 9.5.0`.
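The combinations reported in this thread can be collected into a small lookup, sketched below. This is not an official support matrix: it only encodes what was reported here, everything else is returned as "untested", and the helper names are hypothetical. The "pinned" label is an assumption that the `cuda 11.7` and `a1d045d` builds used the Kokkos version shipped with Idefix at the time.

```shell
#!/usr/bin/env bash
# Sketch: look up toolchain combinations reported in this thread.
# NOT an official support matrix. Helper names are hypothetical.
# Arguments of reported_status: CUDA release (e.g. 11.7), GCC major version
# (e.g. 9), Kokkos state: "pinned" (assumed: version shipped with Idefix at
# the time, including the detached head a1d045d) or "updated" (master).

reported_status() {
  case "$1:$2:$3" in
    12.0:11:pinned)  echo "fails" ;;   # sm_35 / Kokkos_Cuda_Half.hpp errors
    11.6:11:pinned)  echo "fails" ;;   # std_function.h parameter pack errors
    11.6:9:pinned)   echo "fails" ;;   # nvlink could not open libdl.a
    11.7:9:pinned)   echo "works" ;;
    11.8:9:pinned)   echo "works" ;;
    12.0:11:updated) echo "works" ;;
    12.0:12:updated) echo "works" ;;
    12.0:13:updated) echo "fails" ;;   # reported as not working
    *)               echo "untested" ;;
  esac
}

# Extract "X.Y" from the "release X.Y" line printed by `nvcc --version`.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# When both tools are installed, report on the local toolchain.
if command -v nvcc >/dev/null 2>&1 && command -v g++ >/dev/null 2>&1; then
  cuda="$(nvcc --version | cuda_release)"
  gcc_full="$(g++ -dumpversion)"
  reported_status "$cuda" "${gcc_full%%.*}" "${KOKKOS_STATE:-pinned}"
fi
```

`KOKKOS_STATE` is an assumed environment variable used only by this sketch; set it to `updated` after pulling Kokkos master.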
This GitHub gist is perhaps a useful thing to refer to for anyone new wanting to compile and run `idefix` on GPU.
@glesur Thanks again for your help. You may close this issue as it seems to be addressed.
I'm trying to compile `Idefix` with the following cmake command for the Sod problem. Here's what I use:
and this leads me to the following error:
I was able to get around this by changing `src/kokkos/bin/nvcc_wrapper` line 15 from `default_arch="sm_35"` to `default_arch="compute_86"`. After this, `cmake` succeeds in creating the Makefile, but the subsequent `make` command gives the following error. Can you suggest a fix for this? I'm a newbie as far as using Idefix or Kokkos is concerned.