trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Kokkos: build error on SDSC #1409

Closed kvmkrao closed 7 years ago

kvmkrao commented 7 years ago

@kokkos @crtrott @wfspotz I would like to install Trilinos with CUDA on one of the supercomputers at SDSC (http://www.sdsc.edu/) and test the performance of the linear solvers in Belos for multiphase flows. I loaded the gcc/4.9.2, MVAPICH2/2.1 and cuda/7.5 modules on SDSC and ran the do-config.sh script (please see the attachment). The configuration was successful (please see the attachment). However, there are two errors during the build:

[ 0%] Building CXX object packages/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostSpace.cpp.o

In file included from /oasis/scratch/comet/vkotteda/temp_project/Software/trilinos_3rd/tpl/trilinos/Trilinos12p8/packages/kokkos/core/src/impl/Kokkos_HostSpace.cpp:97:0:

/oasis/scratch/comet/vkotteda/temp_project/Software/trilinos_3rd/tpl/trilinos/Trilinos12p8/packages/kokkos/core/src/Kokkos_Atomic.hpp:129:1: error: '__device__' does not name a type

__device__ inline

^

/oasis/scratch/comet/vkotteda/temp_project/Software/trilinos_3rd/tpl/trilinos/Trilinos12p8/packages/kokkos/core/src/Kokkos_Atomic.hpp:138:1: error: '__device__' does not name a type

__device__ inline

^

make[2]: [packages/kokkos/core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_HostSpace.cpp.o] Error 1

make[1]: [packages/kokkos/core/src/CMakeFiles/kokkoscore.dir/all] Error 2

Do you have a suggestion for how to get rid of those errors and install Trilinos with CUDA on SDSC?

Thank you.

config.txt

do_config.txt

ibaned commented 7 years ago

I don't think OMPI_CXX works with MVAPICH.

Better remove OMPI_CXX and do the following instead:

export NVCC_WRAPPER_DEFAULT_COMPILER=mpicxx
...
      -D CMAKE_CXX_COMPILER="/oasis/scratch/comet/vkotteda//Software/trilinos_3rd/kokkos/install/bin/nvcc_wrapper" \

Also make sure you include:

      -D Kokkos_ENABLE_Cuda:BOOL=ON \

PS: the @ symbol in GitHub comments only works for people, teams, and organizations. cuda happens to be a user in Istanbul, so that mention probably wasn't your intent. The other three worked.

nmhamster commented 7 years ago

@ibaned @kvmkrao - for MPICH-based systems/MPI please use the MPICH_CXX environment variable instead of OMPI_CXX.
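For readers unsure which variable applies: each MPI family's mpicxx wrapper reads its own environment variable to choose the underlying C++ compiler. The sketch below encodes that mapping in a small shell helper. The function name mpi_cxx_override_var and the family labels are illustrative assumptions, not part of Trilinos or of any MPI distribution.

```shell
# Hypothetical helper: map an MPI family name to the environment variable
# that its mpicxx wrapper reads to select the underlying C++ compiler.
mpi_cxx_override_var() {
  case "$1" in
    openmpi)        echo "OMPI_CXX" ;;   # Open MPI wrappers
    mpich|mvapich2) echo "MPICH_CXX" ;;  # MPICH and MPICH-derived wrappers
    *)
      echo "unknown MPI family: $1" >&2
      return 1
      ;;
  esac
}
```

On an MVAPICH2 system, a do-config script could then use something like `export "$(mpi_cxx_override_var mvapich2)=/path/to/nvcc_wrapper"`, where the path is a placeholder for the local nvcc_wrapper location.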

kvmkrao commented 7 years ago

Thanks @ibaned @nmhamster. I modified the do_config file based on your suggestions and ran the script. Those corrections let me install Trilinos with CUDA on SDSC.

However, I fixed the following minor errors during the build by copying the files in mpi/install/include/* to packages/teuchos/core/src/:

/oasis/scratch/comet/vkotteda/Software/trilinos_3rd/tpl/trilinos/Trilinos12p8/packages/teuchos/core/src/Teuchos_Time.hpp:56:17: fatal error: mpi.h: No such file or directory

#include "mpi.h"

             ^

compilation terminated.

/oasis/scratch/comet/vkotteda/Software/trilinos_3rd/tpl/trilinos/Trilinos12p8/packages/teuchos/core/src/mpi.h:2221:18: fatal error: mpio.h: No such file or directory

#include "mpio.h"

              ^

compilation terminated.

/oasis/scratch/comet/vkotteda/Software/trilinos_3rd/tpl/trilinos/Trilinos12p8/packages/teuchos/core/src/mpi.h:2237:20: fatal error: mpicxx.h: No such file or directory

#include "mpicxx.h"

                ^

compilation terminated.

....

> PS: the @ symbol in GitHub comments only works for people, teams, and organizations. cuda happens to be a user in Istanbul, so that mention probably wasn't your intent. The other three worked.

Sorry, that was my mistake. I did not know that a user named cuda exists on GitHub. I edited my post to remove "@cuda". Thanks.

ibaned commented 7 years ago

> However, I fixed the following minor errors during the build by copying the files in mpi/install/include/* to packages/teuchos/core/src/

This doesn't sound like the right thing to do. Somehow mpicxx was no longer involved in compiling Teuchos. If you post your new script we can figure out why.
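One way to check whether the MPI wrapper is actually supplying the headers is to inspect the command line it would run. MPICH-derived wrappers such as MVAPICH2's mpicxx print that command line with -show. The helper below, list_include_dirs (a hypothetical name), pulls the -I directories out of such a command line. It is a minimal sketch that only handles the joined -I/dir form.

```shell
# Hypothetical helper: print every -I include directory found in a
# compiler command line passed as a single string.
# The subshell body keeps "set -f" from leaking into the caller.
list_include_dirs() (
  set -f   # disable globbing, since $1 is word-split on purpose below
  for word in $1; do
    case "$word" in
      -I?*) printf '%s\n' "${word#-I}" ;;   # strip the leading -I
    esac
  done
)
```

Running `list_include_dirs "$(mpicxx -show)"` should list the directory that contains mpi.h. If it does and the build still cannot find the header, the failing compile line is probably not going through mpicxx at all, which is the situation described above.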

kvmkrao commented 7 years ago

OK. I have attached to this post the do-config file that let me install Trilinos successfully. Please take a look. Thanks.

do_config_mod.txt

cponder commented 6 years ago

I'm running into a problem building the latest "trunk" code using MVAPICH2/GDR-2.3.a with GCC 7.3 and CUDA 9.2. A flag is not being passed correctly down the compilation chain. Here is how it is transformed:

(1) mpicxx -> nvcc_wrapper: -Wp,-D_FORTIFY_SOURCE=2

(2) nvcc_wrapper -> nvcc: -Xcompiler -pipe,-Wall,-Wp,-D_FORTIFY_SOURCE=2,....

(3) nvcc -> g++: -pipe -Wall -Wp -D_FORTIFY_SOURCE=2 ....

So in stage 3, -Wp and -D_FORTIFY_SOURCE=2 are passed as separate flags, and -Wp is treated as missing its parameter, instead of the two being joined by a comma as in stage 1. I think this is Trilinos' fault, since comma-separated parameters are legal and the problem is the way the Trilinos build bundles the flags to pass to nvcc. One fix might be for nvcc_wrapper to remove the -Wp and pass -D_FORTIFY_SOURCE without it.

Two questions: first, what do you all think of this, and second, have any of you run into the same problem?

I could submit it as a separate issue, but it looks like things worked OK a year ago. Is the -D_FORTIFY_SOURCE=2 flag something new in MVAPICH2?
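The three stages described in this comment can be reproduced in a few lines of shell, which makes it easier to see where the comma is lost. This is only a simulation under the assumption that nvcc splits its -Xcompiler argument on commas, as the stage 2 to stage 3 transformation suggests. The function names are hypothetical and this is not the actual nvcc_wrapper code. strip_wp_prefix sketches the fix proposed above; it would not be safe for -Wp options that carry further commas.

```shell
# Stage 2 (simulated): bundle host-compiler flags into one comma-separated list.
join_with_commas() ( IFS=,; printf '%s\n' "$*" )

# Stage 3 (simulated): split the bundled list on every comma.
split_on_commas() { printf '%s\n' "$1" | tr ',' ' '; }

# Proposed fix (sketch): turn "-Wp,<opt>" into "<opt>" before bundling.
strip_wp_prefix() {
  for flag in "$@"; do
    case "$flag" in
      -Wp,*) printf '%s\n' "${flag#-Wp,}" ;;
      *)     printf '%s\n' "$flag" ;;
    esac
  done
}

bundled=$(join_with_commas -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2)
echo "bundled:   $bundled"
echo "split:     $(split_on_commas "$bundled")"   # -Wp is now a separate flag

# Word splitting of the command substitution is intentional here.
fixed=$(join_with_commas $(strip_wp_prefix -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2))
echo "with fix:  $(split_on_commas "$fixed")"
```

Because the bundling step uses the comma as its own separator, any flag that already contains a comma cannot survive the round trip unless it is rewritten or escaped first.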