Closed balay closed 1 year ago
@tamiko FYI
For now disabling this requirement (of kokkos+wrapper for gcc) in kokkos/package.py - this build goes through fine for me with gcc - so I don't know why this code exists in spack..
For Kokkos to use Cuda, all of it (and all downstream compilation units that use Kokkos headers) needs to be compiled with a CUDA-able compiler, not just files with specific extensions or so. Thus, for using nvcc as CUDA compiler, Kokkos' nvcc_wrapper script must be used as host compiler. Clang can compile Cuda code natively so the wrapper is not necessary.
On the other hand, Kokkos has a mechanism in place to direct all compilation to nvcc_wrapper internally so that the host compiler specified in CMake can be arbitrary. This also holds for all downstream code that uses Kokkos via CMake and target_link_libraries.
For Kokkos to use Cuda all of it (and all downstream compilation units that use Kokkos headers) need to be compiled with a CUDA-able compiler, not just files with specific extensions or so.
Right now this is basically compiling all .c sources with nvcc (via mpicc) - breaking builds. Check slepc errors at https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/3193169610
>> 605 /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.6.0-tf6htqx3zi5j32km2bq6jdi44tzedbbb/include/cuda_bf16.hpp(373): error: calling a __device__ function("__float_as_uint") from a __host__ __device__ function("__internal_float2bfloat16") is not allowed
etc..
636 nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
637 <command-line>: warning: "__CUDA_ARCH_LIST__" redefined
638 <command-line>: note: this is the location of the previous definition
>> 639 /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cuda-11.6.0-tf6htqx3zi5j32km2bq6jdi44tzedbbb/include/cuda_bf16.hpp(373): error: calling a __device__ function("__float_as_uint") from a __host__ __device__ function("__internal_float2bfloat16") is not allowed
And PETSc/kokkos code doesn't need these wrappers [they break PETSc build similarly]
For now disabling this requirement (of kokkos+wrapper for gcc) in kokkos/package.py - this build goes through fine for me with gcc
Ah - I thought this worked for me [but I guess I must have some bug in my testing process]
https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/3193963276
kokkos@3.6.00~aggressive_vectorization~compiler_warnings+cuda~cuda_constexpr+cuda_lambda~cuda_ldg_intrinsic~cuda_relocatable_device_code~cuda_uvm~debug~debug_bounds_check~debug_dualview_modify_check~deprecated_code~examples~explicit_instantiation~hpx~hpx_async_dispatch~hwloc~ipo~memkind~numactl~openmp~openmptarget~pic+profiling~profiling_load_print~pthread~qthread~rocm+serial+shared~sycl~tests~tuning~wrapper build_type=RelWithDebInfo cuda_arch=70 intel_gpu_arch=none std=14
i.e kokkos~wrapper is used in this build. petsc/slepc build fine here. But sundials is failing.
265 -- Finding PETSC using PETSC_DIR
266 -- Recognized PETSC install with single library for all packages
267 -- PETSC could not be used, maybe the install is broken.
>> 268 CMake Error at /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cmake-3.22.2-rdvpr5odvqzoanneyoq5u4qqufsnfof4/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
269 PETSC could not be found. (missing: PETSC_EXECUTABLE_RUNS) (found version
270 "3.18.0")
271 Call Stack (most recent call first):
272 /nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/cmake-3.22.2-rdvpr5odvqzoanneyoq5u4qqufsnfof4/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
273 cmake/tpl/FindPETSC.cmake:738 (find_package_handle_standard_args)
274 cmake/tpl/SundialsPETSC.cmake:52 (find_package)
perhaps @balos1 can take a look
Right now this is basically compiling all .c sources with nvcc (via mpicc) - breaking builds. Check slepc errors at gitlab.com/xsdk-project/spack-xsdk/-/jobs/3193169610
Would you have the compile line causing that error?
Would you have the compile line causing that error?
Attaching logs with spack install -j1. I had previously noticed warnings after mpicc. Now they are after mpif90. [but the error is with .cu sources]. I'm totally confused now.
spack-build-env.txt spack-build-out.txt
using kokkos~wrapper - or resetting MPICH_CXX back to the native compiler - does get the build working though...
Hmm... it seems that Kokkos flags are not properly propagated. In particular, I don't see a flag for the GPU architecture for the failing compilation units.
I have a fix for sundials that gets past the error above reported by @balay. However, I now get a different error if I do not disable the trilinos variant of sundials when petsc+kokkos. There seems to be a clash between the internal trilinos kokkos and the standalone.
https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/3198712537
xsdk~cuda builds fine. This includes petsc+kokkos and trilinos+kokkos.
So the failure comes up when +cuda is used.
BTW: Does xsdk+cuda~trilinos build go through? [good to know - but won't help with the primary issue].
In the last release cycle we tried to add in superlu_dist+cuda - and that triggered issues with many packages. That issue is likely still pending...
BTW: Does xsdk+cuda~trilinos build go through? [good to know - but won't help with the primary issue].
Well the sundials build does go through without requiring additional fixes - but the dealii build fails
Ref: balay@xsdk:/data/balay/spack.x>nice ./bin/spack install -j24 xsdk@0.8.0+cuda~trilinos cuda_arch=70 ^cuda@11.6.0 ^openmpi |& tee spack-build.log
I have a fix for sundials that gets past the error above reported by @balay. However, I now get a different error if I do not disable the trilinos variant of sundials when petsc+kokkos. There seems to be a clash between the internal trilinos kokkos and the standalone.
I would not be surprised that using Trilinos and an external Kokkos at the same time is problematic. Trilinos can't use an external Kokkos yet and always bundles it. Thus, this case results in two competing Kokkos installations.
Linking to kokkos in addition to petsc when petsc+kokkos fixed the second problem and it now builds fine. @balay I think that means we can close this now, yes?
Hm - should we enable petsc+kokkos in xsdk and try again?
Sure.
(using current kokkos mode of forcing kokkos+wrappers) the build now breaks with MFEM and DEALII, so can't really enable petsc+kokkos
dealii-spack-build-out.txt mfem-spack-build-out.txt
cc: @bangerth @v-dobrev
Yeah, best keep it unspecified for now. Likely, mfem and dealii would have to do the same thing we did in sundials: link to kokkos/kokkos-kernels directly.
Also tried kokkos~wrappers, dealii builds now. mfem still breaks.
From the above log, it looks like petsc.so has unresolved symbols, e.g.:
/data/balay/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/petsc-3.18.1-eocg6m7e4s4p25pa5oykupapl4c2skiy/lib/libpetsc.so: undefined reference to `KokkosBlas::Impl::Nrm2<Kokkos::View<double, Kokkos::LayoutLeft, Kokkos::HostSpace, Kokkos::MemoryTraits<1u> >, Kokkos::View<double const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<1u> >, 1, false, true>::nrm2(Kokkos::View<double, Kokkos::LayoutLeft, Kokkos::HostSpace, Kokkos::MemoryTraits<1u> > const&, Kokkos::View<double const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<1u> > const&, bool const&)'
Why is this not resolved in petsc.so? Is it in a static library?
In the other log file (mfem-spack-build-out.txt) there's a big mess with errors like this:
/nfs/apps/spacks/2022-02-10/opt/spack/linux-centos7-x86_64/gcc-7.3.0/gcc-9.2.0-llib7puyqxdfte5dd2mw33v7d6mjarrw/lib/gcc/x86_64-pc-linux-gnu/9.2.0/include/stddef.h(426): error: invalid redeclaration of type name "max_align_t"
(426): here
which are impossible to understand without the sequence of #include directives that lead to this error.
@balos1, what did you need to do in SUNDIALS to fix these kinds of errors? Of course, only if you had similar errors.
Why is this not resolved in petsc.so? Is it in a static library?
balay@xsdk:/data/balay/spack>ldd /data/balay/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/petsc-3.18.1-eocg6m7e4s4p25pa5oykupapl4c2skiy/lib/libpetsc.so |grep kokkos
libkokkoskernels.so => /data/balay/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/kokkos-kernels-3.7.00-4rxlnggjo5imxzvrbjqzr2xlvml667bz/lib64/libkokkoskernels.so (0x00007fc81d9d5000)
libkokkoscontainers.so.3.7 => /data/balay/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/kokkos-3.7.00-dvfdqghzez7c4iievdvirjk4fjqiid2h/lib64/libkokkoscontainers.so.3.7 (0x00007fc81d7c0000)
libkokkoscore.so.3.7 => /data/balay/spack/opt/spack/linux-centos7-cascadelake/gcc-9.2.0/kokkos-3.7.00-dvfdqghzez7c4iievdvirjk4fjqiid2h/lib64/libkokkoscore.so.3.7 (0x00007fc81d413000)
Yet the linker complains.
@balos1 added the following fix to sundials for this issue
https://gitlab.com/xsdk-project/spack-xsdk/-/commit/c524ebbe7a8cdff480a3aa72d4b8e89e730c00a1
In the other log file (mfem-spack-build-out.txt) there's a big mess with errors like this:
With kokkos+wrapper I get a similar mess (that I don't understand) with petsc and slepc. Here is the fix I use for slepc (similar for petsc) - basically undo what kokkos+wrapper does:
Are there any Kokkos symbols in public petsc headers? If yes, then I suppose linking to petsc.so won't resolve those. @v-dobrev I did not go down the rabbit hole to figure out where in petsc the errors were coming from yet (although I did not get the one about max_align_t).
petsc exposes kokkos includes to users via petsc public includes - I think it's needed for definitions of basic datatypes from kokkos that get used with some petsc (public/api) functions.
cc: @jczhang07
Yes, petsc has some public headers like petscvec_kokkos.hpp. When kokkos is enabled, they provide functions like getting a Kokkos::View from a petsc vector.
In the current petsc makefile system, Kokkos files are supposed to have the suffix *.kokkos.cxx. PETSc will compile them with a so-called Kokkos compiler. .c and .cxx files are compiled with regular C or C++ compilers.
Hm, petscvec_kokkos.hpp is probably not getting included from sundials/mfem - just the basic includes (petscvec.h, petscsnes.h) - With this usage - kokkos includes shouldn't get exposed to user? [but linker complains...]
With this usage - kokkos includes shouldn't get exposed to user?
No, they should not.
but linker complains...
What do you mean? If petsc is configured with Kokkos, of course users should link the petsc library with kokkos libraries
What do you mean? If petsc is configured with Kokkos, of course users should link the petsc library with kokkos libraries
Normally when -lpetsc is created by linking in external libraries - only '-lpetsc' is needed at application link time. But with kokkos [only when cuda is enabled?] - we are getting kokkos link errors.
Hm - I'm unable to reproduce this issue with a stand-alone build of petsc+kokkos+cuda, with a petsc example.
balay@petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$ make ex19.o
mpicc -o ex19.o -c -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 -I/scratch/balay/petsc/include -I/scratch/balay/petsc/arch-kk-cuda/include -I/usr/local/cuda/include `pwd`/ex19.c
balay@petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$ mpicc -o ex19 ex19.o -Wl,-rpath,/scratch/balay/petsc/arch-kk-cuda/lib -L/scratch/balay/petsc/arch-kk-cuda/lib -lpetsc -lm
balay@petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$ /usr/local/cuda/bin/nvcc -o ex19 ex19.o -ccbin mpic++ -Xlinker=-rpath,/scratch/balay/petsc/arch-kk-cuda/lib -L/scratch/balay/petsc/arch-kk-cuda/lib -lpetsc -lm
balay@petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$
And I'm unable to reproduce this with the spack build of PETSc (using petsc example).
balay@petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$ make PETSC_DIR=/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/petsc-3.18.1-ytnw5gw575kfb4zzl7ixmz5nbzm47ak6 ex19.o
/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/mpich-4.0.2-dtswwqtovrk5ogkporfb47wifyizzt74/bin/mpicc -o ex19.o -c -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g -O -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/petsc-3.18.1-ytnw5gw575kfb4zzl7ixmz5nbzm47ak6/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/hypre-2.26.0-ureawcg2si5afftqcxqkj7jgly37gwha/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/superlu-dist-8.1.2-3u2effyw5qflel5ducqvrh43sfqq5ivl/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/kokkos-kernels-3.7.00-4ht6yomef4pf7t3sivljcnblfqjvufmb/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/kokkos-3.7.00-iwdfnaxuphlns375qhkkwcmwrk6nst55/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/hdf5-1.12.2-77mvmjjkujmq6tpl4ec2mrzrb77ue7sn/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/parmetis-4.0.3-inj6jvej57u72pypma5a5zmd4usy4n4t/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/metis-5.1.0-64osc7x3dyrov4wejoayqrktqkdavwdt/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/zlib-1.2.13-a46gganu6rrg7kcrvfle4eext3lu4wt7/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/cuda-11.8.0-6bqbc3g2cfdxvhvi6pxoedbytj5yz2md/include `pwd`/ex19.c
balay@petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$ /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/mpich-4.0.2-dtswwqtovrk5ogkporfb47wifyizzt74/bin/mpicc -o ex19 ex19.o -Wl,-rpath,/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/petsc-3.18.1-ytnw5gw575kfb4zzl7ixmz5nbzm47ak6/lib -L/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/petsc-3.18.1-ytnw5gw575kfb4zzl7ixmz5nbzm47ak6/lib -lpetsc -lm
balay@petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$ /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/cuda-11.8.0-6bqbc3g2cfdxvhvi6pxoedbytj5yz2md/bin/nvcc ex19.c -o ex19 -O3 -std=c++14 -x=cu --expt-extended-lambda -arch=sm_80 -ccbin /scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/mpich-4.0.2-dtswwqtovrk5ogkporfb47wifyizzt74/bin/mpic++ -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/petsc-3.18.1-ytnw5gw575kfb4zzl7ixmz5nbzm47ak6/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/hypre-2.26.0-ureawcg2si5afftqcxqkj7jgly37gwha/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/superlu-dist-8.1.2-3u2effyw5qflel5ducqvrh43sfqq5ivl/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/kokkos-kernels-3.7.00-4ht6yomef4pf7t3sivljcnblfqjvufmb/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/kokkos-3.7.00-iwdfnaxuphlns375qhkkwcmwrk6nst55/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/hdf5-1.12.2-77mvmjjkujmq6tpl4ec2mrzrb77ue7sn/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/parmetis-4.0.3-inj6jvej57u72pypma5a5zmd4usy4n4t/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/metis-5.1.0-64osc7x3dyrov4wejoayqrktqkdavwdt/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/zlib-1.2.13-a46gganu6rrg7kcrvfle4eext3lu4wt7/include -I/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/cuda-11.8.0-6bqbc3g2cfdxvhvi6pxoedbytj5yz2md/include -Xlinker=-rpath,/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/petsc-3.18.1-ytnw5gw575kfb4zzl7ixmz5nbzm47ak6/lib -L/scratch/balay/spack/opt/spack/linux-ubuntu22.04-zen3/gcc-11.3.0/petsc-3.18.1-ytnw5gw575kfb4zzl7ixmz5nbzm47ak6/lib -lpetsc -lm
balay@petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$
With this same build - mfem fails (inside spack build)
Does the kokkos library contain the undefined reference in libpetsc.so?
Does the kokkos library contain the undefined reference in libpetsc.so?
It should - the string is way too long to do nm to verify :(
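One way to sidestep the very long mangled name is to search the demangled output of nm for a short prefix such as KokkosBlas::Impl::Nrm2. The following is only a sketch under stated assumptions: it assumes GNU nm is on PATH, the helper names are made up for illustration, and it was not run against the libraries discussed in this thread.

```python
import re
import subprocess
import sys

# A defined symbol prints as "<hex address> <type letter> <name>";
# undefined ones have no address, so they never match this pattern.
_NM_LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\S)\s+(.*)$")


def defined_symbols_matching(nm_output, needle):
    """Return the names of defined symbols in nm output that contain needle."""
    matches = []
    for line in nm_output.splitlines():
        m = _NM_LINE.match(line)
        if m and needle in m.group(3):
            matches.append(m.group(3))
    return matches


def search_library(path, needle):
    """Run GNU nm on a shared library and filter its defined dynamic symbols."""
    out = subprocess.run(
        # -D: dynamic symbols, -C: demangle C++ names
        ["nm", "-D", "-C", "--defined-only", path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return defined_symbols_matching(out, needle)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python find_symbol.py <library.so> <substring>
    for name in search_library(sys.argv[1], sys.argv[2]):
        print(name)
```

Running it once against libkokkoskernels.so and once against libpetsc.so with the same substring would show which library, if any, actually defines the instantiation the linker is asking for (the script name find_symbol.py is hypothetical).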
BTW: Noticed mfem was built as static [default] - tried switching to mfem+shared and got the same error.
And regular [non-cuda/kokkos] builds appear to fail - filed this at: https://github.com/xsdk-project/xsdk-issues/issues/199
The unresolved symbols seem to be from the namespaces KokkosSparse::Impl and KokkosBlas::Impl. Which Kokkos library file(s) contain these symbols? I can try to add these manually in addition to -lpetsc.
The unresolved symbols seem to be from the namespaces KokkosSparse::Impl and KokkosBlas::Impl. Which Kokkos library file(s) contain these symbols?
They should be in the kokkos kernel libraries. Might need to query spack to get the correct library names [perhaps with the dependent kokkos library as well].
balay@petsc-gpu-01:/scratch/balay/petsc/arch-kk-cuda/lib$ ldd libpetsc.so |grep kokkos
libkokkoskernels.so (0x00007f44a1ab5000)
libkokkoscore.so.3.7 (0x00007f44a1810000)
libkokkoscontainers.so.3.7 (0x00007f44a0ebe000)
@balay, I'm trying to reproduce the issue on Lassen using https://github.com/spack/spack/pull/33603 with
./bin/spack install -j 128 --fresh mfem+cuda+petsc cuda_arch=70 ^petsc+cuda+kokkos
which leads to a kokkos~wrapper spec.
However, I get this error from the kokkos package:
==> Error: InstallError: Kokkos requires +wrapper when using +cudawithout clang
...
264 if spec.satisfies("~wrapper+cuda") and not (
265 spec.satisfies("%clang") or spec.satisfies("%cce")
266 ):
>> 267 raise InstallError("Kokkos requires +wrapper when using +cuda" "without clang")
268
269 options = [
270 from_variant("CMAKE_POSITION_INDEPENDENT_CODE", "pic"),
My setup is using gcc@8.3.1 and external cuda@11.5.0. In the log you posted above (mfem-spack-build-out2.txt), you also seem to be using gcc -- how come your kokkos build worked?
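To make the quoted check easier to reason about, here is a small stand-alone mock of it. It is a sketch only: the InstallError class and the dict-based variants are stand-ins for Spack's own objects, and only the condition and the message string are taken from the listing above. It also shows why the message reads "+cudawithout clang": Python joins adjacent string literals with no separator.

```python
class InstallError(Exception):
    """Stand-in for Spack's InstallError so the sketch runs without Spack."""


def check_wrapper_requirement(variants, compiler):
    """Mimic the quoted check: +cuda~wrapper is rejected unless the
    compiler is clang or cce."""
    needs_wrapper = variants.get("cuda") and not variants.get("wrapper")
    if needs_wrapper and compiler not in ("clang", "cce"):
        # Two adjacent literals are concatenated as-is, hence the missing space.
        raise InstallError("Kokkos requires +wrapper when using +cuda" "without clang")


def outcome(variants, compiler):
    """Return 'ok' or the error message for a variant/compiler combination."""
    try:
        check_wrapper_requirement(variants, compiler)
    except InstallError as err:
        return str(err)
    return "ok"


print(outcome({"cuda": True, "wrapper": False}, "gcc"))    # rejected
print(outcome({"cuda": True, "wrapper": False}, "clang"))  # ok
print(outcome({"cuda": True, "wrapper": True}, "gcc"))     # ok
```

With gcc and kokkos+cuda~wrapper the check always raises, so a gcc build with ~wrapper can only get past it if the check itself is bypassed.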
I think I figured out the main issue with the kokkos+wrapper dependency: it makes the MPI compiler wrapper call nvcc_wrapper, a script around nvcc, removing, changing, and adding arguments and ultimately calling nvcc.
The issue with that in the MFEM package (and probably many other packages) is that it compiles CUDA + MPI by calling nvcc with -ccbin set to the MPI wrapper, so we end up with a chain of calls like this: nvcc -> mpicxx -> nvcc_wrapper (adds arguments like -arch=sm_35) -> nvcc -> g++. This is clearly not what we want.
Since many packages compile CUDA + MPI the way MFEM does (by calling nvcc with -ccbin set to the MPI wrapper), to me the behavior of the kokkos-nvcc-wrapper package (which overwrites the compiler for the MPI wrapper to be nvcc_wrapper) seems unacceptable because every such package now has to undo at least some of the changes that kokkos-nvcc-wrapper does to the environment -- that is what Satish has already done for PETSc and SLEPc.
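The call chain described above can be reproduced with a toy model that just follows which tool each tool hands the compilation to. This is purely illustrative: the dictionaries below are hypothetical labels for the delegation described in this comment, not something read from a real environment.

```python
def call_chain(start, delegates):
    """Follow tool -> next tool links until a tool delegates to nothing."""
    chain = [start]
    seen = {start}
    while chain[-1] in delegates:
        nxt = delegates[chain[-1]]
        if nxt in seen:
            raise ValueError(f"delegation cycle at {nxt!r}")
        chain.append(nxt)
        seen.add(nxt)
    return chain


# With kokkos+wrapper: the MPI wrapper's C++ compiler is redirected to
# nvcc_wrapper, which in turn calls nvcc with g++ as host compiler.
with_wrapper = {
    "nvcc -ccbin mpicxx": "mpicxx",
    "mpicxx": "nvcc_wrapper",
    "nvcc_wrapper": "nvcc",
    "nvcc": "g++",
}

# With kokkos~wrapper (or after reverting the override): mpicxx calls g++.
without_wrapper = {
    "nvcc -ccbin mpicxx": "mpicxx",
    "mpicxx": "g++",
}

print(" -> ".join(call_chain("nvcc -ccbin mpicxx", with_wrapper)))
print(" -> ".join(call_chain("nvcc -ccbin mpicxx", without_wrapper)))
```

The first line shows nvcc being invoked twice for one source file, which is where the duplicated or conflicting -arch flags come from; the second is the chain packages like MFEM expect.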
how come your kokkos build worked?
I comment out those 3 offending lines when testing the kokkos~wrapper use case.
The issue with that in the MFEM package (and probably many other packages) is that it compiles CUDA + MPI by calling nvcc with -ccbin set to the MPI wrapper, so we end up with a chain of calls like this: nvcc -> mpicxx -> nvcc_wrapper (adds arguments like -arch=sm_35) -> nvcc -> g++. This is clearly not what we want.
Yes - I think it doesn't belong in kokkos/package.py [i.e. it should not force all dependent pkgs to use this modified mpicxx - that breaks compiles. Only pkgs that need it should do this switch]. However @masterleinad disagrees...
Since many packages compile CUDA + MPI the way MFEM does (by calling nvcc with -ccbin set to the MPI wrapper), to me the behavior of the kokkos-nvcc-wrapper package (which overwrites the compiler for the MPI wrapper to be nvcc_wrapper) seems unacceptable because every such package now has to undo at least some of the changes that kokkos-nvcc-wrapper does to the environment -- that is what Satish has already done for PETSc and SLEPc.
yes - I undo this mpicxx switch in petsc/slepc for kokkos+wrapper - and that gets these builds working.
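As a rough illustration of what undoing the switch amounts to, the sketch below resets the MPI wrapper's C++ compiler variables when they point at nvcc_wrapper. It is not the code from the petsc or slepc packages. Assumptions: the environment is a plain dict rather than Spack's environment object, the function name is made up, and OMPI_CXX (the Open MPI counterpart) is included alongside MPICH_CXX even though only MPICH_CXX is mentioned in this thread.

```python
def revert_mpi_cxx_override(env, native_cxx, wrapper_name="nvcc_wrapper"):
    """Return a copy of env where MPI C++ overrides that point at the
    wrapper script are reset to the native C++ compiler."""
    fixed = dict(env)
    for var in ("MPICH_CXX", "OMPI_CXX"):
        # Only touch variables that were actually redirected to the wrapper.
        if fixed.get(var, "").endswith(wrapper_name):
            fixed[var] = native_cxx
    return fixed


# Hypothetical paths, for illustration only.
build_env = {
    "MPICH_CXX": "/opt/kokkos-nvcc-wrapper/bin/nvcc_wrapper",
    "PATH": "/usr/bin",
}
print(revert_mpi_cxx_override(build_env, "/usr/bin/g++")["MPICH_CXX"])
```

Returning a modified copy keeps the original mapping intact, so the same helper could be applied per package without affecting dependents that do want nvcc_wrapper.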
Yes - I think it doesn't belong in kokkos/package.py [i.e. it should not force all dependent pkgs to use this modified mpicxx - that breaks compiles. Only pkgs that need it should do this switch]. However @masterleinad disagrees...
I don't necessarily disagree. I'm just pointing out that using Kokkos+CUDA with nvcc requires special care in the choice of the compiler and I'm honestly surprised that you make it work without using nvcc_wrapper.
I think it's a good idea to just open a pull request on the spack side and discuss what to do there.
Okay, I created an issue in Spack: https://github.com/spack/spack/issues/33684.
Until https://github.com/spack/spack/issues/33684 is resolved, I pushed a temporary (?) workaround for the kokkos+wrapper issue to MFEM in https://github.com/spack/spack/pull/33603.
With that, I think we still have the issue with dealii failing with kokkos+wrapper. Do we want to try to fix that (by reverting the environment change from kokkos-nvcc-wrapper)?
Alternatively, we can try to modify the kokkos package to allow kokkos~wrapper with g++ and other non-clang compilers and then I can try to resolve the issue with MFEM in this case.
Any thoughts?
Regarding MFEM with kokkos~wrapper: after commenting out these lines in the Kokkos package:
    if spec.satisfies("~wrapper+cuda") and not (
        spec.satisfies("%clang") or spec.satisfies("%cce")
    ):
        raise InstallError("Kokkos requires +wrapper when using +cuda" "without clang")
I had no issue building the following on Lassen:
./bin/spack install -j 128 --fresh mfem+cuda+petsc cuda_arch=70 ^petsc+cuda+kokkos+mumps ^kokkos~wrapper
For ex: - check https://gitlab.com/xsdk-project/spack-xsdk/-/jobs/3193169610
For now disabling this requirement (of kokkos+wrapper for gcc) in kokkos/package.py - this build goes through fine for me with gcc - so I don't know why this code exists in spack..
And I'm not sure if we can upstream this change
https://gitlab.com/xsdk-project/spack-xsdk/-/commit/b19632e72db0c33bcb83d95323f500a94b428e49
Note: PETSc already has a workaround for kokkos+wrapper, I've added this fix for slepc, likely similar fixes might be needed for other pkgs.. (for kokkos+wrapper to work) https://gitlab.com/xsdk-project/spack-xsdk/-/commit/9afe4953d25ceda7ef8a16e7198b820639bc246b