Closed MalachiTimothyPhillips closed 7 months ago
Automatic mention of the @trilinos/muelu team
@trilinos/muelu
@MalachiTimothyPhillips Is this with a recent integration of Trilinos?
Correct -- this isn't currently preventing Sierra from building, but we are looking to do an integration soon. The last integration we did was November 9, 2013.
@MalachiTimothyPhillips Can you share a configure script?
These are the configure flags passed to cmake: spack-configure-args-cleaned.txt
Note that the full spack paths for 3rd party library/include dirs are abbreviated as <spack_path>. If it helps, I can whittle down the flags to try and get you a minimal reproducer.
@MalachiTimothyPhillips Could you try with these flags?
-D KokkosKernels_INST_MEMSPACE_CUDAUVMSPACE=ON
-D Tpetra_ALLOCATE_IN_SHARED_SPACE=ON
See #12622.
No luck. I've generated a smaller reproducer that is accessible here: https://github.com/MalachiTimothyPhillips/trilinos-issue-12846.
edit: After taking a closer look, it looks like the linker errors in the stub above (and my example) stem from trying to use int for the global ordinal type, e.g.:
/sierra/build/linux_rh7/nightly/nvidia_trilinos_develop_master/objs/tpls/spack/spack/__spack_path_placeholder__/__spack_path_pla/linux-rhel7-x86_64/gcc-8.3.0/trilinos-develop-b3pf3gpcojl3qf3u4qymdldxxpldgb6b/lib64/libmuelu.a(ETI_MueLu_AdaptiveSaMLParameterListInterpreter.cpp.o): In function `MueLu::HierarchyManager<double, int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaUVMSpace> >::CreateHierarchy(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const':
tmpxft_000388d7_00000000-6_ETI_MueLu_AdaptiveSaMLParameterListInterpreter.cudafe1.cpp:(.text._ZNK5MueLu16HierarchyManagerIdiiN6Tpetra12KokkosCompat23KokkosDeviceWrapperNodeIN6Kokkos4CudaENS4_12CudaUVMSpaceEEEE15CreateHierarchyERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE[_ZNK5MueLu16HierarchyManagerIdiiN6Tpetra12KokkosCompat23KokkosDeviceWrapperNodeIN6Kokkos4CudaENS4_12CudaUVMSpaceEEEE15CreateHierarchyERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE]+0x1e): undefined reference to `MueLu::Hierarchy<double, int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaUVMSpace> >::Hierarchy(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/sierra/build/linux_rh7/nightly/nvidia_trilinos_develop_master/objs/tpls/spack/spack/__spack_path_placeholder__/__spack_path_pla/linux-rhel7-x86_64/gcc-8.3.0/trilinos-develop-b3pf3gpcojl3qf3u4qymdldxxpldgb6b/lib64/libmuelu.a(ETI_MueLu_AdaptiveSaMLParameterListInterpreter.cpp.o): In function `MueLu::HierarchyManager<double, int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Cuda, Kokkos::CudaUVMSpace> >::CreateHierarchy() const':
The cmake configuration indicates that the only supported global ordinal type is long long:
-- MueLu: Enabling ETI support
-- <float, int, int> : OFF
-- <float, int, long long> : ON
-- <double, int, int> : OFF
-- <double, int, long> : OFF
-- <double, int, long long> : ON
-- <complex, int, int> : OFF
-- <complex, int, long long> : ON
I am not sure where/why the int type for the global ordinal is coming from, and I do not have a simple reproducer prepared for that yet.
I should also note we are not seeing these linker errors on GCC/Clang, for example.
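For readers less familiar with explicit template instantiation (ETI), the failure mode above can be sketched roughly as follows. This is a minimal illustration and not MueLu's actual code: ToyHierarchy and describe are hypothetical names, and the single-file layout is an assumption made so that the sketch compiles on its own.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for a library class template such as MueLu::Hierarchy.
// With ETI enabled, user code normally sees only this declaration.
template <class Scalar, class LocalOrdinal, class GlobalOrdinal>
struct ToyHierarchy {
  std::string describe() const;
};

// In a real ETI build this definition lives in a library source file and is
// hidden from user code. It is kept in the same file here only so that the
// sketch compiles as a single translation unit.
template <class Scalar, class LocalOrdinal, class GlobalOrdinal>
std::string ToyHierarchy<Scalar, LocalOrdinal, GlobalOrdinal>::describe() const {
  return std::is_same<GlobalOrdinal, long long>::value ? "GO=long long"
                                                       : "GO=other";
}

// Explicit instantiation definition: the one <Scalar, LO, GO> combination the
// toy "library" compiles, analogous to "<double, int, long long> : ON" in the
// configure output.
template struct ToyHierarchy<double, int, long long>;
```

If the member definition were hidden in a separate library object, as it is in an ETI build, code using ToyHierarchy<double, int, long long> would link, while code using ToyHierarchy<double, int, int> would fail with an undefined reference of the same kind as the errors quoted above, because no object file would contain that instantiation.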
Setting:
-DTpetra_INST_INT_INT:BOOL=OFF
-DTpetra_INST_INT_LONG:BOOL=ON
I see the following CMake configure output:
-- MueLu: Enabling ETI support
-- <float, int, int> : OFF
-- <float, int, long long> : OFF
-- <double, int, int> : OFF
-- <double, int, long> : ON
-- <double, int, long long> : OFF
-- <complex, int, int> : OFF
-- <complex, int, long long> : OFF
-- MueLu: Default GO: int
That has to be wrong, right? I think this is why I'm hitting the linker errors.
@MalachiTimothyPhillips I'm not sure if this is your issue, but the way the variables work is Tpetra_INST_LO_GO. So setting Tpetra_INST_INT_LONG results in LO=int, GO=long.
Right -- why is MueLu configured with Default GO: int in that case? e.g., https://github.com/trilinos/Trilinos/blob/88b2d6fa20695c34bdc44a9074a6ab42294446df/packages/muelu/CMakeLists.txt#L242
Oh, no I see that. :eyes:
Can you try to change from: https://github.com/trilinos/Trilinos/blob/88b2d6fa20695c34bdc44a9074a6ab42294446df/packages/muelu/CMakeLists.txt#L241-L242 to
IF(Tpetra_INST_DOUBLE AND Tpetra_INST_INT_LONG)
GLOBAL_SET(${PACKAGE_NAME}_HAVE_GO_LONG ON)
? Cause that looks fishy.
NVM, I think something is quite wrong in our CMake for long. Well, actually, it might just be weird.
Sure!
https://github.com/trilinos/Trilinos/blob/88b2d6fa20695c34bdc44a9074a6ab42294446df/packages/muelu/CMakeLists.txt#L283, too? Or something else?
Previous Trilinos versions (e.g., de68716d971f08d995548a6bdc37f205dbf2ba50, which is our current version in Sierra) had the same configuration. I'm wondering if there's some change in ETI handling for Kokkos::CudaUVMSpace specifically that is allowing us to still link on cpu-only builds and (previously) nvidia builds.
edit: I guess another complication is the fact that Epetra seems to require <double, int, int>, right?
Nope, only that spot. Yeah, I'm also not sure how this could have worked before.
edit: Yes, if you want Epetra+Tpetra you need int-int.
Good news -- that seems to have fixed the linker errors on our end! See: #12881.
Thank you for your help!
Marking as closed now that #12881 has been merged.
Bug Report
@MueLu @jhux2
Description
I am running into some linker issues in an nvidia Sierra spack build of Trilinos. It looks as if MueLu is missing explicit template instantiations for Kokkos::CudaUVMSpace (see linker output below). I will attempt to resolve the issue on my end, but I am not familiar with how MueLu handles the explicit template instantiations.