Open sekoenig opened 6 months ago
Actually, I take back what I wrote about shared libraries: they are being generated even with ETI turned off, it is just that no shared libraries without version suffix are being installed for some reason. After manually creating those symlinks and linking again to all the libraries, the list of undefined references is greatly reduced, only one issue remains (in multiple similar instances):
build/Test.o:Test.cpp:function Tpetra::MultiVector<double, int, long long,
Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace>
>::descriptionImpl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const:
(.text._ZNK6Tpetra11MultiVectorIdixNS_12KokkosCompat23KokkosDeviceWrapperNodeIN6Kokkos6OpenMP
ENS3_9HostSpaceEEEE15descriptionImplERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x184):
error: undefined reference to 'Tpetra::KokkosCompat
::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace>::name[abi:cxx11]()'
Any hints what I might be missing?
After digging a bit, I found that the issue arises from some explicit static definitions within Tpetra. The following patch fixes the problem for me:
diff --git a/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp b/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp
index 1a725fd6b59..b389be5ee36 100644
--- a/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp
+++ b/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp
@@ -1,6 +1,8 @@
#include <Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.hpp>
#include "Teuchos_ParameterList.hpp"
+#ifndef HAVE_TPETRA_EXPLICIT_INSTANTIATION
+
namespace Tpetra {
namespace KokkosCompat {
@@ -54,5 +56,5 @@ namespace KokkosCompat {
} // namespace KokkosCompat
} // namespace Tpetra
-
+#endif
diff --git a/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.hpp b/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.hpp
index ee06186d44e..70510b09f12 100644
--- a/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.hpp
+++ b/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.hpp
@@ -47,7 +47,11 @@ public:
KokkosDeviceWrapperNode () = delete;
//! Human-readable name of this Node.
+#ifndef HAVE_TPETRA_EXPLICIT_INSTANTIATION
+ static std::string name () { return "N/A without ETI"; }
+#else
static std::string name ();
+#endif
};
#ifdef KOKKOS_ENABLE_SYCL
Would it be possible to have this (or some better solution, e.g. using typeid
together with boost::core::demangle
) implemented upstream? I would be happy to create a proper PR if there is interest in merging such a fix.
@trilinos/tpetra
@sekoenig How are you linking your project against Tpetra? I built with ETI off but I do see the OpenMP wrapper node name()
symbol in the library:
[bmkelle@s1042556 compat]$ nm --demangle libtpetraclassic.so.15.2.0 | grep "KokkosDeviceWrapperNode"
0000000000001220 T Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace>::name[abi:cxx11]()
0000000000001260 T Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>::name[abi:cxx11]()
My exact linking setup is admittedly a bit intricate: since a (small) part of my project is written in Haskell, I funnel linking through ghc so that the Haskell runtime library is properly added. I also use MPI (OpenMPI on Debian testing on my development machine), and therefore I add the output of mpicc -showme:link
to the linker command line. Then I add -ltpetra ...
with the ellipses standing for a number of other Trilinos libraries.
Initially I was using shared libraries, as part of tracking this problem I switched to static libs, it didn't change anything.
From your output I suspected that perhaps I need libtpetraclassic, but checking all libtpetra... files now with nm
, I don't see the required symbols anywhere.
I was testing so far with Trilinos 14.4, I suppose I should try the latest version to see if the issue is still there.
Does libtpetraclassic contain any symbols? I looked briefly at the files containing name()
and the CMakeLists and it seems that they shouldn't be affected by Trilinos_ENABLE_EXPLICIT_INSTANTIATION=OFF
.
CC: @rppawlo, @sebrowne
@sekoenig, I believe that Trilinos is no longer supporting Trilinos_ENABLE_EXPLICIT_INSTANTIATION=OFF
? Testing for Trilinos_ENABLE_EXPLICIT_INSTANTIATION=OFF
was disabled a while ago. I think it was just overlooked to error our when users try to set that option.
@brian-kelley There are some symbols in libtpetraclassic.a, but not many:
$ nm --demangle libtpetraclassic.a
Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp.o:
U __cxa_atexit
U __dso_handle
U _GLOBAL_OFFSET_TABLE_
0000000000000000 t _GLOBAL__sub_I_Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp
U operator delete(void*, unsigned long)
0000000000000000 b (anonymous namespace)::local_activeRCPNodesSetup
0000000000000020 b Kokkos::Tools::Experimental::Impl::team_tuners
U Teuchos::ActiveRCPNodesSetup::ActiveRCPNodesSetup()
U Teuchos::ActiveRCPNodesSetup::~ActiveRCPNodesSetup()
0000000000000000 W std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::Tools::Experimental::TeamSizeTuner, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> > >::~map()
0000000000000000 W std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::Tools::Experimental::TeamSizeTuner, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> > >::~map()
0000000000000000 n std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::Tools::Experimental::TeamSizeTuner, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> > >::~map()
0000000000000000 t std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> >*) [clone .isra.0]
U std::ios_base_library_init()
I tested with Trilinos 15.1.0 on another machine with slighlty different configuration (Debian stable, MPICH), and found the same issue there. I will test with the latest github version also.
@bartlettroscoe Do tests being disabled imply there is no support for that anymore? I did not find a note anywhere that this was being deprecated. For what I am doing (successfully, as of today), adding a custom Kokkos memory space, unfortunately it seems I need to disable ETI.
@brian-kelley Following up, I still see the issue with yesterday's master branch from github (HEAD is at bf2943a7cb297), and my fix/workaround also still works.
@sekoenig Could you share the way you're configuring Trilinos? The only thing I can think of is that these name()
function definitions are guarded by the corresponding KOKKOS_ENABLE_<BACKEND>
macros, so if KOKKOS_ENABLE_OpenMP
is not set to ON then that would explain your undefined reference error.
@brian-kelley Of course, here is how I call cmake:
cmake -D TPL_ENABLE_MPI=ON -DMPI_BASE_DIR=/usr \
-DCMAKE_CXX_STANDARD=17 \
-DBLAS_LIBRARY_NAMES="openblas" \
-DLAPACK_LIBRARY_NAMES="openblas" \
-DTrilinos_ENABLE_EXPLICIT_INSTANTIATION=OFF \
-DTrilinos_ENABLE_OpenMP=ON \
-DTrilinos_ENABLE_Belos=ON \
-DTrilinos_ENABLE_COMPLEX=ON \
-DTeuchos_ENABLE_THREAD_SAFE=ON \
-DTrilinos_ENABLE_Kokkos=ON \
-DTrilinos_ENABLE_KokkosKernels=ON \
-DCMAKE_INSTALL_PREFIX=/usr/local/ \
-DBUILD_SHARED_LIBS=OFF \
.. | tee cmake.log
I believe KOKKOS_ENABLE_OpenMP
should be implied by Trilinos_ENABLE_OpenMP=ON
, I have so far not tried setting it explicitly. The Trilinos_ENABLE_KokkosKernels=ON
I added only when I tried tracking the problem, I think it's not actually needed for my use case (and neither it should be causing the issue).
@sekoenig Is Tpetra even enabled in this build? (A list of enabled packages is printed during configure)
Tpetra is not a required dependency of Belos so having Belos won't automatically enable it. If that's the case, just add -DTrilinos_ENABLE_Tpetra=ON
. It wouldn't hurt to add -DKokkos_ENABLE_OPENMP=ON
either, though I think it shoould be set automatically by Trilinos_ENABLE_OpenMP.
It's possible that the libtpetraclassic you're linking is from a previous build with a different configuration.
@brian-kelley Very good point, but it is being enabled (admittedly I don't know how exactly, in my previous build with ETI, I was setting Tpetra_INST_SERIAL=ON
and Tpetra_INST_DOUBLE=ON
, I removed those for the non-ETI build).
Here's some relevant CMake output:
Processing enabled top-level package: Tpetra (TSQR, Core)
-- Tpetra: Using internal Kokkos
-- Tpetra: Enabling deprecated code
[...]
-- Tpetra execution space availability (ON means available):
-- - Serial: OFF
-- - Threads: OFF
-- - OpenMP: ON
-- - Cuda: OFF
-- - HIP: OFF
-- - SYCL: OFF
-- Tpetra: Tpetra_INST_INT_LONG_LONG is enabled by default.
-- Tpetra: Tpetra_INST_INT_UNSIGNED is disabled by default.
-- Tpetra: Tpetra_INST_INT_UNSIGNED_LONG is disabled by default.
-- Tpetra: Tpetra_INST_INT_INT is disabled by default.
-- Tpetra: Tpetra_INST_INT_LONG is disabled by default.
--
-- Tpetra: Validate global ordinal setting ...
-- Tpetra: global ordinal setting is OK
--
-- Setting default Node to Tpetra::KokkosCompat::KokkosOpenMPWrapperNode.
Processing enabled top-level package: Thyra (Core, TpetraAdapters)
-- Thyra ModelEvaluator Hessian interfaces On
Processing enabled top-level package: Xpetra (Libs)
-- Xpetra: Enabling deprecated code
-- NOTE: Setting Xpetra_ENABLE_EpetraExt=OFF since Xpetra_ENABLE_Epetra='OFF'
-- Xpetra: Disabling ETI
[...]
Processing ETI support: TpetraCore
-- TpetraCore: Processing ETI / test support
-- Enabled Scalar types: long long|double|std::complex<double>
-- Enabled LocalOrdinal types: int
-- Enabled GlobalOrdinal types: long long
-- Enabled Node types: Tpetra::KokkosCompat::KokkosOpenMPWrapperNode
-- Set of enabled types, before exclusions: S={long long} N={Tpetra::KokkosCompat::KokkosOpenMPWrapperNode} LO={int} GO={long long};S={double} N={Tpetra::KokkosCompat::KokkosOpenMPWrapperNode} LO={int} GO={long long};S={std::complex<double>} N={Tpetra::KokkosCompat::KokkosOpenMPWrapperNode} LO={int} GO={long long}
I was previously confused by those lines like Tpetra_INST_INT_LONG_LONG is enabled by default
and Processing ETI support: TpetraCore
etc., but after looking into the relevant CMakeLists.txt my understanding (it might be wrong) is that those lines merely mean that the pertinent TPETRA_HAVE_...
macros get defined, not necessarily that ETI is being used.
@brian-kelley I forgot to add before: according to the file dates, the libtpetraclassic.a
I see is from the most recent build.
@sekoenig Your understanding of the Tpetra_INST_...
variables is correct.
I tried your exact config with master https://github.com/trilinos/Trilinos/commit/bf2943a7cb297c65e9e93b1adf9152e70335c671 but I still see the name() symbols:
nm --demangle libtpetraclassic.a
Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp.o:
U __cxa_atexit
U __dso_handle
0000000000000000 t _GLOBAL__sub_I__ZN6Tpetra12KokkosCompat23KokkosDeviceWrapperNodeIN6Kokkos6OpenMPENS2_9HostSpaceEE4nameB5cxx11Ev
U operator delete(void*)
0000000000000000 b (anonymous namespace)::local_activeRCPNodesSetup
0000000000000020 b Kokkos::Tools::Experimental::Impl::team_tuners
0000000000000000 T Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::OpenMP, Kokkos::HostSpace>::name[abi:cxx11]()
0000000000000040 T Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace>::name[abi:cxx11]()
U Teuchos::ActiveRCPNodesSetup::ActiveRCPNodesSetup()
U Teuchos::ActiveRCPNodesSetup::~ActiveRCPNodesSetup()
0000000000000000 W std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::Tools::Experimental::TeamSizeTuner, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> > >::~map()
0000000000000000 W std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::Tools::Experimental::TeamSizeTuner, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> > >::~map()
0000000000000000 n std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::Tools::Experimental::TeamSizeTuner, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> > >::~map()
U std::ios_base::Init::Init()
U std::ios_base::Init::~Init()
0000000000000000 W std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> > >::_M_erase(std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Kokkos::Tools::Experimental::TeamSizeTuner> >*)
0000000000000001 b std::__ioinit
Is there any possibility that the compiler is finding Kokkos headers from a different installation on your system? The only thing I can think of at this point is that KokkosCore_config.h (the file that defines macros like KOKKOS_ENABLE_OPENMP
) is being found from a different installation where OpenMP was not enabled, causing this function to not be compiled.
Adding this patch to Trilinos and rebuilding will trigger an error if KOKKOS_ENABLE_OPENMP
is not seen as defined, and a benign warning if it is:
diff --git a/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp b/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp
index da6968ddb29..785429798df 100644
--- a/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp
+++ b/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp
@@ -1,6 +1,12 @@
#include <Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.hpp>
#include "Teuchos_ParameterList.hpp"
+#ifdef KOKKOS_ENABLE_OPENMP
+#warning "KOKKOS_ENABLE_OPENMP is defined (good!)"
+#else
+#warning "KOKKOS_ENABLE_OPENMP is NOT defined (bad! Kokkos headers maybe coming from wrong place!)"
+#endif
+
namespace Tpetra {
namespace KokkosCompat {
@brian-kelley Thanks for the patch! I applied it to my HEAD and get the following in the CMake output:
[ 84%] Building CXX object packages/tpetra/core/compat/CMakeFiles/tpetraclassic.dir/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp.o
/usr/local/src/Trilinos-upstream/packages/tpetra/core/compat/Tpetra_KokkosCompat_ClassicNodeAPI_Wrapper.cpp:5:2: warning: #warning before C++23 is a GCC extension
5 | #warning "KOKKOS_ENABLE_OPENMP is defined (good!)"
| ^~~~~~~
So that looks good, I think. I ran make clean
and I am rebuilding everything now to look at the results libtpertaclassic.a again.
In case it matters:
g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 13.2.0-24' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-libstdcxx-backtrace --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/reproducible-path/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/reproducible-path/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.0 (Debian 13.2.0-24)
The output from mpicxx -v
, which I use to build my project, matches the exact GCC version. On the other system I tested, which also had the issue, GCC is at version 12.2 (Debian 12.2.0-14).
By the way, coming back to my original reason for turning off ETI, another question: if ETI is enabled, should it still work to have a project use template classes that were not explicitly instantiated at Trilinos build time? I tried that first, naturally, but got so many undefined references that I suspected I have to try without ETI.
if ETI is enabled, should it still work to have a project use template classes that were not explicitly instantiated at Trilinos build time?
It depends on the package - KokkosKernels always lets you call functions with types that were not enabled in ETI (as long as the kernel supports the type combinations - for many things you can't just plug in any custom user-defined type).
On the other hand, Tpetra and packages downstream of it generally don't allow it. They organize their internal headers differently (decl and def files) and the user-facing headers like Tpetra_CrsMatrix.hpp will only include the decl in an ETI build. This means the actual function definitions won't be available.
I see, thanks for that background info! Then indeed I will need to keep working without ETI (which I actually didn't find as terrible in terms of build times as I thought it might be).
Okay, so looking at libtpetraclassic.a
from my latest build, I do see the ::name
symbols now. In fact, looking into the Debian package from yesterday (I use checkinstall
after building to deploy the build), it is in there also! I am not sure why I didn't see it in the installed lib with nm
, perhaps something went wrong with the installation and I did not notice. The library inside the package built from the 14.4.0 release, however, really does not have it.
Clearly I need to make and compare more builds. It is possible I was updating both my cmake
call and the source version, so I need to determine what exactly looks good and what does not. Sorry for any confusion I created!
@brian-kelley After more testing, I think everything is actually fine. Here is my best reconstruction of what happened:
name
symbols missing in libtpetraclassic.a
seem to be a side effect of my workaround being applied: with that, the methods are defined inline, and then without ETI there won't be any code generated for them, hence nothing in the library. I did subsequently build without my workaround applied, but either the libraries from that were not being properly installed, or I made some other mistake.libtpetraclassic.a
(which otherwise I did not need to link against explicitly).Very sorry to have you chase this for nothing! At least for me, I found the discussion here very useful, though, as it helped me understand some Trilinos concepts better. Thanks a lot for that!
@bartlettroscoe While I think the issue here per se can be closed, I would like to come back to your comment. If building without ETI is already not actually supported anymore, or being considered for deprecation, I would like to request this to be reconsidered. For that, it may be good to explain my use case a bit more:
Trilinos is a relatively late addition to my project, which was not written with Kokkos or Tpetra in mind. When I needed an efficient distributed linear solver, I found Belos, and I found it relatively easy to integrate it into my code, inserting just some relevant traits into namespace Belos
and wrapping pointers manually with Teuchos reference counting. This works very nicely for me, and Trilinos does not appear otherwise in my code.
At some point I started porting more and more aspects of my code to run on GPUs via CUDA, and I particularly like CUDA unified memory. Then I noticed that parts using Belos stopped working when Belos iterations were called using such UVM memory, presumably because Belos internally allocates memory that then my operator is applied to as part of the iteration, assuming CUDA unified memory but not getting that from Belos. My initial workaround was to always perform extra copies at each step, but now I finally decided I don't want to take the performance hit from that anymore (primarily because for a new use case it was particularly significant).
I found the existing CudaUVMSpace and got excited, but then quickly I noticed that in order to use that I have to enable full CUDA support (fine) and also funnel all my code through nvcc
. That latter point is not easy for me for a number of reasons. Looking for alternatives, I found the Kokkos MemorySpace concept, which looked like a natural extension point for me. So I wrote my own implementation of this concept, which ultimately uses cudaMallocManaged
and cudaFree
at the bottom.
For this to actually work, I had to inject a bit of boilerplate code into namespace like Kokkos::Impl
and Tpetra::Details::Impl
, which was a bit tedious but not all that bad eventually. Then all that was left was instatiating Belos with Tpetra vectors that use Kokkos::DefaultHostExecutionSpace
together with my custom memore space, and with that everything is now working great. Only I had to turn off ETI for this all to compile and link properly.
I realize from the fact alone that I have to inject template specializations into those Impl
namepaces that this is not an officially supported use case, and the MemorySpace concepts is ultimately just a Trilinos-internal thing, not a point intended for user extensions (please correct if I am wrong and I am just implementing things incorrectly). Nevertheless, I hope you might see some value in my use case, and I could imagine there are other scenarios where people might just want to handle host allocations in a nonstandard way. If only build without ETI will keep being supported, I think my solution is sustainable, I might just have to update my boilerplate if Kokkos/Tpetra internals change in an incompatible way. I hope this would be seen as valid motivation to keep the support.
Or you might dismiss my use case as just an ugly manifestation of Hyrum's Law ;)
Or you might dismiss my use case as just an ugly manifestation of Hyrum's Law ;)
@sekoenig, your use case is one of the reasons that Trilinos supported Trilinos_ENABLE_EXPLICIT_INSTANTIATION=OFF
in the first place.
@bartlettroscoe While I think the issue here per se can be closed, I would like to come back to your comment. If building without ETI is already not actually supported anymore, or being considered for deprecation, I would like to request this to be reconsidered. For that, it may be good to explain my use case a bit more:
@sekoenig, we would need to take this before the Trilinos leadership team. I am just relaying what I learned as part of #12932 (see https://github.com/trilinos/Trilinos/issues/12932#issuecomment-2072133402).
@rppawlo and @sebrowne, what is the official Trilinos position on support for implicit template instantiation going forward (now that it is not being tested anymore)? What is the expectation when someone posts and issue where they used a configuration of Trilinos with Trilinos_ENABLE_EXPLICIT_INSTANTIATION=OFF
? Does the policy about untested compilers and MPI stated here generalize to untested configurations of Trilinos? (That is, is there no support for Trilinos_ENABLE_EXPLICIT_INSTANTIATION=OFF
builds but Trilinos will accept PRs to address any issues found?)
@sekoenig, as an alternative, would it work if Trilinos provided a mechanism at configure time to declare your additional explicit instantiation sets (and provide includes and include directories to support these)? I don't know how this would get done but that is one option that I can think of. (But it could be quite a bit of work to get that into all of the instantiations in Trilinos so this could be a non-trivial ask.)
We spent some time discussing ETI motivated by this ticket at the leadership meeting yesterday. As it stands, we currently do not test ETI=OFF in any testing (PR or nightly) and consider it an unsupported feature. There was no decision yesterday on whether to support or remove the flag. There will be continued discussion at the next leadership meeting and will probably bring it up at the next developer and sart meetings for input. If it is easy to add configure time support for added types, that would probably be best, but I'm not sure how well that would work in practice for every package.
@bartlettroscoe, @rppawlo Thanks a lot for the discussion. It is reassuring to hear that my use case is in line with one of the motivations to support disabling ETI. I looked at the discussion #12932, which notes that for the particular use case mentioned there (arbitrary precision types), users eventually backed off from it due to problems. I hope that the details I provided above regarding my use case may provide additional perspective for the discussion.
Regarding a mechanism to instantiate custom types at configure time, naively I suspect that would be very tedious and probably involve adding source files somehow when cmake
is called. Perhaps I misunderstand, but if that is so, I think it might actually be easier for users like me to just carry around a modified Trilinos branch and rebase that as needed onto new Trilinos releases. But naturally I prefer to have the extensions fully within my project and otherwise use a vanially upstream Trilinos :)
Overall, since the design of Kokkos, Tpetra, and downstream packages is so heavily based on templates, I would find it most natural to let users benefit from this design decision. The cost for testing with ETI disabled I admit though is a valid concern, so I think the minimum I would ask for is that at least the option to disable ETI remains, even if unsupported. That of course then leads to the question what to do when someone submits a PR that fixes some issue only manifest when ETI is disabled. From what I read. I think this is part of the discussion within the leadership team, and I know too little about it to comment further.
I understand that in particular in Tpetra there are some template parameters that are not actually meant to be changed from their defaults (I ran into this for example when I wanted to use long in local ordinals so that I can link against ILP64 MKL BLAS/LAPACK libs). I also saw somewhere a discussion to move Tpetra in the long run towards using fewer template parameters. If that happens, then perhaps for the remaining ones the burden of supporting non-ETI builds is reduced.
From what I saw while implementing my custom memory space, though, my impression is that there are some opportunities to streamline the design to make it easier for users to instantiate their own extensions. The ::name
methods mentioned in this discussion are one example: I think these could be defined inline instead of in a .cpp file, and explicit specialization could be elegantly avoided with boost::core::demangle
-- unless the preference is to avoid depending on Boost). I also noticed destructors not defined inline in some class templates (e.g. for SharedAllocationRecordCommon
), although I believe that would be legal. If there is interest, I would offer to work on PRs for such small things.
Finally, if non-ETI support will be removed, I should think about alternatives for what I am doing now. I am going off topic here, but a very simple way to achieve what I want to do would be extending Kokkos::InitializationSettings
to add some function pointers for malloc
, free
routines to be used in the default host execution space. From a design perspective, however, using a custom memory space seems more in line with Kokkos' design philosophy.
Question
@csiefer2, @bartlettroscoe, @srajama1: In my code I am using Belos with Tpetra vectors and Teuchos helpers for solving large linear systems. To implement certain Kokkos extensions (I can elaborate if necessary), I would like to use Trilinos built with
Trilinos_ENABLE_EXPLICIT_INSTANTIATION=OFF
. Building itself works fine, and my code also compiles (slowly :), but then at linking I get an impressive host of 'undefined reference" errors. Basically it looks like none of the required templates actually gets instantiated. Previously, with ETI enabled and linking to the relevant libraries, everything worked fine.I assume there is something I am doing wrong. Since when ETI is disabled no shared libraries seem to be generated, I think there is nothing I need to link to. To me that makes sense because the templates should be instantiated when I compile my code, but am I correct? If so, is there anything specific I need to do in order to use Trilions without ETI? I tried searching for guidance, but could not quite find anything useful. It would be great if someone here could help me!