kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

Nightly build errors, spadd_symbolic linking errors #1407

Closed: ndellingwood closed this issue 2 years ago

ndellingwood commented 2 years ago

Nightly builds (first detected in Trilinos builds using the kokkos and kokkos-kernels develop branches) are failing with linking errors following the merge of #1399. The snippet below is output from a cuda/9.2 + gcc/7.2.0 build, but so far the same failure is occurring across the other nightly builds as well.

FAILED: kokkos-kernels/unit_test/KokkosKernels_sparse_serial.exe 
: && /projects/sems/install/rhel7-x86_64/sems/compiler/gcc/7.2.0/openmpi/4.0.2/bin/mpicxx  -expt-extended-lambda -arch=sm_35  -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -O3 -DNDEBUG   kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_sparse_serial.dir/Test_Main.cpp.o kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_sparse_serial.dir/serial/Test_Serial_Sparse.cpp.o  -o kokkos-kernels/unit_test/KokkosKernels_sparse_serial.exe  kokkos-kernels/unit_test/libkokkoskernels_gtest.a  kokkos-kernels/src/libkokkoskernels.a  kokkos/algorithms/src/libkokkosalgorithms.a  kokkos/containers/src/libkokkoscontainers.a  kokkos/core/src/libkokkoscore.a  -ldl  /projects/sems/install/rhel7-x86_64/sems/compiler/cuda/9.2/base/lib64/libcudart.so  /projects/sems/install/rhel7-x86_64/sems/compiler/cuda/9.2/base/lib64/libcublas.so  /projects/sems/install/rhel7-x86_64/sems/compiler/cuda/9.2/base/lib64/libcufft.so  /projects/sems/install/rhel7-x86_64/atdm/tpl/openblas/0.3.6/gcc/7.2.0/lib/libopenblas.so  /projects/sems/install/rhel7-x86_64/sems/compiler/cuda/9.2/base/lib64/libcusparse.so && :
kokkos-kernels/unit_test/CMakeFiles/KokkosKernels_sparse_serial.dir/serial/Test_Serial_Sparse.cpp.o: In function `void KokkosSparse::Experimental::spadd_symbolic<KokkosKernels::Experimental::KokkosKernelsHandle<unsigned long, int, double, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>, Kokkos::View<unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >, Kokkos::View<unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >, Kokkos::View<unsigned long*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >, Kokkos::View<int*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> > >(KokkosKernels::Experimental::KokkosKernelsHandle<unsigned long, int, double, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>*, Kokkos::View<unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >, Kokkos::View<unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >, Kokkos::View<unsigned long*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks, Kokkos::MemoryTraits<0u> >)':
tmpxft_00007a8e_00000000-5_Test_Serial_Sparse.cudafe1.cpp:(.text._ZN12KokkosSparse12Experimental14spadd_symbolicIN13KokkosKernels12Experimental19KokkosKernelsHandleImidN6Kokkos6SerialENS5_9HostSpaceES7_EENS5_4ViewIPKmJNS5_10LayoutLeftENS5_6DeviceIS6_S7_EENS5_12Experimental14EmptyViewHooksENS5_12MemoryTraitsILj0EEEEEENS9_IPKiJSC_SE_SG_SI_EEESJ_SM_NS9_IPmJSC_SE_SG_SI_EEENS9_IPiJSC_SE_SG_SI_EEEEEvPT_T0_T1_T2_T3_T4_[_ZN12KokkosSparse12Experimental14spadd_symbolicIN13KokkosKernels12Experimental19KokkosKernelsHandleImidN6Kokkos6SerialENS5_9HostSpaceES7_EENS5_4ViewIPKmJNS5_10LayoutLeftENS5_6DeviceIS6_S7_EENS5_12Experimental14EmptyViewHooksENS5_12MemoryTraitsILj0EEEEEENS9_IPKiJSC_SE_SG_SI_EEESJ_SM_NS9_IPmJSC_SE_SG_SI_EEENS9_IPiJSC_SE_SG_SI_EEEEEvPT_T0_T1_T2_T3_T4_]+0x198): undefined reference to `KokkosSparse::Impl::SPADD_SYMBOLIC<KokkosKernels::Experimental::KokkosKernelsHandle<unsigned long const, int const, double const, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>, Kokkos::View<unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<unsigned long*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, false, false>::spadd_symbolic(KokkosKernels::Experimental::KokkosKernelsHandle<unsigned long const, int const, double const, Kokkos::Serial, Kokkos::HostSpace, Kokkos::HostSpace>*, Kokkos::View<unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<unsigned long const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<unsigned long*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >)'
collect2: error: ld returned 1 exit status

@brian-kelley can you investigate?

e10harvey commented 2 years ago

CI checks in #1405 show this too.

ndellingwood commented 2 years ago

@e10harvey yeah, it looks like all of the kokkos-kernels nightlies were impacted as well.

ndellingwood commented 2 years ago

Was there a merge between #1399 (i.e. SHA https://github.com/kokkos/kokkos-kernels/commit/2ced7df26e780cef8599151a2cf5a3a6cb1d7cfd) and its target SHA https://github.com/kokkos/kokkos-kernels/commit/7ce1b8fef757e91919d543a0b278888f31ef91c5 (reported in the passing CI build information) that introduced the breakage? I'm trying to understand why it passed CI but fails outside of that testing.

brian-kelley commented 2 years ago

@ndellingwood @e10harvey Sorry about that. I just found the problem: I copy-pasted the SpAdd files to do the graph color ETI and forgot to change two include guard names. As soon as both PRs were combined, those guards collided and the SpAdd eti_spec_avail stopped getting defined. That explains why PR testing didn't catch it: the collision only exists when both changes are present, and each PR was tested without the other. I'll open a PR for the fix in a minute.

ndellingwood commented 2 years ago

@brian-kelley thanks for the update and fast fix!