trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Phalanx: ViewOfViews test hangs in gcc/8.5 OpenMP build with Kokkos develop (pre-4.4 release) #13281

Closed ndellingwood closed 1 month ago

ndellingwood commented 1 month ago

Bug Report

@trilinos/phalanx @rppawlo

Description

The Phalanx_tViewOfViews_MPI_1 unit test hangs until timeout in gcc/8.5.0 builds with the OpenMP backend when tested against the Kokkos develop branch (the upcoming 4.4 release). This has been happening since before the merge of #13203, where it was masked by other unrelated issues.

This is the stacktrace from gdb when manually aborting the job:

Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
  In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
  For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
  For unit testing set OMP_PROC_BIND=false

Host Parallel Execution Space:
  KOKKOS_ENABLE_OPENMP: yes

OpenMP Runtime Configuration:
Kokkos::OpenMP thread_pool_topology[ 1 x 8 x 1 ]

***
*** Unit test suite ...
***

Sorting tests by group name then by the order they were added ... (time = 1.66e-06)

Running unit tests ...

0. PhalanxViewOfViews_OldImpl_UnitTest ... [Passed] (0.0124 sec)

Thread 1 "Phalanx_tViewOf" received signal SIGINT, Interrupt.
0x0000200001ab79f8 in __lll_lock_wait () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-236.el8.ppc64le libatomic-8.5.0-10.el8.ppc64le libgcc-8.5.0-10.el8.ppc64le libgfortran-8.5.0-10.el8.ppc64le libgomp-8.5.0-10.el8.ppc64le libquadmath-8.5.0-10.el8.ppc64le libstdc++-8.5.0-10.el8.ppc64le nvidia-driver-cuda-libs-530.30.02-1.el8.ppc64le
(gdb) bt
#0  0x0000200001ab79f8 in __lll_lock_wait () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
#1  0x0000200001aaceec in pthread_mutex_lock () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
#2  0x000000001018e964 in void Kokkos::Tools::Experimental::Impl::profile_fence_event<Kokkos::OpenMP, Kokkos::OpenMP::impl_static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1}>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::Tools::Experimental::SpecialSynchronizationCases, Kokkos::OpenMP::impl_static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1} const&) [clone .isra.79] [clone .constprop.109] ()
#3  0x000000001016420c in Kokkos::fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#4  0x000000001018116c in Kokkos::HostSpace::deallocate(char const*, void*, unsigned long, unsigned long) const ()
#5  0x000000001018181c in Kokkos::Impl::SharedAllocationRecordCommon<Kokkos::HostSpace>::~SharedAllocationRecordCommon() ()
#6  0x0000000010032468 in Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, double, true> >::~SharedAllocationRecord() ()
#7  0x00000000100291a4 in void Kokkos::Impl::deallocate<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, double, true> >(Kokkos::Impl::SharedAllocationRecord<void, void>*) ()
#8  0x000000001018c024 in Kokkos::Impl::SharedAllocationRecord<void, void>::decrement(Kokkos::Impl::SharedAllocationRecord<void, void>*) ()
#9  0x0000000010029854 in std::enable_if<!std::is_same<Kokkos::RangePolicy<Kokkos::OpenMP, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<double***, Kokkos::HostSpace>, false>::DestroyTag>::schedule_type::type, Kokkos::Dynamic>::value, void>::type Kokkos::Impl::ParallelFor<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<double***, Kokkos::HostSpace>, false>, Kokkos::RangePolicy<Kokkos::OpenMP, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<double***, Kokkos::HostSpace>, false>::DestroyTag>, Kokkos::OpenMP>::execute_parallel<Kokkos::RangePolicy<Kokkos::OpenMP, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<double***, Kokkos::HostSpace>, false>::DestroyTag> >() const [clone ._omp_fn.14] ()
#10 0x0000200001a04ca8 in GOMP_parallel () from /lib64/libgomp.so.1
#11 0x000000001005f1a4 in void Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<double***, Kokkos::HostSpace>, false>::parallel_for_implementation<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<double***, Kokkos::HostSpace>, false>::DestroyTag>() ()
#12 0x000000001005f594 in void Kokkos::Impl::deallocate<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<double***, Kokkos::HostSpace>, false> >(Kokkos::Impl::SharedAllocationRecord<void, void>*) ()
#13 0x000000001018c024 in Kokkos::Impl::SharedAllocationRecord<void, void>::decrement(Kokkos::Impl::SharedAllocationRecord<void, void>*) ()
#14 0x0000000010030534 in PHX::ViewOfViews2<2, Kokkos::View<double***, Kokkos::HostSpace>, Kokkos::HostSpace>::~ViewOfViews2() ()
#15 0x00000000100249e0 in PhalanxViewOfViews_NewImpl_UnitTest::runUnitTestImpl(Teuchos::basic_FancyOStream<char, std::char_traits<char> >&, bool&) const ()
#16 0x0000000010157188 in Teuchos::UnitTestBase::runUnitTest(Teuchos::basic_FancyOStream<char, std::char_traits<char> >&) const ()
#17 0x000000001015aa74 in Teuchos::UnitTestRepository::runUnitTestImpl(Teuchos::UnitTestBase const&, Teuchos::basic_FancyOStream<char, std::char_traits<char> >&) ()
#18 0x000000001015dfa8 in Teuchos::UnitTestRepository::runUnitTests(Teuchos::basic_FancyOStream<char, std::char_traits<char> >&) ()
#19 0x000000001015f0c0 in Teuchos::UnitTestRepository::runUnitTestsFromMain(int, char**) ()
#20 0x000000001000d44c in main ()

The hang occurs regardless of the number of threads: I tested with 1, 4, and 8 threads, and it hung in each case. @rppawlo can you help advise?

Steps to Reproduce

Reproducer configuration (weaver testbed):

This was seen on the Weaver testbed (Power9+Volta70, though the GPU type is irrelevant). I also noticed the hangs occurring in the nightly Framework's build with gcc/8

# Repo prep
git clone -b develop https://github.com/trilinos/Trilinos.git
TRILINOS_DIR=$PWD/Trilinos
git clone -b develop https://github.com/kokkos/kokkos.git
KOKKOS_DIR=$PWD/kokkos
git clone -b develop https://github.com/kokkos/kokkos-kernels.git
KOKKOSKERNELS_DIR=$PWD/kokkos-kernels

cd $TRILINOS_DIR
ln -s ${KOKKOS_DIR} kokkos
ln -s ${KOKKOSKERNELS_DIR} kokkos-kernels
cd -

mkdir -p Build
cd Build

# Environment and configuration

export TRILINOS_DIR=<path-to-trilinos>

source /etc/profile.d/modules.sh
source /projects/ppc64le-pwr9-rhel8/legacy-env.sh
module purge
module load openmpi/4.1.1/gcc/8.3.1/cuda/11.2.2 cmake/3.23.1 openblas/0.3.18/gcc/8.3.1 metis/5.1.0/gcc/8.3.1  hdf5/1.10.7/gcc/8.3.1/openmpi/4.1.1 netcdf-fortran/4.5.4/gcc/8.3.1/openmpi/4.1.1 parallel-netcdf/1.12.2/gcc/8.3.1/openmpi/4.1.1 parmetis/4.0.3/gcc/8.3.1/openmpi/4.1.1 zlib/1.2.11/gcc/8.3.1 boost/1.70.0/gcc/8.3.1
module list

export OMP_NUM_THREADS=8
# This is needed for the Kokkos openmp.partition_master test to pass
export OMP_MAX_ACTIVE_LEVELS=1

cmake \
 -D CMAKE_INSTALL_PREFIX=${TRILINOS_INSTALL_DIR} \
 -D CMAKE_CXX_COMPILER="`which mpicxx`" \
 -D CMAKE_C_COMPILER="`which mpicc`" \
 -D CMAKE_Fortran_COMPILER="`which mpifort`" \
 -D CMAKE_BUILD_TYPE:STRING=RELEASE \
 -D CMAKE_CXX_STANDARD=17 \
 -D BUILD_SHARED_LIBS=OFF \
 -D CMAKE_INSTALL_PREFIX=$PWD/install \
 -D TPL_ENABLE_CUDA:STRING=OFF \
 -D TPL_ENABLE_MPI:STRING=ON \
 -D MPI_BASE_DIR:PATH="$OPENMPI_ROOT" \
 -D MPI_BIN_DIR:PATH="$OPENMPI_BIN" \
 -D MPI_EXEC_POST_NUMPROCS_FLAGS:STRING="-map-by;socket:PE=4" \
-D TPL_ENABLE_BLAS:STRING=ON \
  -D BLAS_LIBRARY_DIRS:FILEPATH="$OPENBLAS_ROOT/lib" \
  -D BLAS_LIBRARY_NAMES:STRING="openblas" \
-D TPL_ENABLE_LAPACK:STRING=ON \
  -D LAPACK_INCLUDE_DIRS:FILEPATH="$OPENBLAS_ROOT/include" \
  -D LAPACK_LIBRARY_DIRS:FILEPATH="$OPENBLAS_ROOT/lib" \
  -D LAPACK_LIBRARY_NAMES:STRING="openblas" \
-D TPL_ENABLE_Boost=ON \
   -D Boost_INCLUDE_DIRS:PATH="${BOOST_ROOT}/include" \
   -D Boost_LIBRARY_DIRS:PATH="${BOOST_ROOT}/lib" \
-D TPL_ENABLE_BoostLib=ON \
   -D BoostLib_INCLUDE_DIRS:PATH="${BOOST_ROOT}/include" \
   -D BoostLib_LIBRARY_DIRS:PATH="${BOOST_ROOT}/lib" \
-D TPL_ENABLE_Netcdf=OFF \
 -D Trilinos_ENABLE_OpenMP=ON \
 -D Trilinos_ENABLE_COMPLEX=ON \
 -D Trilinos_ENABLE_TESTS=OFF \
 -D Trilinos_ENABLE_EXAMPLES=OFF \
 -D Trilinos_ENABLE_Kokkos=ON \
 -D Kokkos_ENABLE_SERIAL=ON \
 -D Kokkos_ENABLE_OPENMP=ON \
 -D Kokkos_ENABLE_CUDA=OFF \
 -D Kokkos_ARCH_VOLTA70=ON \
 -D Kokkos_ARCH_POWER9=ON \
 -D Kokkos_ENABLE_TESTS=ON \
 -D Kokkos_ENABLE_IMPL_MDSPAN=OFF \
 -D Trilinos_ENABLE_KokkosKernels=ON \
 -D KokkosKernels_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Tpetra=ON \
 -D Tpetra_ENABLE_CUDA=OFF \
 -D Tpetra_INST_SERIAL:BOOL=ON \
 -D Tpetra_INST_OPENMP:BOOL=ON \
 -D Tpetra_INST_PTHREAD:BOOL=OFF \
 -D Tpetra_INST_CUDA:BOOL=OFF \
 -D Tpetra_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Sacado=ON \
 -D Sacado_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Amesos2=ON \
 -D Amesos2_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Teuchos=ON \
 -D Teuchos_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Stokhos=ON \
 -D Stokhos_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Phalanx=ON \
 -D Phalanx_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Ifpack2=ON \
 -D Ifpack2_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Anasazi=ON \
 -D Trilinos_ENABLE_Adelus=ON \
 -D Adelus_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Belos=ON \
 -D Belos_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Zoltan2=ON \
 -D Zoltan2_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_MueLu=ON \
 -D MueLu_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Panzer=ON \
 -D Panzer_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Intrepid2=ON \
 -D Intrepid2_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_ShyLU_NodeTacho=OFF \
 -D Trilinos_ENABLE_Zoltan2Sphynx=OFF \
 -D Trilinos_ENABLE_SEACAS=OFF \
\
 -DMueLu_UnitTestsTpetra_MPI_4_DISABLE=ON \
 -DMueLu_UnitTestsTpetra_MPI_1_DISABLE=ON \
\
 -DKokkos_SOURCE_DIR_OVERRIDE:STRING=kokkos \
 -DKokkosKernels_SOURCE_DIR_OVERRIDE:STRING=kokkos-kernels \
$TRILINOS_DIR

dalg24 commented 1 month ago

The bug is in PHX::ViewOfViews2. Just like you have setView(), you'd need to unset them, possibly as part of the destructor. Otherwise the VoV destructor will attempt to delete individual inner views within a kernel and won't be able to acquire the lock since the parallel region is already active.

dalg24 commented 1 month ago

It is covered in https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/View.html#i-really-want-a-view-of-views-what-do-i-do, you need to destroy the inner views yourself "on the host, outside of a parallel region".
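
The mechanism described in the two comments above can be illustrated without Kokkos. What follows is a toy model in standard C++, not Kokkos code: ToyRuntime, ToyInnerView, destroy_outer, release_inner_on_host, and run_scenario are hypothetical names, and a boolean flag stands in for the lock that the active parallel region holds. Where the real fence would wait forever, the model only counts the blocked call.

```cpp
#include <vector>

// Toy stand-in for a runtime whose fence needs a lock that an active
// parallel region already holds. All names are hypothetical.
struct ToyRuntime {
  bool region_active = false;
  int blocked_fences = 0;    // fences issued inside a region (would hang)
  int completed_fences = 0;  // fences issued on the host, outside a region

  void fence() {
    if (region_active) ++blocked_fences;  // real code would wait here forever
    else ++completed_fences;
  }

  template <class F>
  void parallel_for(int n, F&& body) {
    region_active = true;
    for (int i = 0; i < n; ++i) body(i);
    region_active = false;
  }
};

// Toy inner "view": freeing its allocation triggers a fence.
struct ToyInnerView {
  ToyRuntime* rt = nullptr;
  bool allocated = false;
  void release() {
    if (allocated) { rt->fence(); allocated = false; }
  }
};

// Outer teardown destroys its elements inside a parallel region.
void destroy_outer(ToyRuntime& rt, std::vector<ToyInnerView>& outer) {
  rt.parallel_for(static_cast<int>(outer.size()),
                  [&](int i) { outer[i].release(); });
  outer.clear();
}

// Remedy from the comments: release the inner views on the host first.
void release_inner_on_host(std::vector<ToyInnerView>& outer) {
  for (auto& v : outer) v.release();
}

ToyRuntime run_scenario(bool release_first) {
  ToyRuntime rt;
  std::vector<ToyInnerView> outer(4, ToyInnerView{&rt, true});
  if (release_first) release_inner_on_host(outer);
  destroy_outer(rt, outer);
  return rt;
}
```

With release_first set to false, all four fences land inside the region, which corresponds to the hang in the backtrace. With release_first set to true, they complete on the host and the outer teardown has nothing left to free.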

rppawlo commented 1 month ago

> The bug is in PHX::ViewOfViews2. Just like you have setView(), you'd need to unset them, possibly as part of the destructor. Otherwise the VoV destructor will attempt to delete individual inner views within a kernel and won't be able to acquire the lock since the parallel region is already active.

ViewOfViews2 was removed from the code base a couple of weeks ago when fixing #13203 . Is this failure using an old version of trilinos by any chance?

dalg24 commented 1 month ago

As far as I know, Nathan is testing against Trilinos develop every night. I see the issue in the current develop https://github.com/trilinos/Trilinos/blob/78784348ce1b5888d0117a25e79f8f0fb43c3069/packages/phalanx/src/design/Phalanx_KokkosViewOfViews.hpp#L205-L211

Admittedly I did not test, just reading the code and looking at the bug report.

ndellingwood commented 1 month ago

The most recent nightly test showing this failure ran at SHA a29a8fd366cf1b842f6c4bd1b9c395b417804fb7, which contains the fixes from #13203.

However, the stacktrace I posted in the OP was from a manual rerun in which I was bisecting the Trilinos SHAs and had rolled back to an older SHA (prior to PR #13203). Sorry for the confusion there. I'll rebuild and rerun to add an updated stacktrace.

dalg24 commented 1 month ago

As far as I can tell the current code will hang. I don't see where that would be mitigated. The inner views are set here https://github.com/trilinos/Trilinos/blob/78784348ce1b5888d0117a25e79f8f0fb43c3069/packages/phalanx/src/design/tPhalanxViewOfViews.cpp#L105-L108 and they are not being unset/destructed before the ~ViewOfViews2() is called.

rppawlo commented 1 month ago

That explains it. FYI - the file with the ViewOfViews2 object is not used anymore. It's in a folder that is not a part of the source code that is compiled or installed. I saved off an old version of the file in #13203 to the design directory to remember to explore something else at a later time. The current v-of-v implementation is in the src directory (not the src/design directory): https://github.com/trilinos/Trilinos/blob/78784348ce1b5888d0117a25e79f8f0fb43c3069/packages/phalanx/src/Phalanx_KokkosViewOfViews.hpp

ndellingwood commented 1 month ago

Here's the correct stacktrace after aborting the test:

...
5. PhalanxViewOfViews_WrapperExample_UnitTest ... [Passed] (0.00154 sec)
6. PhalanxViewOfViews_CreateHostHost_UnitTest ... [Passed] (0.000519 sec)

Thread 1 "Phalanx_tViewOf" received signal SIGINT, Interrupt.
0x0000200001ab79f8 in __lll_lock_wait () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-236.el8.ppc64le libatomic-8.5.0-10.el8.ppc64le libgcc-8.5.0-10.el8.ppc64le libgfortran-8.5.0-10.el8.ppc64le libgomp-8.5.0-10.el8.ppc64le libquadmath-8.5.0-10.el8.ppc64le libstdc++-8.5.0-10.el8.ppc64le nvidia-driver-cuda-libs-530.30.02-1.el8.ppc64le
(gdb) bt
#0  0x0000200001ab79f8 in __lll_lock_wait () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
#1  0x0000200001aaceec in pthread_mutex_lock () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
#2  0x0000000010187984 in void Kokkos::Tools::Experimental::Impl::profile_fence_event<Kokkos::OpenMP, Kokkos::OpenMP::impl_static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1}>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::Tools::Experimental::SpecialSynchronizationCases, Kokkos::OpenMP::impl_static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1} const&) [clone .isra.79] [clone .constprop.109] ()
#3  0x000000001015d22c in Kokkos::fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#4  0x000000001017a18c in Kokkos::HostSpace::deallocate(char const*, void*, unsigned long, unsigned long) const ()
#5  0x000000001017a83c in Kokkos::Impl::SharedAllocationRecordCommon<Kokkos::HostSpace>::~SharedAllocationRecordCommon() ()
#6  0x000000001002e9c8 in Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::OpenMP, double, true> >::~SharedAllocationRecord() ()
#7  0x0000000010025464 in void Kokkos::Impl::deallocate<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::OpenMP, double, true> >(Kokkos::Impl::SharedAllocationRecord<void, void>*) ()
#8  0x0000000010185044 in Kokkos::Impl::SharedAllocationRecord<void, void>::decrement(Kokkos::Impl::SharedAllocationRecord<void, void>*) ()
#9  0x0000000010025fd4 in std::enable_if<!std::is_same<Kokkos::RangePolicy<Kokkos::OpenMP, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> >**, Kokkos::LayoutRight, Kokkos::OpenMP>, false>::DestroyTag>::schedule_type::type, Kokkos::Dynamic>::value, void>::type Kokkos::Impl::ParallelFor<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> >**, Kokkos::LayoutRight, Kokkos::OpenMP>, false>, Kokkos::RangePolicy<Kokkos::OpenMP, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> >**, Kokkos::LayoutRight, Kokkos::OpenMP>, false>::DestroyTag>, Kokkos::OpenMP>::execute_parallel<Kokkos::RangePolicy<Kokkos::OpenMP, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> >**, Kokkos::LayoutRight, Kokkos::OpenMP>, false>::DestroyTag> >() const [clone ._omp_fn.46] ()
#10 0x0000200001a04ca8 in GOMP_parallel () from /lib64/libgomp.so.1
#11 0x0000000010058d14 in void Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> >**, Kokkos::LayoutRight, Kokkos::OpenMP>, false>::parallel_for_implementation<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> >**, Kokkos::LayoutRight, Kokkos::OpenMP>, false>::DestroyTag>() ()
#12 0x0000000010059104 in void Kokkos::Impl::deallocate<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::View<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> >**, Kokkos::LayoutRight, Kokkos::OpenMP>, false> >(Kokkos::Impl::SharedAllocationRecord<void, void>*) ()
#13 0x0000000010185044 in Kokkos::Impl::SharedAllocationRecord<void, void>::decrement(Kokkos::Impl::SharedAllocationRecord<void, void>*) ()
#14 0x0000000010030970 in Kokkos::Impl::ViewTracker<Kokkos::View<Kokkos::View<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> >**, Kokkos::LayoutRight, Kokkos::OpenMP>**, Kokkos::LayoutRight, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks> >::operator=(Kokkos::Impl::ViewTracker<Kokkos::View<Kokkos::View<Sacado::Fad::Exp::GeneralFad<Sacado::Fad::Exp::DynamicStorage<double, double> >**, Kokkos::LayoutRight, Kokkos::OpenMP>**, Kokkos::LayoutRight, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks> > const&) ()
#15 0x00000000100234b0 in PhalanxViewOfViews_FadAndAssignment_UnitTest::runUnitTestImpl(Teuchos::basic_FancyOStream<char, std::char_traits<char> >&, bool&) const ()
#16 0x00000000101501a8 in Teuchos::UnitTestBase::runUnitTest(Teuchos::basic_FancyOStream<char, std::char_traits<char> >&) const ()
#17 0x0000000010153a94 in Teuchos::UnitTestRepository::runUnitTestImpl(Teuchos::UnitTestBase const&, Teuchos::basic_FancyOStream<char, std::char_traits<char> >&) ()
#18 0x0000000010156fc8 in Teuchos::UnitTestRepository::runUnitTests(Teuchos::basic_FancyOStream<char, std::char_traits<char> >&) ()
#19 0x00000000101580e0 in Teuchos::UnitTestRepository::runUnitTestsFromMain(int, char**) ()
#20 0x000000001000d16c in main ()

rppawlo commented 1 month ago

I was able to replicate in a gcc 12 build. It's hanging in the default move constructor for the VoV object. Should have a fix up soon.

dalg24 commented 1 month ago

Here is a proposed resolution that fixes the deadlock but lacks the safeguard for the case where users hold on to a copy of the underlying VoVs: phalanx_vov.patch

(edit: had messed up the bounds in the inner view deleter)
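
The patch itself is not reproduced in this thread, so the following is only a guess at the general shape of an "inner view deleter", written in standard C++ rather than Kokkos. ToyVoV, ToyInner, ToyOuter, and CleanupLog are hypothetical names. The assumption is that the outer storage sits behind a shared handle whose custom deleter clears every inner element on the host, exactly once, when the last copy goes away.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using ToyInner = std::vector<double>;    // stands in for an inner view
using ToyOuter = std::vector<ToyInner>;  // stands in for the outer view

// Counts host-side cleanups so the behaviour can be observed.
// It must outlive every ToyVoV that refers to it.
struct CleanupLog { int host_cleanups = 0; };

class ToyVoV {
  std::shared_ptr<ToyOuter> outer_;

 public:
  ToyVoV() = default;

  ToyVoV(std::size_t n, CleanupLog& log)
      : outer_(new ToyOuter(n), [&log](ToyOuter* p) {
          // Runs once, on the thread that drops the last copy.
          // Loop bounds cover every inner element of the outer storage.
          for (std::size_t i = 0; i < p->size(); ++i) (*p)[i] = ToyInner();
          ++log.host_cleanups;
          delete p;
        }) {}

  ToyInner& operator[](std::size_t i) { return (*outer_)[i]; }

  // No user-declared destructor, copy, or move operations: the
  // compiler-generated ones just copy or move the shared handle.
};
```

Copying, moving, and assigning a ToyVoV only touch the handle, so the cleanup cannot run early or twice, and the class needs no hand-written special member functions.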

ndellingwood commented 1 month ago

@rppawlo @dalg24 I confirmed the patch Damien provided above resolved the timeout issue for me

[ndellin@weaver8 ViewOfViews]$ ctest -R Phalanx_tViewOfViews_MPI_1 -V
...
1: Running unit tests ...
1: 
1: 0. PhalanxViewOfViews_ViewOfView3_DefaultStreamInitialize_UnitTest ... [Passed] (0.0157 sec)
1: 1. PhalanxViewOfViews_ViewOfView3_DefaultStreamCtor_UnitTest ... [Passed] (0.0143 sec)
1: 2. PhalanxViewOfViews_ViewOfView3_UserStreamCtor_UnitTest ... Using partition_space, concurrency=8
1: [Passed] (0.0127 sec)
1: 3. PhalanxViewOfViews_ViewOfView3_UserStreamInitialize_UnitTest ... Using partition_space, concurrency=8
1: [Passed] (0.0125 sec)
1: 4. PhalanxViewOfViews_ViewOfView3_DefaultCtorDtor_UnitTest ... [Passed] (2.71e-06 sec)
1: 5. PhalanxViewOfViews_WrapperExample_UnitTest ... [Passed] (0.000682 sec)
1: 6. PhalanxViewOfViews_CreateHostHost_UnitTest ... [Passed] (0.000744 sec)
1: 7. PhalanxViewOfViews_FadAndAssignment_UnitTest ... [Passed] (0.00139 sec)
1: 8. PhalanxViewOfViews_FadHierarchicMDRangeBug_UnitTest ... 
1: [Passed] (0.00012 sec)
1: 
1: Total Time: 0.0585 sec
1: 
1: Summary: total = 9, run = 9, passed = 9, failed = 0
1: 
1: End Result: TEST PASSED
1/1 Test #1: Phalanx_tViewOfViews_MPI_1 .......   Passed    0.76 sec

The following tests passed:
    Phalanx_tViewOfViews_MPI_1

100% tests passed, 0 tests failed out of 1

Label Time Summary:
Phalanx    =   0.76 sec*proc (1 test)

Total Test time (real) =   0.80 sec

rppawlo commented 1 month ago

Thanks for the patch @dalg24 ! That's some neat code! I really want to preserve the safeguards. They have been important in the past with the app teams. I've got an alternate solution that seems to be working - it's up at https://github.com/trilinos/Trilinos/pull/13287 if you want to take a look. I will use some of your ideas in that patch in the future.

dalg24 commented 1 month ago

I think you could preserve the runtime check by doing something along these lines: phalanx_vov_v2.patch
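
The second patch is also not shown here. As a rough illustration of what a runtime check against lingering copies might look like, here is a sketch in standard C++ that uses std::shared_ptr::use_count as the reference count. ToyViewOfViews and its member names are hypothetical, and this is not the Phalanx implementation.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

class ToyViewOfViews {
  using Inner = std::vector<double>;
  using Outer = std::vector<Inner>;
  std::shared_ptr<Outer> outer_;

 public:
  explicit ToyViewOfViews(std::size_t n) : outer_(std::make_shared<Outer>(n)) {}

  void set_view(std::size_t i, Inner inner) { (*outer_)[i] = std::move(inner); }

  // Hands out a second owner of the outer storage, as a user might keep.
  std::shared_ptr<Outer> get_outer() const { return outer_; }

  std::size_t total_elements() const {
    std::size_t n = 0;
    for (const auto& inner : *outer_) n += inner.size();
    return n;
  }

  // Release the inner views on the host, but refuse if some other
  // owner could still be reading them.
  void release_inner() {
    if (outer_.use_count() > 1)
      throw std::runtime_error("outer storage is still referenced elsewhere");
    for (auto& inner : *outer_) inner = Inner();
  }
};
```

The check turns a silent dangling reference into an immediate error at the point where the inner views would be released.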

dalg24 commented 1 month ago

FWIW it passes the tests from your PR. In my opinion, implementing all the special member functions as you did carries a higher risk of messing things up and is more costly to maintain.

rppawlo commented 1 month ago

Agreed. I'm pulling in your changes and testing against some internal applications. Will do some further cleanup and push later today.

rppawlo commented 1 month ago

@ndellingwood - changes have been pushed into Trilinos develop if you want to test.

ndellingwood commented 1 month ago

Thanks @rppawlo! After those changes, I'm seeing timeouts in the Phalanx_tViewOfViews_MPI_1 unit test in the cuda/11.2.2 build with UVM. I can post the configure script for Weaver if helpful. Sorry for the whack-a-mole :(

ndellingwood commented 1 month ago

The gcc/8.5.0 OpenMP build is back to passing

rppawlo commented 1 month ago

@ndellingwood - I've reproduced it. Working on a fix. These are the flags I used to enable UVM. Is this what you use?

-D Kokkos_ENABLE_CUDA_UVM=ON \
-D KokkosKernels_INST_MEMSPACE_CUDAUVMSPACE=ON \
-D Tpetra_ENABLE_CUDA_UVM=ON \

ndellingwood commented 1 month ago

@rppawlo yes to the Kokkos and KokkosKernels; for Tpetra I use -D Tpetra_ALLOCATE_IN_SHARED_SPACE=ON

ndellingwood commented 1 month ago

Linking #13300 for the Cuda+UVM PR

ndellingwood commented 1 month ago

@rppawlo I'm still seeing the Phalanx_tViewOfViews_MPI_1 timeout for Cuda+UVM integration builds after the merge of #13300. Here are some reproducer details from the job on Weaver:

Test output:

...
Running unit tests ...

0. PhalanxViewOfViews_ViewOfView_DefaultStreamInitialize_UnitTest ... [Passed] (0.0122 sec)
1. PhalanxViewOfViews_ViewOfView_DefaultStreamCtor_UnitTest ... [Passed] (0.0117 sec)
2. PhalanxViewOfViews_ViewOfView_UserStreamCtor_UnitTest ... Using partition_space, concurrency=163840
[Passed] (0.0125 sec)
3. PhalanxViewOfViews_ViewOfView_UserStreamInitialize_UnitTest ... Using partition_space, concurrency=163840
[Passed] (0.0124 sec)
4. PhalanxViewOfViews_ViewOfView_DefaultCtorDtor_UnitTest ... [Passed] (5.42e-06 sec)
5. PhalanxViewOfViews_WrapperExample_UnitTest ... [Passed] (0.0011 sec)

0% tests passed, 1 tests failed out of 1

Subproject Time Summary:
Phalanx    = 1500.12 sec*proc (1 test)

Total Test time (real) = 1501.07 sec

The following tests FAILED:
        813 - Phalanx_tViewOfViews_MPI_1 (Timeout)

Weaver reproducer (rhel8 queue): SHA: 665bf893489919d69b3a31fd5e3313e831dda117

# Repo prep
git clone -b develop https://github.com/trilinos/Trilinos.git
TRILINOS_DIR=$PWD/Trilinos
git clone -b develop https://github.com/kokkos/kokkos.git
KOKKOS_DIR=$PWD/kokkos
git clone -b develop https://github.com/kokkos/kokkos-kernels.git
KOKKOSKERNELS_DIR=$PWD/kokkos-kernels

cd $TRILINOS_DIR
ln -s ${KOKKOS_DIR} kokkos
ln -s ${KOKKOSKERNELS_DIR} kokkos-kernels
cd -

mkdir -p Build
cd Build

# Environment and configuration

export TRILINOS_DIR=<path-to-trilinos>

# Interactive compute node
bsub -Is -n 1 -q rhel8 -gpu "num=4" bash

source /etc/profile.d/modules.sh
source /projects/ppc64le-pwr9-rhel8/legacy-env.sh
module purge
module load git cmake/3.23.1 cuda/11.2.2/gcc/8.3.1 openmpi/4.1.1/gcc/8.3.1/cuda/11.2.2 openblas/0.3.18/gcc/8.3.1
module list
export OMPI_CXX=$KOKKOS_DIR/bin/nvcc_wrapper

cmake \
 -D CMAKE_CXX_COMPILER="`which mpicxx`" \
 -D CMAKE_C_COMPILER="`which mpicc`" \
 -D CMAKE_Fortran_COMPILER="`which mpifort`" \
 -D CMAKE_BUILD_TYPE:STRING=RELEASE \
 -D CMAKE_CXX_STANDARD=17 \
 -D BUILD_SHARED_LIBS=ON \
 -D CMAKE_INSTALL_PREFIX=$PWD/install \
 -D TPL_ENABLE_CUDA:STRING=ON \
 -D TPL_ENABLE_MPI:STRING=ON \
 -D MPI_BASE_DIR:PATH="$OPENMPI_ROOT" \
 -D MPI_BIN_DIR:PATH="$OPENMPI_BIN" \
 -D MPI_EXEC_POST_NUMPROCS_FLAGS:STRING="-map-by;socket:PE=4" \
-D TPL_ENABLE_BLAS:STRING=ON \
  -D BLAS_LIBRARY_DIRS:FILEPATH="$OPENBLAS_ROOT/lib" \
  -D BLAS_LIBRARY_NAMES:STRING="openblas" \
-D TPL_ENABLE_LAPACK:STRING=ON \
  -D LAPACK_INCLUDE_DIRS:FILEPATH="$OPENBLAS_ROOT/include" \
  -D LAPACK_LIBRARY_DIRS:FILEPATH="$OPENBLAS_ROOT/lib" \
  -D LAPACK_LIBRARY_NAMES:STRING="openblas" \
 -D Trilinos_ENABLE_COMPLEX=ON \
 -D Trilinos_ENABLE_TESTS=OFF \
 -D Trilinos_ENABLE_EXAMPLES=OFF \
 -D Trilinos_ENABLE_Kokkos=ON \
 -D Kokkos_ENABLE_CUDA=ON \
 -D Kokkos_ENABLE_CUDA_LAMBDA=ON \
 -D Kokkos_ENABLE_CUDA_UVM=ON \
 -D Kokkos_ARCH_VOLTA70=ON \
 -D Kokkos_ARCH_POWER9=ON \
 -D Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF \
 -D Kokkos_ENABLE_TESTS=ON \
 -D Kokkos_ENABLE_IMPL_MDSPAN=ON \
 -D Trilinos_ENABLE_KokkosKernels=ON \
 -D KokkosKernels_INST_MEMSPACE_CUDAUVMSPACE=ON \
 -D KokkosKernels_ENABLE_TPL_CUSOLVER=ON \
 -D KokkosKernels_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Tpetra=ON \
 -D Tpetra_ALLOCATE_IN_SHARED_SPACE=ON \
 -D Tpetra_INST_CUDA=ON \
 -D Tpetra_INST_SERIAL=ON \
 -D Tpetra_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Sacado=ON \
 -D Sacado_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Phalanx=ON \
 -D Phalanx_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Adelus=ON \
 -D Adelus_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Ifpack2=ON \
 -D Ifpack2_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Amesos2=OFF \
 -D Trilinos_ENABLE_Stokhos=OFF \
 -D Trilinos_ENABLE_MueLu=OFF \
 -D Intrepid2_ENABLE_TESTS=OFF \
 -D Trilinos_ENABLE_ShyLU_NodeTacho=OFF \
 -D Kokkos_CoreUnitTest_Default_MPI_1_EXTRA_ARGS="--gtest_filter=-*defaultdevicetype.shared_space" \
  -DKokkos_SOURCE_DIR_OVERRIDE:STRING=kokkos \
  -DKokkosKernels_SOURCE_DIR_OVERRIDE:STRING=kokkos-kernels \
  -DTrilinos_ENABLE_INSTALLATION_TESTING=OFF \
$TRILINOS_DIR

(Most of the packages above can be disabled; I'm just posting the full config.)

Edit: Adding the backtrace from gdb

[New Thread 0x200088df0890 (LWP 136065)]
0. PhalanxViewOfViews_ViewOfView_DefaultStreamInitialize_UnitTest ... [Passed] (0.0125 sec)
1. PhalanxViewOfViews_ViewOfView_DefaultStreamCtor_UnitTest ... [Passed] (0.0117 sec)
2. PhalanxViewOfViews_ViewOfView_UserStreamCtor_UnitTest ... Using partition_space, concurrency=163840
[Passed] (0.0124 sec)
3. PhalanxViewOfViews_ViewOfView_UserStreamInitialize_UnitTest ... Using partition_space, concurrency=163840
[Passed] (0.0123 sec)
4. PhalanxViewOfViews_ViewOfView_DefaultCtorDtor_UnitTest ... [Passed] (4.66e-06 sec)
5. PhalanxViewOfViews_WrapperExample_UnitTest ... [Passed] (0.0012 sec)

Thread 1 "Phalanx_tViewOf" received signal SIGINT, Interrupt.
0x0000200042d379f8 in __lll_lock_wait () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
(gdb) bt
#0  0x0000200042d379f8 in __lll_lock_wait () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
#1  0x0000200042d2ceec in pthread_mutex_lock () from /lib64/glibc-hwcaps/power9/libpthread-2.28.so
#2  0x0000200042810070 in __gthread_mutex_lock (__mutex=0x10781b00) at /usr/include/c++/8/ppc64le-redhat-linux/bits/gthr-default.h:748
#3  std::mutex::lock (this=0x10781b00) at /usr/include/c++/8/bits/std_mutex.h:103
#4  std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/8/bits/std_mutex.h:162
#5  Kokkos::Serial::impl_static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1}::operator()() const (
    __closure=<optimized out>) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/Serial/Kokkos_Serial.hpp:155
#6  Kokkos::Tools::Experimental::Impl::profile_fence_event<Kokkos::Serial, Kokkos::Serial::impl_static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1}>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::Tools::Experimental::SpecialSynchronizationCases, Kokkos::Serial::impl_static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1} const&) (func=..., 
    reason=Kokkos::Tools::Experimental::GlobalDeviceSynchronization, name=...) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_Profiling.hpp:208
#7  Kokkos::Serial::impl_static_fence (name=...) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/Serial/Kokkos_Serial.hpp:147
#8  Kokkos::Impl::ExecSpaceDerived<Kokkos::Serial>::static_fence (this=<optimized out>, label=...)
    at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_ExecSpaceManager.hpp:131
#9  0x00002000427e699c in Kokkos::Impl::ExecSpaceManager::static_fence (this=0x2000428a17c0 <Kokkos::Impl::ExecSpaceManager::get_instance()::space_initializer>, name=...)
    at /usr/include/c++/8/bits/unique_ptr.h:345
#10 0x00002000427e6e3c in (anonymous namespace)::fence_internal (name=...) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_Core.cpp:813
#11 0x00002000427e7108 in Kokkos::fence (name=...) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_Core.cpp:1099
#12 0x00002000427f0c20 in Kokkos::HostSpace::deallocate (this=0x11097068, arg_label=0x7fffffff8830 "a", arg_alloc_ptr=0x37b8de00, arg_alloc_size=168, arg_logical_size=40)
    at /usr/include/c++/8/bits/char_traits.h:287
#13 0x00002000427f11ec in Kokkos::Impl::SharedAllocationRecordCommon<Kokkos::HostSpace>::~SharedAllocationRecordCommon (this=0x11097020, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/basic_string.h:2294
#14 0x0000000010033c88 in Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, void>::~SharedAllocationRecord (this=<optimized out>, __in_chrg=<optimized out>)
    at /home/ndellin/trilinos/Trilinos/kokkos/core/src/Kokkos_HostSpace.hpp:178
#15 Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double, true> >::~SharedAllocationRecord (this=0x11097020, __in_chrg=<optimized out>) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_SharedAlloc.hpp:400
#16 Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double, true> >::~SharedAllocationRecord (this=0x11097020, __in_chrg=<optimized out>) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_SharedAlloc.hpp:400
#17 0x000000001002cb84 in Kokkos::Impl::deallocate<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double, true> > (
    record_ptr=<optimized out>) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_SharedAlloc.hpp:375
#18 0x00002000427fa3a4 in Kokkos::Impl::SharedAllocationRecord<void, void>::decrement (arg_record=<optimized out>)
    at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_SharedAlloc.cpp:268
#19 0x0000000010065ff4 in Kokkos::Impl::SharedAllocationTracker::~SharedAllocationTracker (this=0x2000e0000c80, __in_chrg=<optimized out>)
    at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_SharedAlloc.hpp:544
#20 Kokkos::Impl::ViewTracker<Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace> >::~ViewTracker (this=0x2000e0000c80, __in_chrg=<optimized out>)
    at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_ViewTracker.hpp:39
#21 Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>::~View (this=0x2000e0000c80, __in_chrg=<optimized out>)
    at /home/ndellin/trilinos/Trilinos/kokkos/core/src/Kokkos_View.hpp:1279
#22 Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>, false>::operator() (
    this=0x7fffffff8b70, i=0) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/View/Kokkos_ViewAlloc.hpp:80
#23 Kokkos::Impl::ParallelFor<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>, false>, Kokkos::RangePolicy<Kokkos::Serial, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>, false>::DestroyTag>, Kokkos::Serial>::exec<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>, false>::DestroyTag> (this=0x7fffffff8b70)
    at /home/ndellin/trilinos/Trilinos/kokkos/core/src/Serial/Kokkos_Serial_Parallel_Range.hpp:46
#24 Kokkos::Impl::ParallelFor<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>, false>, Kokkos::RangePolicy<Kokkos::Serial, Kokkos::IndexType<long>, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>, false>::DestroyTag>, Kokkos::Serial>::execute (this=0x7fffffff8b70)
    at /home/ndellin/trilinos/Trilinos/kokkos/core/src/Serial/Kokkos_Serial_Parallel_Range.hpp:56
#25 Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>, false>::parallel_for_implementation<Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>, false>::DestroyTag> (this=0x37b8cbb8) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/View/Kokkos_ViewAlloc.hpp:174
#26 0x0000000010066404 in Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>, false>::destroy_shared_allocation (this=<optimized out>) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/View/Kokkos_ViewAlloc.hpp:184
#27 Kokkos::Impl::deallocate<Kokkos::CudaUVMSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>, false> > (record_ptr=0x37b8cb60) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_SharedAlloc.hpp:382
#28 0x00002000427fa3a4 in Kokkos::Impl::SharedAllocationRecord<void, void>::decrement (arg_record=<optimized out>)
    at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_SharedAlloc.cpp:268
#29 0x0000000010031300 in Kokkos::Impl::SharedAllocationTracker::~SharedAllocationTracker (this=0x37b8c420, __in_chrg=<optimized out>)
    at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_SharedAlloc.hpp:544
#30 Kokkos::Impl::ViewTracker<Kokkos::View<Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::Experimental::EmptyViewHooks> >::~ViewTracker (this=0x37b8c420, __in_chrg=<optimized out>)
    at /home/ndellin/trilinos/Trilinos/kokkos/core/src/impl/Kokkos_ViewTracker.hpp:39
#31 Kokkos::View<Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::Experimental::EmptyViewHooks>::~View (this=0x37b8c420, __in_chrg=<optimized out>) at /home/ndellin/trilinos/Trilinos/kokkos/core/src/Kokkos_View.hpp:1279
#32 PHX::details::ViewOfViewsDeleter::operator()<Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::Experimental::EmptyViewHooks> (this=<optimized out>, vov=0x37b8c420) at /home/ndellin/trilinos/Trilinos/packages/phalanx/src/Phalanx_KokkosViewOfViews.hpp:77
#33 PHX::details::ViewOfViewsDeleter::operator()<Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::Experimental::EmptyViewHooks> (this=<optimized out>, vov=0x37b8c420) at /home/ndellin/trilinos/Trilinos/packages/phalanx/src/Phalanx_KokkosViewOfViews.hpp:41
#34 std::_Sp_counted_deleter<Kokkos::View<Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::Experimental::EmptyViewHooks>*, PHX::details::ViewOfViewsDeleter, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=<optimized out>)
    at /usr/include/c++/8/bits/shared_ptr_base.h:471
#35 0x000000001003c9ec in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x37b91910) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#36 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#37 0x00000000100282c8 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffffff9048, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#38 std::__shared_ptr<Kokkos::View<Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::Experimental::EmptyViewHooks>, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffffff9040, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#39 std::shared_ptr<Kokkos::View<Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>, Kokkos::Experimental::EmptyViewHooks> >::~shared_ptr (this=0x7fffffff9040, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#40 PHX::ViewOfViews<1, Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::HostSpace>, Kokkos::CudaUVMSpace>::~ViewOfViews (this=0x7fffffff9040, __in_chrg=<optimized out>)
    at /home/ndellin/trilinos/Trilinos/packages/phalanx/src/Phalanx_KokkosViewOfViews.hpp:126
#41 PhalanxViewOfViews_CreateHostHost_UnitTest::runUnitTestImpl (this=<optimized out>, out=..., success=@0x7fffffff9978: true)
    at /home/ndellin/trilinos/Trilinos/packages/phalanx/test/ViewOfViews/tPhalanxViewOfViews.cpp:459
#42 0x000020002e0c8844 in Teuchos::UnitTestBase::runUnitTest (this=<optimized out>, out=...)
    at /home/ndellin/trilinos/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestBase.cpp:29
#43 0x000020002e0ced64 in Teuchos::UnitTestRepository::runUnitTestImpl (unitTest=..., out=...)
    at /home/ndellin/trilinos/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestRepository.cpp:506
#44 0x000020002e0d037c in Teuchos::UnitTestRepository::runUnitTests (out=...) at /home/ndellin/trilinos/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestRepository.cpp:284
#45 0x000020002e0d1650 in Teuchos::UnitTestRepository::runUnitTestsFromMain (argc=<optimized out>, argv=<optimized out>)
    at /home/ndellin/trilinos/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestRepository.cpp:390
#46 0x000020002d6a797c in main (argc=<optimized out>, argv=<optimized out>) at /home/ndellin/trilinos/Trilinos/packages/phalanx/test/Utilities/Phalanx_UnitTestMain.cpp:29
#47 0x0000200043139f5c in generic_start_main.isra () from /lib64/glibc-hwcaps/power9/libc-2.28.so
#48 0x000020004313a0f4 in __libc_start_main () from /lib64/glibc-hwcaps/power9/libc-2.28.so
#49 0x0000000000000000 in ?? ()
rppawlo commented 1 month ago

Final fix was #13316

ndellingwood commented 1 month ago

The nightly Cuda+UVM job I referenced above is looking good, timeout resolved :)

Test #797: Phalanx_tKokkosViewOfViews_MPI_1 ............................................................   Passed    1.20 sec
...
Test #813: Phalanx_tViewOfViews_MPI_1 ..................................................................   Passed    1.46 sec