Open ndellingwood opened 7 months ago
Automatic mention of the @trilinos/muelu team
Updated to add MueLu_UnitTestsTpetra_kokkos_MPI_{1,4} tests; all the subtest failures seem related to Tpetra::Details::DeepCopyCounter::get_count_different_space() discrepancies
Hi @cgcgcg, unfortunately I am still seeing some failures in CUDA builds with UVM after the merge of #12866
MueLu_UnitTestsTpetra_MPI_1
....
20:22:35 2 = 2 == H->GetGlobalNumLevels() = 2 : passed
20:22:35 Tpetra::Details::DeepCopyCounter::get_count_different_space() = 20 == 37 = 37 : FAILED ==> /home/jenkins/weaver/workspace/KokkosEco_Trilinos_Weaver_CUDA112_opt-uvm/Trilinos/packages/muelu/test/unit_tests/Regression.cpp:109
20:22:35 Tpetra::Details::DeepCopyCounter::get_count_different_space() = 2 == 2 = 2 : passed
...
20:22:35 The following tests FAILED:
20:22:35 170. Regression_double_int_longlong_Tpetra_KokkosCompat_KokkosCudaWrapperNode_H2D_UnitTest ...
MueLu_UnitTestsTpetra_kokkos_MPI_1
...
20:22:45 2 = 2 == H->GetGlobalNumLevels() = 2 : passed
20:22:45 Tpetra::Details::DeepCopyCounter::get_count_different_space() = 27 == targetNumDeepCopies = 31 : FAILED ==> /home/jenkins/weaver/workspace/KokkosEco_Trilinos_Weaver_CUDA112_opt-uvm/Trilinos/packages/muelu/test/unit_tests_kokkos/Regression.cpp:119
20:22:45 Tpetra::Details::DeepCopyCounter::get_count_different_space() = 2 == 2 = 2 : passed
...
20:22:46 The following tests FAILED:
20:22:46 32. Regression_double_int_longlong_Tpetra_KokkosCompat_KokkosCudaWrapperNode_H2D_UnitTest ...
@ndellingwood Sorry about that. I only updated the counts for one of the regression tests. See #12874. However, it seems that we are still getting different deep_copy counts. This could either be a difference in how Trilinos is configured that's not being taken into account in the tests, or the new Kokkos release somehow uses fewer deep_copies.
@cgcgcg thanks for the update
"This could either be a difference in how Trilinos is configured that's not being taken into account in the tests, or the new Kokkos release somehow uses fewer deep_copies."
The failures occurred with the existing (4.2.01) kokkos and kokkos-kernels packages in Trilinos; they were not unique to the release candidates.
Bug Report
@trilinos/muelu
Description
The MueLu_UnitTestsTpetra_MPI_1 and MueLu_UnitTestsTpetra_kokkos_MPI_{1,4} tests are failing a couple of checks in cuda/11.2 builds with UVM enabled:
MueLu_UnitTestsTpetra_MPI_1 failing checks:
MueLu_UnitTestsTpetra_MPI_1 more details:
MueLu_UnitTestsTpetra_kokkos_MPI_1 failing checks:
MueLu_UnitTestsTpetra_kokkos_MPI_4 failing checks:
All the subtest failures seem related to Tpetra::Details::DeepCopyCounter::get_count_different_space() discrepancies.
Steps to Reproduce
Interactive node
bsub -Is -n 1 -q rhel8 -gpu "num=4" bash
Environment
export TRILINOS_DIR=
export KOKKOS_PATH=$TRILINOS_DIR/packages/kokkos
export ATDM_CONFIG_REGISTER_CUSTOM_CONFIG_DIR=${TRILINOS_DIR}/cmake/std/atdm/contributed/weaver
source ${TRILINOS_DIR}/cmake/std/atdm/load-env.sh weaver-cuda-11.2-opt
export OMPI_CXX="$KOKKOS_PATH/bin/nvcc_wrapper"
Configure
cmake \
  -DCMAKE_CXX_FLAGS='-g' \
  -DCMAKE_CXX_STANDARD="17" \
  -DCMAKE_INSTALL_PREFIX=$PWD/install \
  -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
  -DTrilinos_ENABLE_COMPLEX_DOUBLE=ON \
  -DTrilinos_ENABLE_TESTS=OFF \
  -DTrilinos_ENABLE_ALL_PACKAGES=OFF \
  -DTPL_ENABLE_CUSPARSE:BOOL=ON \
  \
  -D Trilinos_ENABLE_Kokkos=ON \
  -D Kokkos_ARCH_VOLTA70=ON \
  -D Kokkos_ARCH_POWER9=ON \
  -D Kokkos_ENABLE_CUDA=ON \
  -D Kokkos_ENABLE_CUDA_LAMBDA=ON \
  -D Kokkos_ENABLE_CUDA_UVM=ON \
  -D KokkosKernels_INST_MEMSPACE_CUDAUVMSPACE=ON \
  -D Tpetra_ALLOCATE_IN_SHARED_SPACE=ON \
  -D Trilinos_ENABLE_Sacado=ON \
  -D Trilinos_ENABLE_Phalanx=ON \
  -D Trilinos_ENABLE_Ifpack2=ON \
  -D Trilinos_ENABLE_MueLu=ON \
  -D MueLu_ENABLE_TESTS=ON \
  \
  $TRILINOS_DIR
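Filter failed checks (sketch)
Once the tests have been built and run, the failed deep-copy count checks can be pulled out of a saved test log. The helper below is only a minimal sketch: the function name failed_deep_copy_checks and the log file name are hypothetical (not part of Trilinos), and the pattern assumes the per-check line format shown in the log excerpts in this issue.

```shell
# Hypothetical helper, not part of Trilinos.
# Prints one line per FAILED get_count_different_space() check found in a
# saved test log, in the form: actual=<n> expected=<m> at <file>:<line>
# Assumes lines shaped like the excerpts in this issue:
#   ...get_count_different_space() = 20 == 37 = 37 : FAILED ==> <file>:<line>
failed_deep_copy_checks() {
  local log="$1"
  # Keep only the failed counter checks, then extract the two counts
  # and the source location that follows "==>".
  grep -E 'DeepCopyCounter::get_count_different_space\(\) = [0-9]+ == .* : FAILED' "$log" |
    sed -E 's/.*get_count_different_space\(\) = ([0-9]+) == .*= ([0-9]+) : FAILED ==> (.*)$/actual=\1 expected=\2 at \3/'
}
```

A possible use, assuming the test output contains the per-check lines as in the excerpts: run `ctest -R MueLu_UnitTestsTpetra --output-on-failure 2>&1 | tee ctest.log` in the build directory, then `failed_deep_copy_checks ctest.log`. Checks that passed are not printed.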