trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Tpetra: TpetraCore_CrsMatrix_MatvecFence_MPI_4 failure with Hip backend, rocm/5.6.1 #13292

Closed: ndellingwood closed this issue 2 months ago

ndellingwood commented 3 months ago

Bug Report

@trilinos/tpetra

Description

The TpetraCore_CrsMatrix_MatvecFence_MPI_4 test (introduced in PR #13165) is failing with the HIP backend and rocm/5.6.1:

23:30:39 0. CrsMatrix_int_longlong_double_Tpetra_KokkosCompat_KokkosHIPWrapperNode_MatvecFence_UnitTest ... 
23:30:39  FenceCounter::get_count_global(exec_space.name()) = 10 == expectedGlobalCount = 10 : passed
23:30:39  FenceCounter::get_count_instance(exec_space.name()) = 20 == expectedInstanceCount = 30 : FAILED ==> /home/jenkins/caraway-new/workspace/Trilinos_Caraway_Hip_Serial_Rocm5_6_1_MI210/Trilinos/packages/tpetra/core/test/CrsMatrix/CrsMatrix_MatvecFence.cpp:224
23:30:39  *** Teuchos::StackedTimer::report() - Remainder for a level will be ***
23:30:39  *** incorrect if a timer in the level does not exist on every rank  ***
23:30:39  *** of the MPI Communicator.                                        ***
23:30:39  TransferPerf: 0.391053 [1] {min=0.386056, max=0.396057, std dev=0.00576813} <2, 0, 0, 0, 0, 0, 0, 0, 0, 2>
23:30:39  [FAILED]  (0.406 sec) CrsMatrix_int_longlong_double_Tpetra_KokkosCompat_KokkosHIPWrapperNode_MatvecFence_UnitTest
23:30:39  Location: /home/jenkins/caraway-new/workspace/Trilinos_Caraway_Hip_Serial_Rocm5_6_1_MI210/Trilinos/packages/tpetra/core/test/CrsMatrix/CrsMatrix_MatvecFence.cpp:113
23:30:39  
23:30:39 1. CrsMatrix_int_longlong_longlong_Tpetra_KokkosCompat_KokkosHIPWrapperNode_MatvecFence_UnitTest ... 
23:30:39  FenceCounter::get_count_global(exec_space.name()) = 10 == expectedGlobalCount = 10 : passed
23:30:39  FenceCounter::get_count_instance(exec_space.name()) = 20 == expectedInstanceCount = 30 : FAILED ==> /home/jenkins/caraway-new/workspace/Trilinos_Caraway_Hip_Serial_Rocm5_6_1_MI210/Trilinos/packages/tpetra/core/test/CrsMatrix/CrsMatrix_MatvecFence.cpp:224
23:30:39  *** Teuchos::StackedTimer::report() - Remainder for a level will be ***
23:30:39  *** incorrect if a timer in the level does not exist on every rank  ***
23:30:39  *** of the MPI Communicator.                                        ***
23:30:39  TransferPerf: 0.0268144 [1] {min=0.0267613, max=0.0268428, std dev=3.66579e-05} <1, 0, 0, 0, 0, 0, 0, 1, 1, 1>
23:30:39  [FAILED]  (0.0271 sec) CrsMatrix_int_longlong_longlong_Tpetra_KokkosCompat_KokkosHIPWrapperNode_MatvecFence_UnitTest
23:30:39  Location: /home/jenkins/caraway-new/workspace/Trilinos_Caraway_Hip_Serial_Rocm5_6_1_MI210/Trilinos/packages/tpetra/core/test/CrsMatrix/CrsMatrix_MatvecFence.cpp:113
23:30:39  
23:30:39 
23:30:39 The following tests FAILED:
23:30:39     0. CrsMatrix_int_longlong_double_Tpetra_KokkosCompat_KokkosHIPWrapperNode_MatvecFence_UnitTest ... 
23:30:39     1. CrsMatrix_int_longlong_longlong_Tpetra_KokkosCompat_KokkosHIPWrapperNode_MatvecFence_UnitTest ... 
23:30:39 
23:30:39 Total Time: 0.434 sec
23:30:39 
23:30:39 Summary: total = 2, run = 2, passed = 0, failed = 2
23:30:39 
23:30:39 End Result: TEST FAILED
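The two checks in the log compare per-execution-space fence tallies with expected values: the global count matches (10 == 10) while the instance count does not (20 observed vs. 30 expected). As a rough illustration of that kind of check, here is a minimal C++ sketch of a counter that splits fences into global and instance fences per execution-space name. `MockFenceCounter`, `simulate_applies`, and `check_count` are hypothetical names made up for this sketch; it is not Tpetra's FenceCounter, and the fence pattern per apply is an assumption chosen only to reproduce the 10 / 20 counts from the HIP log.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <string>

// Hypothetical stand-in for the test's FenceCounter (not the Tpetra class).
// Fences are tallied per execution-space name and split into global fences
// (e.g. Kokkos::fence()) and instance fences (e.g. exec_space.fence()).
class MockFenceCounter {
 public:
  void record_global(const std::string& space) { ++global_[space]; }
  void record_instance(const std::string& space) { ++instance_[space]; }

  int get_count_global(const std::string& space) const { return lookup(global_, space); }
  int get_count_instance(const std::string& space) const { return lookup(instance_, space); }

 private:
  // Returns 0 for a space that never recorded a fence.
  static int lookup(const std::map<std::string, int>& counts, const std::string& space) {
    auto it = counts.find(space);
    return it == counts.end() ? 0 : it->second;
  }

  std::map<std::string, int> global_;
  std::map<std::string, int> instance_;
};

// Hypothetical driver (assumed pattern): each simulated apply issues one
// global fence and `instance_per_apply` instance fences on `space`.
void simulate_applies(MockFenceCounter& counter, const std::string& space,
                      int num_applies, int instance_per_apply) {
  assert(num_applies >= 0 && instance_per_apply >= 0);
  for (int i = 0; i < num_applies; ++i) {
    counter.record_global(space);
    for (int j = 0; j < instance_per_apply; ++j) counter.record_instance(space);
  }
}

// Prints a line shaped like the test output and reports whether the counts match.
bool check_count(const std::string& label, int observed, int expected) {
  const bool ok = (observed == expected);
  std::cout << label << " = " << observed << " == " << expected
            << (ok ? " : passed" : " : FAILED") << '\n';
  return ok;
}
```

For example, after `simulate_applies(c, "HIP", 10, 2)` the counter reports 10 global and 20 instance fences for `"HIP"`, so comparing the instance count against 30 reports FAILED, which has the same shape as the failure above.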

Steps to Reproduce

  1. SHA1: 5e347ceda75723f06c28a1ab677421967337d549
  2. Configuration (caraway testbed MI210 queue)

export TRILINOS_DIR=

module load python rocm/5.6.1 cmake openmpi/4.1.5 openblas/0.3.23 ninja/1.11.1
module list
export OMPI_CXX=$ROCM_PATH/bin/hipcc
export TPETRA_ASSUME_GPU_AWARE_MPI=1

CMake configuration

cmake \
  -G"Ninja" \
  -DCMAKE_INSTALL_PREFIX=$PWD/install \
  -DCMAKE_CXX_STANDARD="17" \
  -DCMAKE_CXX_COMPILER="$(which mpicxx)" \
  -DCMAKE_C_COMPILER="$(which mpicc)" \
  -DCMAKE_Fortran_COMPILER="$(which mpifort)" \
  -DCMAKE_BUILD_TYPE="RELEASE" \
  -DBUILD_SHARED_LIBS=OFF \
  \
  -DTrilinos_ENABLE_ALL_PACKAGES=OFF \
  -DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES=OFF \
  -DTrilinos_ENABLE_EXPLICIT_INSTANTIATION=ON \
  -DTrilinos_ASSERT_MISSING_PACKAGES=OFF \
  -DTrilinos_ALLOW_NO_PACKAGES=OFF \
  -DTrilinos_ENABLE_OpenMP=OFF \
  -DTrilinos_ENABLE_TESTS=ON \
  \
  -DTrilinos_ENABLE_Amesos2=ON \
  -DAmesos2_ENABLE_SuperLU=OFF \
  -DAmesos2_ENABLE_KLU2=ON \
  -DTrilinos_ENABLE_Belos=ON \
  -DTrilinos_ENABLE_Ifpack2=ON \
  -DTrilinos_ENABLE_Kokkos=ON \
  -DKokkos_ARCH_VEGA90A=ON \
  -DKokkos_ENABLE_CUDA=OFF \
  -DKokkos_ENABLE_HIP=ON \
  -DKokkos_ENABLE_OPENMP=OFF \
  -DTrilinos_ENABLE_KokkosKernels=ON \
  -DTrilinos_ENABLE_MueLu=ON \
  -DTrilinos_ENABLE_Tpetra=ON \
  -DTpetra_ENABLE_CUDA=OFF \
  -DTpetra_INST_HIP=ON \
  -DTpetra_INST_SERIAL=OFF \
  -DTpetra_INST_OPENMP=OFF \
  -DTpetra_INST_DOUBLE=ON \
  -DTrilinos_ENABLE_Gtest=ON \
  -DTrilinos_ENABLE_Teuchos=ON \
  -DTrilinos_ENABLE_Xpetra=ON \
  -DTrilinos_ENABLE_Zoltan2=ON \
  -DTrilinos_ENABLE_Panzer=ON \
  -DTPL_ENABLE_BLAS=ON \
  -D BLAS_LIBRARY_DIRS:FILEPATH="${OPENBLAS_ROOT}/lib" \
  -D BLAS_LIBRARY_NAMES:STRING="openblas" \
  -DTPL_ENABLE_LAPACK=ON \
  -D LAPACK_INCLUDE_DIRS:FILEPATH="${OPENBLAS_ROOT}/include" \
  -D LAPACK_LIBRARY_DIRS:FILEPATH="${OPENBLAS_ROOT}/lib" \
  -D LAPACK_LIBRARY_NAMES:STRING="openblas" \
  -DTPL_ENABLE_Netcdf=OFF \
  -DTPL_ENABLE_MPI=ON \
  -DMPI_USE_COMPILER_WRAPPERS=ON \
  -DMPI_EXEC="mpirun" \
  -DMPI_EXEC_NUMPROCS_FLAG="-np" \
  -DMPI_EXEC_POST_NUMPROCS_FLAGS:STRING="-bind-to;none" \
  \
  $TRILINOS_DIR

ninja -j16

ctest -R TpetraCore_CrsMatrix_MatvecFence_MPI_4 -V

maartenarnst commented 3 months ago

Hi @cgcgcg and @ndellingwood,

@romintomasetti and I are also seeing this test fail in our HIP build.

One origin of the issue may be this fence in the CUDA case:

Such a fence does not seem to be present in what I think is the corresponding HIP case:

cgcgcg commented 3 months ago

@maartenarnst Please feel free to open a PR with the adjusted count for HIP. I didn't have time yet to track down where the extra fence was, so thank you very much for doing that!
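Adjusting the count for HIP amounts to making the expected instance-fence count depend on the backend. Below is a minimal sketch of that idea, assuming (purely for illustration, inferred from the 20 vs. 30 counts in the log rather than read from CrsMatrix_MatvecFence.cpp) that the test performs 10 applies, that each apply issues two instance fences on HIP, and that the CUDA path adds one extra instance fence per apply. `Backend` and `expected_instance_fences` are hypothetical names, not part of Tpetra.

```cpp
#include <cassert>

// Hypothetical backend tag for this sketch only.
enum class Backend { Cuda, Hip };

// Hypothetical helper, not taken from CrsMatrix_MatvecFence.cpp.
// Assumed fence pattern (illustrative): two instance fences per apply on
// every backend, plus one extra instance fence per apply on CUDA.
int expected_instance_fences(Backend backend, int num_applies) {
  assert(num_applies >= 0);
  int per_apply = 2;
  if (backend == Backend::Cuda) {
    per_apply += 1;  // the extra CUDA-only fence pointed out in the discussion
  }
  return per_apply * num_applies;
}
```

Under these assumptions, `expected_instance_fences(Backend::Hip, 10)` gives 20 and `expected_instance_fences(Backend::Cuda, 10)` gives 30. Keying the adjustment on the backend keeps the check exact on each platform instead of loosening the comparison.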

maartenarnst commented 3 months ago

Hi @cgcgcg. OK, done :) It is PR

ndellingwood commented 2 months ago

The test returned to a passing state after @maartenarnst's fix #13331, thank you!

cgcgcg commented 2 months ago

@ndellingwood Thanks for reporting back! Can we close the issue then?