trilinos/Trilinos: Primary repository for the Trilinos Project
https://trilinos.org/

MueLu: MueLu_Driver_anisotropic_MPI_4 test failure with OpenMP backend, 2 threads #12615

Open ndellingwood opened 8 months ago

ndellingwood commented 8 months ago

Bug Report

@trilinos/muelu

Description

While testing a build of Trilinos on my laptop I came across a consistent failure of the MueLu_Driver_anisotropic_MPI_4 unit test with the OpenMP backend when run with 2 threads; it passed consistently with 1 and 4 threads.

The test failed to converge at the Belos Pseudo Block CG solve stage, after the Laplace2D preconditioner setup appeared to complete normally (the output is long; a couple of snippets are included below):

Output:

...
=================================================================================================================================

                                              TimeMonitor results over 4 processors

Timer Name                      Local time (num calls)    MinOverProcs    MeanOverProcs    MaxOverProcs    MeanOverCallCounts
---------------------------------------------------------------------------------------------------------------------------------
MueLu setup time (Laplace2D)    0.0142 (1)                0.0142 (1)      0.0142 (1)       0.0142 (1)      0.0142 (1)
=================================================================================================================================
Memory use after preconditioner setup (GB): 0.0683174

*******************************************************
***** Belos Iterative Solver:  Pseudo Block CG
***** Maximum Iterations: 200
***** Block Size: 1
***** Residual Test:
*****   Test 1 : Belos::StatusTestGenResNorm<>: (2-Norm Imp Res Vec) / (2-Norm Res0), tol = 1e-06
*******************************************************
Iter   0, [ 1] :    1.000000e+00
...
Iter 200, [ 1] :    1.100411e-06
Number of iterations performed for this solve: 200

ERROR:  Belos did not converge!
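For context, the stopping criterion Belos reports above is a relative-residual test: the solve succeeds when (2-Norm Res) / (2-Norm Res0) drops to the tolerance, and errors out if the maximum iteration count is hit first. A minimal sketch of that check (my own illustration, not Belos code; the numbers are taken from the log above):

```python
# Sketch of the StatusTestGenResNorm-style check Belos prints above
# (not actual Belos code): relative residual vs. the initial residual.
def converged(res_norm: float, res0_norm: float, tol: float = 1e-6) -> bool:
    """True when (2-Norm Res) / (2-Norm Res0) <= tol."""
    return res_norm / res0_norm <= tol

# Values from the log: the run stalls just above the tolerance.
res0 = 1.0                    # Iter 0 relative residual is 1.0 by construction
final_rel_res = 1.100411e-06  # Iter 200 value reported above

print(converged(final_rel_res, res0))  # 1.100411e-06 > 1e-6, so False
```

So the failure is marginal: the final relative residual overshoots the 1e-6 tolerance by about 10%, with the iteration cap of 200 reached first.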

Steps to Reproduce

  1. SHA1: b2d7336bbb6be099d4ab7c0666a71be43e7bbfe1
  2. Configure script:
    
    # Tested with gcc/10.4, openmpi/4.1.1
    cmake \
      \
      -G "Ninja" \
      -D CMAKE_INSTALL_PREFIX:PATH="${PWD}/install" \
      -D CMAKE_CXX_FLAGS:STRING="-g" \
      -D CMAKE_CXX_STANDARD:STRING="17" \
      -D CMAKE_BUILD_TYPE:STRING=RELEASE \
      -D CMAKE_VERBOSE_MAKEFILE:BOOL=TRUE \
      -D BUILD_SHARED_LIBS:BOOL=OFF \
      -D Trilinos_VERBOSE_CONFIGURE:BOOL=OFF \
      \
      -D TPL_ENABLE_MPI:BOOL=ON \
      -D CMAKE_CXX_COMPILER:FILEPATH="`which mpicxx`" \
      -D CMAKE_C_COMPILER:FILEPATH="`which mpicc`" \
      -D CMAKE_Fortran_COMPILER:FILEPATH="`which mpifort`" \
      -D Trilinos_EXTRA_LINK_FLAGS="-lgfortran -lm" \
      \
      -D TPL_ENABLE_METIS:BOOL=ON \
      -D METIS_INCLUDE_DIRS:PATH="${METIS_PATH}/include" \
      -D METIS_LIBRARY_DIRS:PATH="${METIS_PATH}/lib" \
      -D METIS_LIBRARY_NAMES:STRING="metis" \
      -D TPL_ENABLE_ParMETIS:BOOL=ON \
      -D ParMETIS_INCLUDE_DIRS:PATH="${PARMETIS_PATH}/include;${METIS_PATH}/include" \
      -D ParMETIS_LIBRARY_DIRS:PATH="${PARMETIS_PATH}/lib" \
      -D ParMETIS_LIBRARY_NAMES:STRING="parmetis" \
      -D TPL_ENABLE_BLAS:STRING=ON \
      -D BLAS_LIBRARY_DIRS:FILEPATH=${BLAS_PATH}/lib \
      -D BLAS_LIBRARY_NAMES:STRING="openblas" \
      -D TPL_ENABLE_LAPACK:STRING=ON \
      -D LAPACK_INCLUDE_DIRS:FILEPATH="${LAPACK_PATH}/include" \
      -D LAPACK_LIBRARY_DIRS:FILEPATH=${LAPACK_PATH}/lib \
      -D LAPACK_LIBRARY_NAMES:STRING="openblas" \
      -D TPL_ENABLE_SuperLU:BOOL=ON \
      -D SuperLU_INCLUDE_DIRS:FILEPATH="${SUPERLU_PATH}/include" \
      -D SuperLU_LIBRARY_DIRS:FILEPATH="${SUPERLU_PATH}/lib" \
      -D SuperLU_LIBRARY_NAMES:STRING="superlu" \
      -D TPL_ENABLE_UMFPACK:STRING=ON \
      -D UMFPACK_INCLUDE_DIRS:PATH="${SS_PATH}/include" \
      -D UMFPACK_LIBRARY_DIRS:PATH="${SS_PATH}/lib" \
      -D UMFPACK_LIBRARY_NAMES:STRING="umfpack;suitesparseconfig;amd" \
      -DTPL_ENABLE_Netcdf=OFF \
      \
      -D Trilinos_ENABLE_EXAMPLES:BOOL=OFF \
      -D Trilinos_ENABLE_TESTS:BOOL=OFF \
      -D Trilinos_ENABLE_COMPLEX_DOUBLE:BOOL=ON \
      -D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=OFF \
      -D Trilinos_ENABLE_EXPLICIT_INSTANTIATION:BOOL=ON \
      \
      -D Trilinos_ENABLE_OpenMP:BOOL=ON \
      -D Trilinos_ENABLE_Kokkos:BOOL=ON \
      -D Kokkos_ENABLE_OPENMP:BOOL=ON \
      -D Kokkos_ENABLE_SERIAL:BOOL=ON \
      -D Kokkos_ENABLE_EXAMPLES:BOOL=OFF \
      -D Trilinos_ENABLE_KokkosKernels:BOOL=ON \
      -D KokkosKernels_ENABLE_EXAMPLES:BOOL=ON \
      -D Trilinos_ENABLE_Epetra:BOOL=OFF \
      -D Epetra_ENABLE_EXAMPLES:BOOL=OFF \
      -D Trilinos_ENABLE_EpetraExt:BOOL=OFF \
      -D EpetraExt_ENABLE_EXAMPLES:BOOL=OFF \
      -D Trilinos_ENABLE_Tpetra:BOOL=ON \
      -D Tpetra_ENABLE_EXAMPLES:BOOL=OFF \
      -D Tpetra_INST_SERIAL:BOOL=ON \
      -D Trilinos_ENABLE_Xpetra:BOOL=ON \
      -D Trilinos_ENABLE_ShyLU_NodeBasker:BOOL=ON \
      -D Trilinos_ENABLE_ShyLU_NodeTacho:BOOL=ON \
      -D Trilinos_ENABLE_Amesos2:BOOL=ON \
      -D Amesos2_ENABLE_EXAMPLES:BOOL=ON \
      -D Amesos2_ENABLE_KLU2:BOOL=ON \
      -D Amesos2_ENABLE_Tacho:BOOL=ON \
      -D Amesos2_ENABLE_ShyLU_NodeBasker:BOOL=ON \
      -D Amesos2_ENABLE_UMFPACK:BOOL=ON \
      -D Amesos2_ENABLE_SuperLU:BOOL=ON \
      -D Trilinos_ENABLE_Sacado:BOOL=ON \
      -D Trilinos_ENABLE_Stokhos:BOOL=ON \
      -D Trilinos_ENABLE_Ifpack2:BOOL=ON \
      -D Trilinos_ENABLE_Zoltan2:BOOL=ON \
      -D Trilinos_ENABLE_Intrepid2:BOOL=ON \
      -D Intrepid2_ENABLE_TESTS:BOOL=OFF \
      -D Trilinos_ENABLE_Belos:BOOL=ON \
      -D Trilinos_ENABLE_Anasazi:BOOL=ON \
      -D Trilinos_ENABLE_Phalanx:BOOL=ON \
      -D Trilinos_ENABLE_Panzer:BOOL=ON \
      -D Trilinos_ENABLE_Compadre:BOOL=ON \
      -D Trilinos_ENABLE_MueLu:BOOL=ON \
      -D MueLu_ENABLE_TESTS:BOOL=ON \
      -D Trilinos_ENABLE_SEACAS:BOOL=OFF \
      ${TRILINOS_DIR}
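To exercise the report's 1/2/4-thread observation, the test can be swept over thread counts with `OMP_NUM_THREADS` (the standard OpenMP control). This sketch is my own wording, not from the original report; the commented `ctest` invocation assumes you are inside the configured build tree:

```shell
# Thread sweep for the failing test (sketch; run from the build tree).
# OMP_NUM_THREADS is the standard OpenMP thread-count environment variable.
for n in 1 2 4; do
  export OMP_NUM_THREADS=$n
  echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
  # ctest -R MueLu_Driver_anisotropic_MPI_4 --output-on-failure
done
```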

github-actions[bot] commented 8 months ago

Automatic mention of the @trilinos/muelu team

cgcgcg commented 8 months ago

Duplicate of #12549?

ndellingwood commented 8 months ago

@cgcgcg looks like it is related, though there's no indication of test failures in #12549. I suppose the two-thread test failure can be mentioned in that issue to consolidate, and this one can be closed?

cgcgcg commented 8 months ago

@ndellingwood You're right. @jhux2 observed "only" 190 iterations, which is still more than the 66 we would like to get. It seems to be random, though, and in your case it hit the maximum iteration limit of 200.

ndellingwood commented 8 months ago

@cgcgcg thanks for pointing that out. In case it's useful additional info: I hit the failure consistently in the 2-thread case, though that was when running the test on my laptop (maybe the failures wouldn't happen on a workstation). Also, I was using an OpenMP-enabled OpenBLAS for BLAS/LAPACK support; I'm not sure whether that would contribute to the randomness or the increased iteration counts?
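One way to probe the threaded-BLAS question above (a suggestion of mine, not something proposed in the thread): pin OpenBLAS to a single thread while keeping the Kokkos OpenMP backend at 2 threads, and see whether the iteration count stabilizes. `OPENBLAS_NUM_THREADS` is OpenBLAS's own thread-count environment variable; note that an OpenBLAS built against OpenMP may interact with `OMP_NUM_THREADS` differently, so treat this as a first-pass experiment:

```shell
# Experiment sketch: rule out threaded-BLAS nondeterminism by pinning
# OpenBLAS to one thread while the OpenMP backend stays at 2 threads.
# (Behavior of an OpenMP-built OpenBLAS may differ; first pass only.)
export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=2
echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS OMP_NUM_THREADS=$OMP_NUM_THREADS"
# ctest -R MueLu_Driver_anisotropic_MPI_4 --output-on-failure  # from the build tree
```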