trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Intrepid2: PYR_QuadFace_newBasis_Serial_DOUBLE, PYR_TriFace_newBasis_Serial_DOUBLE unit test failures with intel/2023.2.0 and mkl tpl #12390

Open ndellingwood opened 1 year ago

ndellingwood commented 1 year ago

Bug Report

@trilinos/intrepid2

Description

The following Intrepid2 unit tests fail for me in Serial and OpenMP builds using intel/2023.2.0 (icpc) with the MKL TPL:

Intrepid2_unit-test_Orientation_test_orientation_PYR_QuadFace_newBasis_Serial_DOUBLE

322: Considering Pyr 0: [ 0 1 2 3 4 ] and Pyr 1: [ 0 1 2 3 5 ]
322:  Intrepid2_IntegratedLegendreBasis_HGRAD_PYR
322:  Intrepid2_HierarchicalBasis_HDIV_PYR
322:
322: Intel MKL ERROR: Parameter 5 was incorrect on entry to DGELS.
322:
322: Intel MKL ERROR: Parameter 5 was incorrect on entry to DGELS.
322:                                                       ^^^^----FAILURE!
322: LAPACK error flag for cell 0 is: -5
322:                                                       ^^^^----FAILURE!
322: LAPACK error flag for cell 1 is: -5
322: Considering Pyr 0: [ 0 1 2 4 3 ] and Pyr 1: [ 0 1 2 4 5 ]
322:  Intrepid2_IntegratedLegendreBasis_HGRAD_PYR
322:  Intrepid2_HierarchicalBasis_HDIV_PYR
322:
322: Intel MKL ERROR: Parameter 5 was incorrect on entry to DGELS.
322:
322: Intel MKL ERROR: Parameter 5 was incorrect on entry to DGELS.
322:                                                       ^^^^----FAILURE!
322: LAPACK error flag for cell 0 is: -5
322:                                                       ^^^^----FAILURE!
322: LAPACK error flag for cell 1 is: -5
...

Intrepid2_unit-test_Orientation_test_orientation_PYR_TriFace_newBasis_Serial_DOUBLE

323: Considering Pyr 0: [ 2 3 0 1 4 ] and Pyr 1: [ 5 6 3 2 4 ]
323:  Intrepid2_IntegratedLegendreBasis_HGRAD_PYR
323:  Intrepid2_HierarchicalBasis_HDIV_PYR
323:
323: Intel MKL ERROR: Parameter 5 was incorrect on entry to DGELS.
323:
323: Intel MKL ERROR: Parameter 5 was incorrect on entry to DGELS.
323:                                                       ^^^^----FAILURE!
323: LAPACK error flag for cell 0 is: -5
323:                                                       ^^^^----FAILURE!
323: LAPACK error flag for cell 1 is: -5
323:                                                       ^^^^----FAILURE!
323: Function DOFs on common face computed using Hex 0 basis functions are not consistent with those computed using Hex 1
323: Function DOFs for Hex 0 are: 5.125 5.23474 5.34543 5.45742 5.57104 5.68666
323: Function DOFs for Hex 1 are: 4.125 4.13334 4.13382 4.13655 4.15161 4.18911
323:                                                       ^^^^----FAILURE!
...
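When triaging logs like the ones above, it can help to tally the MKL argument errors per routine. By LAPACK convention a negative INFO of -i means the i-th argument had an illegal value, and in the reference argument list DGELS(TRANS, M, N, NRHS, A, LDA, B, LDB, WORK, LWORK, INFO) the fifth argument is the matrix A, so the "-5" flags and the "Parameter 5" messages appear to refer to the same argument. The helper below is a hypothetical sketch (summarize_mkl_errors is a made-up name, not part of Trilinos) that assumes the messages appear in a saved log exactly in the form shown.

```shell
# Hypothetical triage helper (not part of Trilinos): count
# "Intel MKL ERROR: Parameter N was incorrect on entry to ROUTINE." lines
# in a saved test log, grouped by routine and parameter index.
summarize_mkl_errors() {
  local log="$1"
  sed -n 's/.*Intel MKL ERROR: Parameter \([0-9][0-9]*\) was incorrect on entry to \([A-Za-z0-9_]*\)\..*/\2 \1/p' "$log" \
    | sort | uniq -c \
    | awk '{ printf "%s parameter %s: %d occurrence(s)\n", $2, $3, $1 }'
}

# Example usage (the log file name is an assumption):
#   ctest -R PYR_QuadFace_newBasis_Serial_DOUBLE -V > pyr_quad.log 2>&1
#   summarize_mkl_errors pyr_quad.log
```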

Steps to Reproduce

  1. SHA1: develop branch, e.g. 49bcc05cf3a6aa5a0bf2495d7101bd191a9b1283
  2. Configuration: Blake "all" queue

    module purge
    module load cmake intel-oneapi-compilers/2023.2.0 intel-oneapi-mkl/2023.2.0

    export BLAS_LIBRARIES="-mkl;${MKLROOT}/lib/intel64/libmkl_intel_lp64.a;${MKLROOT}/lib/intel64/libmkl_intel_thread.a;${MKLROOT}/lib/intel64/libmkl_core.a"
    export LAPACK_LIBRARIES=${BLAS_LIBRARIES}

    cmake \
      -D CMAKE_CXX_COMPILER="`which icpc`" \
      -D CMAKE_C_COMPILER="`which icc`" \
      -D CMAKE_Fortran_COMPILER="`which ifort`" \
      -D CMAKE_CXX_FLAGS="-g -no-ip" \
      -D CMAKE_C_FLAGS="-g -no-ip" \
      -DTPL_ENABLE_MPI=OFF \
      -DTPL_ENABLE_BLAS:BOOL=ON \
      -DTPL_BLAS_LIBRARIES:PATH="${BLAS_LIBRARIES}" \
      -DTPL_LAPACK_LIBRARIES:PATH="${LAPACK_LIBRARIES}" \
      -DTPL_ENABLE_LAPACK:BOOL=ON \
      -DTrilinos_ENABLE_ALL_PACKAGES=OFF \
      -DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES=OFF \
      -DTrilinos_ENABLE_TESTS=OFF \
      -DTrilinos_MUST_FIND_ALL_TPL_LIBS=TRUE \
      -DTrilinos_ENABLE_OpenMP=ON \
      -DTrilinos_ENABLE_Kokkos=ON \
      -D Kokkos_ENABLE_OPENMP=ON \
      -D Kokkos_ARCH_SKX=ON \
      -DTrilinos_ENABLE_KokkosKernels=ON \
      -DTrilinos_ENABLE_Tpetra=ON \
      -DTrilinos_ENABLE_Sacado=ON \
      -DTrilinos_ENABLE_Intrepid2=ON \
      -D Intrepid2_ENABLE_TESTS=ON \
      -DTPL_ENABLE_Matio=OFF \
      $TRILINOS_DIR
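Because every path in BLAS_LIBRARIES is built from ${MKLROOT}, an unset or wrong MKLROOT turns them into paths such as /lib/intel64/libmkl_core.a. A minimal pre-configure guard might look like the following. It is a hypothetical addition to the reproducer (check_mkl_archives is a made-up name), assuming the intel-oneapi-mkl module sets MKLROOT and uses the lib/intel64 layout from the configuration above.

```shell
# Hypothetical pre-configure guard (not part of the original reproducer):
# check that MKLROOT (or an explicit root passed as $1) is set and that the
# three static MKL archives named in BLAS_LIBRARIES exist.
check_mkl_archives() {
  local root="${1:-${MKLROOT:-}}"
  if [ -z "$root" ]; then
    echo "MKLROOT is not set; load the intel-oneapi-mkl module first" >&2
    return 1
  fi
  local missing=0 lib
  for lib in mkl_intel_lp64 mkl_intel_thread mkl_core; do
    if [ ! -f "$root/lib/intel64/lib${lib}.a" ]; then
      echo "missing: $root/lib/intel64/lib${lib}.a" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Intended use, right after the module load line:
#   check_mkl_archives || exit 1
```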

CamelliaDPG commented 1 year ago

@ndellingwood thanks for the report. I've requested a blake account; will try to reproduce once that's approved.

CamelliaDPG commented 1 year ago

@ndellingwood What is $MKL_ROOT in your environment? It does not appear to be defined by the loaded modules.

CamelliaDPG commented 1 year ago

@ndellingwood please disregard.

CamelliaDPG commented 1 year ago

@ndellingwood, I'm getting link errors on blake. Not sure why; the libraries do appear to be there. Here's the output:

[6/6] Linking CXX executable packages/intrepid2/unit-test/Orientation/Intrep...it-test_Orientation_test_orientation_PYR_QuadFace_newBasis_Serial_DOUBLE.exe
FAILED: packages/intrepid2/unit-test/Orientation/Intrepid2_unit-test_Orientation_test_orientation_PYR_QuadFace_newBasis_Serial_DOUBLE.exe 
: && /projects/x86-64-icelake-rocky8/compilers/intel-oneapi-compilers/2023.2.0/gcc/8.5.0/base/2lqdmcu/compiler/2023.2.0/linux/bin/intel64/icpc -g -no-ip -xCORE-AVX512 -qopenmp  -O3 -DNDEBUG -DKOKKOS_DEPENDENCE -xCORE-AVX512 -qopenmp -mkl packages/intrepid2/unit-test/Orientation/CMakeFiles/Intrepid2_unit-test_Orientation_test_orientation_PYR_QuadFace_newBasis_Serial_DOUBLE.dir/test_orientation_PYR_QuadFace_newBasis_Serial_DOUBLE.cpp.o -o packages/intrepid2/unit-test/Orientation/Intrepid2_unit-test_Orientation_test_orientation_PYR_QuadFace_newBasis_Serial_DOUBLE.exe  packages/intrepid2/src/libintrepid2.a  packages/shards/src/libshards.a  packages/sacado/src/libsacado.a  packages/teuchos/numerics/src/libteuchosnumerics.a  packages/teuchos/kokkoscomm/src/libteuchoskokkoscomm.a  packages/teuchos/comm/src/libteuchoscomm.a  packages/teuchos/kokkoscompat/src/libteuchoskokkoscompat.a  packages/teuchos/parameterlist/src/libteuchosparameterlist.a  packages/teuchos/parser/src/libteuchosparser.a  packages/teuchos/core/src/libteuchoscore.a  packages/kokkos-kernels/libkokkoskernels.a  packages/kokkos/simd/src/libkokkossimd.a  packages/kokkos/algorithms/src/libkokkosalgorithms.a  packages/kokkos/containers/src/libkokkoscontainers.a  packages/kokkos/core/src/libkokkoscore.a  /usr/lib64/libdl.so  /projects/x86-64-icelake-rocky8/tpls/intel-oneapi-mkl/2023.2.0/oneapi/2023.2.0/base/nehcvgn/mkl/2023.2.0/lib/intel64/libmkl_intel_lp64.a  /projects/x86-64-icelake-rocky8/tpls/intel-oneapi-mkl/2023.2.0/oneapi/2023.2.0/base/nehcvgn/mkl/2023.2.0/lib/intel64/libmkl_intel_thread.a  /projects/x86-64-icelake-rocky8/tpls/intel-oneapi-mkl/2023.2.0/oneapi/2023.2.0/base/nehcvgn/mkl/2023.2.0/lib/intel64/libmkl_core.a  /projects/x86-64-icelake-rocky8/tpls/intel-oneapi-mkl/2023.2.0/oneapi/2023.2.0/base/nehcvgn/mkl/2023.2.0/lib/intel64/libmkl_intel_lp64.a  /projects/x86-64-icelake-rocky8/tpls/intel-oneapi-mkl/2023.2.0/oneapi/2023.2.0/base/nehcvgn/mkl/2023.2.0/lib/intel64/libmkl_intel_thread.a  
/projects/x86-64-icelake-rocky8/tpls/intel-oneapi-mkl/2023.2.0/oneapi/2023.2.0/base/nehcvgn/mkl/2023.2.0/lib/intel64/libmkl_core.a && :
icpc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
icpc: command line warning #10121: overriding '-xCORE-AVX512' with '-xCORE-AVX512'
icpc: command line remark #10412: option '-mkl' is deprecated and will be removed in a future release. Please use the replacement option '-qmkl'
ld: cannot find -lmkl_intel_lp64
ld: cannot find -lmkl_intel_thread
ld: cannot find -lmkl_core

Any idea what I might be missing?
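In the failing link line the three MKL archives are passed as full paths, so the "cannot find -lmkl_*" errors presumably come from the -mkl flag, which has icpc add MKL through -l options that the linker must resolve on its library search path. One way to check whether that search path is set up in the current shell is sketched below. The helpers are hypothetical (find_lib_on_path and report_mkl_link_libs are made-up names), and they assume the relevant directories are listed in LIBRARY_PATH, which GCC-compatible drivers consult; whether the MKL module on Blake exports LIBRARY_PATH is itself an assumption.

```shell
# Hypothetical helper: print the first directory on a colon-separated search
# path (LIBRARY_PATH by default) that contains lib<name>.a or lib<name>.so.
# Returns non-zero if no listed directory provides the library.
find_lib_on_path() {
  local name="$1"
  local search="${2-${LIBRARY_PATH:-}}"
  local IFS=':'
  local dir
  for dir in $search; do
    [ -n "$dir" ] || continue
    if [ -e "$dir/lib$name.a" ] || [ -e "$dir/lib$name.so" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Check the three libraries named in the ld errors; an optional argument
# overrides the search path.
report_mkl_link_libs() {
  local lib status=0
  for lib in mkl_intel_lp64 mkl_intel_thread mkl_core; do
    if ! find_lib_on_path "$lib" "$@"; then
      echo "lib$lib not found on search path" >&2
      status=1
    fi
  done
  return "$status"
}
```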

ndellingwood commented 1 year ago

@CamelliaDPG I'm not sure, did you configure and build on the compute node?

CamelliaDPG commented 1 year ago

> @CamelliaDPG I'm not sure, did you configure and build on the compute node?

Ah, I think that's exactly what I'm missing: I'm on the login node. How do I get to a compute node on blake?

ndellingwood commented 1 year ago

@CamelliaDPG: salloc -N 1 -p all

I should have added that to the reproducer; sorry about that.

CamelliaDPG commented 1 year ago

@ndellingwood, whether I compile on the login node or a compute node, I get the same link errors. I've tried manual futzing with the command line for the build/link, but haven't made any progress there. Would you mind putting together a full reproducer that includes the make and execute steps? Sorry to require the extra hand-holding here.

ndellingwood commented 1 year ago

@CamelliaDPG I reran a clean build to confirm it completed without issue. Can you confirm that no additional modules are loaded and no environment variables are set automatically at login (for example, in your .bashrc)? Something there could be causing an environment conflict.

Here were my steps:

Blake testbed:

salloc -N 1 -p all

module purge
module load cmake intel-oneapi-compilers/2023.2.0 intel-oneapi-mkl/2023.2.0

./configure.sh

make -j16

ctest -R Intrepid2_unit-test_Orientation_test_orientation_PYR_QuadFace_newBasis_Serial_DOUBLE -V

ctest -R Intrepid2_unit-test_Orientation_test_orientation_PYR_TriFace_newBasis_Serial_DOUBLE -V
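Since ctest -R takes a regular expression, the two ctest commands above can also be combined into one run. A minimal sketch, where PYR_TEST_REGEX and run_pyr_tests are made-up names and --output-on-failure is used in place of -V so that only failing tests print their output:

```shell
# Matches both failing Intrepid2 orientation tests (QuadFace and TriFace).
PYR_TEST_REGEX='Intrepid2_unit-test_Orientation_test_orientation_PYR_(Quad|Tri)Face_newBasis_Serial_DOUBLE'

# Run from the build directory; extra arguments are passed through to ctest.
run_pyr_tests() {
  ctest -R "$PYR_TEST_REGEX" --output-on-failure "$@"
}
```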

where my configure file, configure.sh, is essentially the same as before:

# path to the Trilinos source tree
export TRILINOS_DIR=<point-to-source>
module purge
module load cmake intel-oneapi-compilers/2023.2.0 intel-oneapi-mkl/2023.2.0

# MKL for BLAS and LAPACK: the -mkl flag plus the static LP64 interface, Intel threading and core archives
export BLAS_LIBRARIES="-mkl;${MKLROOT}/lib/intel64/libmkl_intel_lp64.a;${MKLROOT}/lib/intel64/libmkl_intel_thread.a;${MKLROOT}/lib/intel64/libmkl_core.a"
export LAPACK_LIBRARIES=${BLAS_LIBRARIES}
cmake \
  -D CMAKE_CXX_COMPILER="`which icpc`" \
  -D CMAKE_C_COMPILER="`which icc`" \
  -D CMAKE_Fortran_COMPILER="`which ifort`" \
  -D CMAKE_CXX_FLAGS="-g -no-ip" \
  -D CMAKE_C_FLAGS="-g -no-ip" \
  -DTPL_ENABLE_MPI=OFF \
  -DTPL_ENABLE_BLAS:BOOL=ON \
  -DTPL_BLAS_LIBRARIES:PATH="${BLAS_LIBRARIES}" \
  -DTPL_LAPACK_LIBRARIES:PATH="${LAPACK_LIBRARIES}" \
  -DTPL_ENABLE_LAPACK:BOOL=ON \
  -DTrilinos_ENABLE_ALL_PACKAGES=OFF \
  -DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES=OFF \
  -DTrilinos_ENABLE_TESTS=OFF \
  -DTrilinos_MUST_FIND_ALL_TPL_LIBS=TRUE \
  -DTrilinos_ENABLE_OpenMP=ON \
  -DTrilinos_ENABLE_Kokkos=ON \
  -D Kokkos_ENABLE_OPENMP=ON \
    -D Kokkos_ENABLE_TESTS=OFF \
  -D Kokkos_ARCH_SKX=ON \
  -DTrilinos_ENABLE_KokkosKernels=ON \
    -D KokkosKernels_ENABLE_TESTS=OFF \
  -DTrilinos_ENABLE_Tpetra=ON \
    -D Tpetra_ENABLE_TESTS=OFF \
  -DTrilinos_ENABLE_Sacado=ON \
    -D Sacado_ENABLE_TESTS=OFF \
  -DTrilinos_ENABLE_Intrepid2=ON \
    -D Intrepid2_ENABLE_TESTS=ON \
  -DTPL_ENABLE_Matio=OFF \
  $TRILINOS_DIR
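After running configure.sh, it may be worth confirming which BLAS/LAPACK entries CMake actually cached, since the list mixes a compiler flag (-mkl) with explicit archive paths. Below is a hypothetical helper (show_tpl_libs is a made-up name) that prints each cached TPL_BLAS_LIBRARIES / TPL_LAPACK_LIBRARIES entry one list element per line, assuming the standard NAME:TYPE=VALUE layout of CMakeCache.txt.

```shell
# Hypothetical post-configure check: print the cached BLAS/LAPACK library
# lists from CMakeCache.txt, one semicolon-separated element per line.
show_tpl_libs() {
  local cache="${1:-CMakeCache.txt}"
  grep -E '^TPL_(BLAS|LAPACK)_LIBRARIES:[A-Z]+=' "$cache" |
    while IFS= read -r line; do
      printf '%s\n' "${line%%=*}"                                  # NAME:TYPE
      printf '%s\n' "${line#*=}" | tr ';' '\n' | sed 's/^/  /'     # list elements
    done
}
```

Run it from the build directory with no arguments, or pass the path to a CMakeCache.txt explicitly.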

Hopefully that helps

Edit: added the ctest -R lines for the failing tests

CamelliaDPG commented 1 year ago

@ndellingwood Thanks for the additional details. It looks like something was wrong with my environment: after I deleted some stale .bash* files I had lying around, I was able to load modules on the compute node, which I couldn't do before (I wasn't sure whether that was expected to work). I'm optimistic that this will let me reproduce the failures. I'm off for the weekend, but should be able to investigate further on Monday.
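Stale shell startup files can set variables or load modules that interfere with a module-based environment. A minimal, hypothetical check (list_bash_startup_files and module_cmd_available are made-up names) that lists the per-user startup files bash may read from a home directory, plus a test of whether a module command is defined in the current shell:

```shell
# Hypothetical helper: list the per-user startup files that bash may read
# (login shells: .bash_profile, .bash_login or .profile; interactive
# non-login shells: .bashrc). Defaults to $HOME.
list_bash_startup_files() {
  local home="${1:-$HOME}" f
  for f in .bash_profile .bash_login .profile .bashrc; do
    if [ -f "$home/$f" ]; then
      printf '%s\n' "$home/$f"
    fi
  done
}

# Succeeds if a "module" command (shell function, alias or executable)
# is defined in the current shell.
module_cmd_available() {
  type module >/dev/null 2>&1
}
```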

From: Nathan Ellingwood
Date: Thursday, October 12, 2023 at 4:58 PM
To: trilinos/Trilinos
Cc: Roberts, Nathan V., Mention
Subject: [EXTERNAL] Re: [trilinos/Trilinos] Intrepid2: PYR_QuadFace_newBasis_Serial_DOUBLE, PYR_TriFace_newBasis_Serial_DOUBLE unit test failures with intel/2023.2.0 and mkl tpl (Issue #12390)


github-actions[bot] commented 2 days ago

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE. If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.