Closed vbrunini closed 1 year ago
@CamelliaDPG
I just finished bisecting to find when the errors started showing up. They began with a Kokkos integration:
9de1a873b29bece7517dc9b5bd0745dcd631930c is the first bad commit
commit 9de1a873b29bece7517dc9b5bd0745dcd631930c
Author: Nathan Ellingwood <ndellin@sandia.gov>
Date: Mon Apr 18 12:00:38 2022 -0600
Snapshot of kokkos.git from commit f2a05d316596ac8a06fb4582bc2bc423033e4396
From repository at git@github.com:kokkos/kokkos.git
At commit:
commit f2a05d316596ac8a06fb4582bc2bc423033e4396
Author: Nathan Ellingwood <ndellin@sandia.gov>
Date: Mon Apr 18 11:59:10 2022 -0600
Update master_history for Kokkos 3.6.00
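For anyone repeating the bisect, here is a minimal sketch of automating it with git bisect run. The helper name classify_build, the log file name, and the default "internal error" search pattern are assumptions for illustration (the exact ICE text is not quoted in this thread). Only the exit-code convention (0 means good, 125 means skip, other codes from 1 to 127 mean bad) comes from git itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper for `git bisect run`: maps a build result to a bisect verdict.
# ICE_PATTERN is an assumption; set it to the text your compiler actually prints.
ICE_PATTERN="${ICE_PATTERN:-internal error}"

# $1: exit status of the build command, $2: path to the captured build log
classify_build() {
  local build_status="$1" log_file="$2"
  if [ "$build_status" -eq 0 ]; then
    return 0      # build succeeded: this commit is good
  elif grep -q -i -- "$ICE_PATTERN" "$log_file"; then
    return 1      # build failed with an ICE: this commit is bad
  else
    return 125    # failed for an unrelated reason: ask git bisect to skip it
  fi
}

# Example driver (commented out; the target and the good revision are placeholders):
#   git bisect start
#   git bisect bad HEAD
#   git bisect good <last-known-good-sha>
#   git bisect run bash -c 'source ./classify.sh; ninja <target> >build.log 2>&1; classify_build $? build.log'
```

Returning 125 for unrelated failures keeps a commit that is broken for some other reason from being blamed for the ICE.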
@ndellingwood
@trilinos/intrepid2 Is anyone available today to start looking at this?
I have found a partial workaround by reverting Kokkos_DynRankView.hpp to the version from immediately before that Kokkos integration. That avoids the internal compiler error in ~30 of the Intrepid2 tests, as well as in MueLu_IntrepidPCoarsenFactory, which is where we were originally seeing it in our Trilinos build. One additional Intrepid2 test still triggers the issue even with that change, but hopefully this helps narrow down potential causes a bit.
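A sketch of that partial workaround as a command, for anyone who wants to try it. The helper name restore_from_parent is hypothetical, and the header path in the usage comment is an assumption about the Trilinos tree layout, not something stated in this thread.

```shell
# Restore one file to its state just before a given commit (its first parent).
# $1: commit that introduced the change, $2: path of the file to restore
restore_from_parent() {
  local commit="$1" path="$2"
  git checkout "${commit}^" -- "$path"
}

# Usage (the path is an assumption; adjust it to where the header actually lives):
#   restore_from_parent 9de1a873b29bece7517dc9b5bd0745dcd631930c \
#     packages/kokkos/containers/src/Kokkos_DynRankView.hpp
```

This only touches the working tree and index for that one file, so the rest of the Kokkos snapshot stays at the newer version.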
@ccober6 I am starting to look at this. Thanks, @vbrunini, for all the details!
@vbrunini Can you give me a bit more info on your build environment? Are you doing this on CEE? Which modules do you have loaded? Environment variables? (Just copying and pasting your config script, it looks like the mpicc is still invoking gcc for me, and it breaks during the test compile due to the CLI options.)
On CEE using the sierra-devel/intel module. Looking at my env I'm guessing you need:
declare -x I_MPI_CC="icc"
declare -x I_MPI_CXX="icpc"
declare -x I_MPI_F77="ifort"
declare -x I_MPI_F90="ifort"
declare -x I_MPI_OFI_PROVIDER="tcp"
declare -x I_MPI_ROOT="/projects/sierra/linux_rh7/SDK/compilers/intel/IntelOneAPI-2021.3.0/mpi/2021.3.0"
to get the MPI wrappers to call the intel compiler?
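A quick way to confirm which back-end compiler a wrapper will call is to ask the wrapper itself. The sketch below assumes Intel MPI or MPICH-style wrappers, which accept -show and print the full underlying command line. backend_of is a hypothetical helper that extracts the compiler name from that output.

```shell
# Print the back-end compiler named in a wrapper's "show" output:
# the first word of the command line, with any directory stripped.
# $1: output of e.g. `mpicxx -show`
backend_of() {
  local first="${1%% *}"
  basename -- "$first"
}

# Usage:
#   backend_of "$(mpicxx -show)"
# If this reports g++ rather than icpc, the I_MPI_CXX setting did not take effect.
```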
Thanks, @vbrunini. I'm now able to reproduce. (Maybe because I'm not in the sierra wg, I don't see the sierra-devel/intel module, so I substituted sparc-dev/intel-2021.3.0_intelmpi-2021.3.0. Everything else I was able to use just as you specified.)
@vbrunini @CamelliaDPG I ran an overnight build of Trilinos at the posted SHA on Blake using the intel/2021.2.0 compiler devpack, OpenMP backend, and SKX architecture. It compiled successfully with SEACAS disabled, including intrepid2.
# Env
module load devpack/20210420/openmpi/4.0.5/intel/oneapi/2021.2.0
export OMP_NUM_THREADS=8
export ARCH_C_FLAG="-xCORE-AVX512 -mkl"
export BLAS_LIBRARIES="-mkl;${MKLROOT}/lib/intel64/libmkl_intel_lp64.a;${MKLROOT}/lib/intel64/libmkl_intel_thread.a;${MKLROOT}/lib/intel64/libmkl_core.a"
export LAPACK_LIBRARIES=${BLAS_LIBRARIES}
# Top Level Configuration Options
TESTS=ON
EXAMPLES=ON
SHARED=ON
CUDA=OFF
OPENMP=ON
PTHREAD=OFF
SERIAL=ON
COMPLEX=ON
ENABLE_SHYLU="ON"
cmake \
-DCMAKE_INSTALL_PREFIX="${TRILINOS_INSTALL_DIR}" \
-DCMAKE_CXX_STANDARD="14" \
-D Trilinos_ENABLE_COMPLEX_DOUBLE=${COMPLEX} \
\
-D Kokkos_ARCH_SKX=ON \
-D CMAKE_CXX_FLAGS="-g" \
-D CMAKE_C_FLAGS="${ARCH_C_FLAG} -g" \
-D CMAKE_Fortran_FLAGS="${ARCH_C_FLAG} -g" \
-D CMAKE_EXE_LINKER_FLAGS="${ARCH_C_FLAG}" \
-D CMAKE_Fortran_COMPILER="mpif77" \
-D HAVE_CXX_PRAGMA_WEAK:BOOL=OFF \
-D CMAKE_AR:FILEPATH=/usr/bin/ar \
-D CMAKE_STRIP:FILEPATH=/usr/bin/strip \
-D CMAKE_RANLIB:FILEPATH=/usr/bin/ranlib \
-D Trilinos_ENABLE_EXPLICIT_INSTANTIATION:BOOL=ON \
-D Trilinos_ENABLE_DEBUG:BOOL=OFF \
\
-D Trilinos_ENABLE_INSTALL_CMAKE_CONFIG_FILES:BOOL=ON \
-D CMAKE_BUILD_TYPE:STRING=RELEASE \
-D CMAKE_VERBOSE_MAKEFILE:BOOL=OFF \
-D CMAKE_SKIP_RULE_DEPENDENCY=ON \
-D Trilinos_ENABLE_CHECKED_STL:BOOL=OFF \
-D Trilinos_ENABLE_ALL_PACKAGES:BOOL=OFF \
-D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=OFF \
-D BUILD_SHARED_LIBS:BOOL=${SHARED} \
-D DART_TESTING_TIMEOUT:STRING=500 \
-D Trilinos_WARNINGS_AS_ERRORS_FLAGS:STRING="" \
\
-D Trilinos_ENABLE_OpenMP=${OPENMP} \
-D TPL_ENABLE_CUDA=${CUDA} \
-D TPL_ENABLE_MPI=ON \
-D MPI_EXEC_POST_NUMPROCS_FLAGS:STRING="-bind-to;socket;-map-by;socket" \
-D TPL_ENABLE_BinUtils=OFF \
-D TPL_ENABLE_SuperLU=OFF \
-D TPL_SuperLU_LIBRARIES:STRING="${SUPERLU_ROOT}/lib/libsuperlu.a" \
-D TPL_SuperLU_INCLUDE_DIRS:STRING="${SUPERLU_ROOT}/include" \
-D TPL_ENABLE_BLAS=ON \
-D TPL_BLAS_LIBRARIES:PATH="${BLAS_LIBRARIES}" \
-D TPL_ENABLE_LAPACK=ON \
-D TPL_LAPACK_LIBRARIES:PATH="${LAPACK_LIBRARIES}" \
-D TPL_ENABLE_Boost=ON \
-D Boost_INCLUDE_DIRS:PATH="${BOOST_ROOT}/include" \
-D Boost_LIBRARY_DIRS:PATH="${BOOST_ROOT}/lib" \
-D TPL_ENABLE_BoostLib=ON \
-D BoostLib_INCLUDE_DIRS:PATH="${BOOST_ROOT}/include" \
-D BoostLib_LIBRARY_DIRS:PATH="${BOOST_ROOT}/lib" \
-D TPL_ENABLE_Netcdf=ON \
-D Netcdf_INCLUDE_DIRS:PATH="${NETCDF_ROOT}/include" \
-D Netcdf_LIBRARY_DIRS:PATH="${NETCDF_ROOT}/lib" \
-D TPL_Netcdf_LIBRARIES:PATH="${NETCDF_ROOT}/lib/libnetcdf.a;${HDF5_ROOT}/lib/libhdf5_hl.a;${HDF5_ROOT}/lib/libhdf5.a;${ZLIB_ROOT}/lib/libz.a;${PNETCDF_ROOT}/lib/libpnetcdf.a" \
-D TPL_Netcdf_PARALLEL:BOOL=ON \
-D TPL_ENABLE_HDF5=ON \
-D HDF5_INCLUDE_DIRS:PATH="${HDF5_ROOT}/include" \
-D TPL_HDF5_LIBRARIES:PATH="${HDF5_ROOT}/lib/libhdf5_hl.a;${HDF5_ROOT}/lib/libhdf5.a;${ZLIB_ROOT}/lib/libz.a" \
-D TPL_ENABLE_Zlib=ON \
-D Zlib_INCLUDE_DIRS:PATH="${ZLIB_ROOT}/include" \
-D TPL_Zlib_LIBRARIES:PATH="${ZLIB_ROOT}/lib/libz.a" \
-D TPL_ENABLE_DLlib=ON \
\
\
-D Trilinos_ENABLE_Amesos=ON \
-D Trilinos_ENABLE_Amesos2=ON \
-D Amesos2_ENABLE_TESTS:BOOL=${TESTS} \
-D Amesos2_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D Trilinos_ENABLE_ShyLU_NodeTacho=${ENABLE_SHYLU} \
-D ShyLU_NodeTacho_ENABLE_TESTS:BOOL=${TESTS} \
-D ShyLU_NodeTacho_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D Trilinos_ENABLE_Anasazi=ON \
-D Anasazi_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_AztecOO=ON \
-D Trilinos_ENABLE_Belos=ON \
-D Belos_ENABLE_TESTS:BOOL=${TESTS} \
-D Belos_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D Trilinos_ENABLE_Epetra=ON \
-D Trilinos_ENABLE_EpetraExt=ON \
-D Trilinos_ENABLE_Ifpack=ON \
-D Ifpack_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_Ifpack2=ON \
-D Ifpack2_ENABLE_TESTS:BOOL=${TESTS} \
-D Ifpack2_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D Trilinos_ENABLE_Intrepid=ON \
-D Intrepid_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D Intrepid_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_Intrepid2=ON \
-D Intrepid2_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D Intrepid2_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_Isorropia=ON \
-D Isorropia_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_Kokkos=ON \
-D Kokkos_ENABLE_SERIAL=${SERIAL} \
-D Kokkos_ENABLE_PTHREAD=${PTHREAD} \
-D Kokkos_ENABLE_OPENMP=${OPENMP} \
-D Kokkos_ENABLE_CUDA=${CUDA} \
-D Kokkos_ENABLE_CUDA_LAMBDA=${CUDA} \
-D Kokkos_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_KokkosKernels=ON \
-D KokkosKernels_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_ML=ON \
-D ML_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_Moertel=OFF \
-D Moertel_ENABLE_EXAMPLES=OFF \
-D Trilinos_ENABLE_MueLu=ON \
-D MueLu_INST_DOUBLE_INT_LONGINT:BOOL=ON \
-D MueLu_ENABLE_TESTS:BOOL=${TESTS} \
-D MueLu_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D MueLu_ENABLE_Experimental:BOOL=OFF \
-D MueLu_ENABLE_Kokkos_Refactor:BOOL=ON \
-D MueLu_ENABLE_Epetra:BOOL=OFF \
-D Trilinos_ENABLE_NOX=ON \
-D NOX_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_Panzer=ON \
-D Panzer_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_Phalanx=ON \
-D Phalanx_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_ROL=ON \
-D ROL_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_Sacado=ON \
-D Sacado_ENABLE_UNINIT:BOOL=ON \
-D Sacado_ENABLE_TESTS:BOOL=${TESTS} \
-D Sacado_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D Trilinos_ENABLE_Stokhos=ON \
-D Stokhos_ENABLE_TESTS:BOOL=${TESTS} \
-D Stokhos_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D Trilinos_ENABLE_Stratimikos=ON \
-D Stratimikos_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_Thyra=ON \
-D Thyra_ENABLE_TESTS:BOOL=${TESTS} \
-D Trilinos_ENABLE_Tpetra=ON \
-D Tpetra_INST_SERIAL:BOOL=${SERIAL} \
-D Tpetra_INST_OPENMP:BOOL=${OPENMP} \
-D Tpetra_INST_PTHREAD:BOOL=${PTHREAD} \
-D Tpetra_INST_CUDA:BOOL=${CUDA} \
-D Tpetra_ENABLE_TESTS:BOOL=${TESTS} \
-D Tpetra_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D Trilinos_ENABLE_TrilinosCouplings=ON \
-D TrilinosCouplings_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D Trilinos_ENABLE_Triutils=ON \
-D Trilinos_ENABLE_Xpetra=ON \
-D Xpetra_ENABLE_TESTS:BOOL=${TESTS} \
-D Xpetra_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
-D Xpetra_ENABLE_Experimental:BOOL=OFF \
-D Xpetra_ENABLE_Kokkos_Refactor:BOOL=ON \
-D Trilinos_ENABLE_Zoltan=ON \
-D Trilinos_ENABLE_Zoltan2=ON \
-D Zoltan2_ENABLE_TESTS:BOOL=${TESTS} \
-D Zoltan2_ENABLE_EXAMPLES:BOOL=${EXAMPLES} \
\
-D Trilinos_ENABLE_STKMesh:BOOL=ON \
-D Trilinos_ENABLE_STKSimd:BOOL=ON \
-D Trilinos_ENABLE_STKIO:BOOL=ON \
-D Trilinos_ENABLE_STKTransfer:BOOL=ON \
-D Trilinos_ENABLE_STKSearch:BOOL=ON \
-D Trilinos_ENABLE_STKUtil:BOOL=ON \
-D Trilinos_ENABLE_STKTopology:BOOL=ON \
-D Trilinos_ENABLE_STKClassic:BOOL=OFF \
-D Trilinos_ENABLE_SEACASExodus:BOOL=OFF \
-D Trilinos_ENABLE_SEACASEpu:BOOL=OFF \
-D Trilinos_ENABLE_SEACASExodiff:BOOL=OFF \
-D Trilinos_ENABLE_SEACASNemspread:BOOL=OFF \
-D Trilinos_ENABLE_SEACASNemslice:BOOL=OFF \
-D Trilinos_ENABLE_SEACASAprepro_lib:BOOL=OFF \
$TRILINOS_DIR
@vbrunini can you try dropping -march=broadwell -mtune=broadwell from the CMAKE_C*_FLAGS line and adding -DKokkos_ARCH_BDW=ON and see if that has any impact? Also as an experiment can you try changing the c++ standard to c++17?
I ran into an occurrence of this type of internal compiler error in prep testing for kokkos@3.7.00 with a similar build to above with intel/19 (https://github.com/kokkos/kokkos/issues/5290). In that case changing the c++ standard to c++17 allowed my compilation to complete. Something seems flaky with the intel compilers.
Edit: Using c++17 got me past the first wave of ICEs, but running a build overnight I hit an ICE within intrepid2. Note this was preliminary testing with kokkos@3.7.00; the same config builds fine for me with Trilinos develop on Blake, though that is not a matching system/env to what Victor posted.
@ndellingwood I tried your suggestions on my Intrepid2-only build, and it still fails. Here's the full script:
module load sparc-dev/intel-2021.3.0_intelmpi-2021.3.0
declare -x I_MPI_CC="icc"
declare -x I_MPI_CXX="icpc"
declare -x I_MPI_F77="ifort"
declare -x I_MPI_F90="ifort"
declare -x I_MPI_OFI_PROVIDER="tcp"
declare -x I_MPI_ROOT="/projects/sierra/linux_rh7/SDK/compilers/intel/IntelOneAPI-2021.3.0/mpi/2021.3.0"
cmake \
-GNinja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_STANDARD=17 \
-DCMAKE_C_COMPILER=/projects/sierra/linux_rh7/SDK/compilers/intel/IntelOneAPI-2021.3.0/mpi/2021.3.0/bin/mpicc \
-DCMAKE_CXX_COMPILER=/projects/sierra/linux_rh7/SDK/compilers/intel/IntelOneAPI-2021.3.0/mpi/2021.3.0/bin/mpicxx \
-DCMAKE_C_FLAGS:STRING="-O2 -ftemplate-depth-128 -finline-functions -w0 " \
-DCMAKE_CXX_FLAGS:STRING="-O2 -ftemplate-depth-128 -finline-functions -w0 " \
-DKokkos_ARCH_BDW=ON \
-DTrilinos_ENABLE_ALL_PACKAGES:BOOL=OFF \
-DTrilinos_ENABLE_TESTS:BOOL=ON \
-DTrilinos_ENABLE_Intrepid2:BOOL=ON \
../
@vbrunini, from @ndellingwood's test it sounds like this probably does not affect 2021.2.0 -- probably a regression Intel introduced in 2021.3.0. Is it feasible for you to just use 2021.2.0?
@CamelliaDPG thanks for sharing your configure. Can you try one more build, this time without ninja? I ran into problems in the past with intel compilers and ninja (even with N=1), and using "Unix Makefiles" as the generator helped me work around the issue (and still build with parallelism). I don't have an active account on the CEE but will see if I can reproduce elsewhere.
Edit: Adding reference to a kokkos issue where I documented issues with ICE and being able to workaround by dropping use of ninja - https://github.com/kokkos/kokkos/issues/4475
@ndellingwood That fails, too, unfortunately. (I dropped the -GNinja line, and did make -j20.)
I think the short answer on the feasibility of switching to 2021.2.0 is no, not easily. It doesn't look like we have a 2021.2.0 installation & build configuration setup on the relevant platforms we're hitting this on, and I think it would take some negotiation with a few other teams to switch. Maybe @nate945 can confirm.
@CamelliaDPG darn, was hoping for an easy config solution, thank you for testing it out
I also tried just now with make -j1 on the idea maybe it was a parallel-invocation trigger, but that fails, too. (It does look like the new configuration gets us past some of the failures -- building Intrepid2_unit-test_performance_DataCombination_DataCombinationPerformance succeeds now; it didn't before.)
Regarding Intel 2021.2.0, I agree with @vbrunini that this might be possible, but not easy. We haven't installed or evaluated that compiler, so there would be some lead time, and feasibility depends on whether that compiler introduces some separate issue. Our next planned intel upgrade is to the LLVM compilers, maybe within a ~6 month timeframe. We have been planning to move to C++17 for some time; currently this is blocked by the CUDA11 upgrade. If C++17 helps toward a permanent solution here, though, that is something we should be enabling fairly soon.
It looks like the failure is non-deterministic.
In an effort to isolate/understand the issue, I tried reverting to @vbrunini's original setup, and building Intrepid2_unit-test_performance_DataCombination_DataCombinationPerformance. As I expected, this again failed. I tried duplicating this driver, with the idea that I would whittle things down to a minimal failing driver. I tried building the new driver, and this succeeded!
I then tried building DataCombinationPerformance again, and this time, it succeeded. Maddening!
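Since the failure looks non-deterministic, one way to put a number on it is to repeat the same build several times and count the failures. This is only a sketch: count_failures is a hypothetical helper, and the rebuild script in the usage comment is a placeholder for whatever cleans and rebuilds the single target in your build tree.

```shell
# Run a command N times and print how many of the runs failed.
# $1: number of runs, remaining arguments: the command to run
count_failures() {
  local runs="$1" failures=0 i=0
  shift
  while [ "$i" -lt "$runs" ]; do
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}

# Usage (placeholder script that cleans and rebuilds just the one target):
#   count_failures 10 ./rebuild_target.sh
```

A result strictly between 0 and the run count would support the non-determinism theory, while all-or-nothing results point back at the configuration.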
@CamelliaDPG yeah I hear that, dealing with ICE from intel compilers is rough
I have a build going with my config on Blake (only intrepid2 tests enabled) but with Kokkos_ARCH_BDW=ON set (I can't run tests but can at least see if it compiles). This won't tell conclusively whether intel/2021.2.0 would resolve the issue (too many differences in software stack, system, env etc.), but it will at least add some useful extra info about whether a change of compiler version is worth pursuing further. I'll update when the build ends.
@CamelliaDPG I reproduced the error with intel/2021.2.0; in my prior builds I hadn't set OMPI_CXX=icpc etc. and clang was identified as the compiler (which didn't exhibit the internal compiler error), so intel/2021.2.0 isn't a path forward
The ICE does not occur with intel/19 builds
Similar errors (but in Panzer) previously reported in #10097
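For reference, Open MPI's wrapper compilers take their back end from the OMPI_CC, OMPI_CXX and OMPI_FC environment variables, and --showme:command prints the compiler a wrapper resolves to. The sketch below sets the Intel back ends and checks the result. check_backend is a hypothetical helper, and the ifort choice for Fortran is an assumption.

```shell
# Select Intel back ends for the Open MPI wrappers.
export OMPI_CC=icc
export OMPI_CXX=icpc
export OMPI_FC=ifort   # assumption: Fortran back end, not mentioned in the thread

# Report whether a wrapper resolves to the expected back end.
# $1: expected compiler name, $2: output of `<wrapper> --showme:command`
check_backend() {
  if [ "$(basename -- "$2")" = "$1" ]; then
    echo "ok: $1"
  else
    echo "mismatch: wanted $1, got $2"
    return 1
  fi
}

# Usage, before configuring:
#   check_backend icpc "$(mpicxx --showme:command)"
```

Running the check before cmake avoids an overnight build that quietly used a different compiler.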
Too bad! Thanks for trying that.
Same issues with intel/2021.5.0
@CamelliaDPG can you try a test with dropping the Kokkos arch flag -DKokkos_ARCH_BDW=ON from your configure https://github.com/trilinos/Trilinos/issues/10806#issuecomment-1199812130 (no kokkos arch)?
I have a build where I dropped the arch specification and previous ICE are not showing up so far
@ndellingwood That worked!! Thanks!!
Here's a configure-and-build script, which succeeds for me:
module load sparc-dev/intel-2021.3.0_intelmpi-2021.3.0
declare -x I_MPI_CC="icc"
declare -x I_MPI_CXX="icpc"
declare -x I_MPI_F77="ifort"
declare -x I_MPI_F90="ifort"
declare -x I_MPI_OFI_PROVIDER="tcp"
declare -x I_MPI_ROOT="/projects/sierra/linux_rh7/SDK/compilers/intel/IntelOneAPI-2021.3.0/mpi/2021.3.0"
cmake \
-GNinja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_STANDARD=17 \
-DCMAKE_C_COMPILER=/projects/sierra/linux_rh7/SDK/compilers/intel/IntelOneAPI-2021.3.0/mpi/2021.3.0/bin/mpicc \
-DCMAKE_CXX_COMPILER=/projects/sierra/linux_rh7/SDK/compilers/intel/IntelOneAPI-2021.3.0/mpi/2021.3.0/bin/mpicxx \
-DCMAKE_C_FLAGS:STRING="-O2 -ftemplate-depth-128 -finline-functions -w0 " \
-DCMAKE_CXX_FLAGS:STRING="-O2 -ftemplate-depth-128 -finline-functions -w0 " \
-DTrilinos_ENABLE_ALL_PACKAGES:BOOL=OFF \
-DTrilinos_ENABLE_TESTS:BOOL=ON \
-DTrilinos_ENABLE_Intrepid2:BOOL=ON \
../
ninja
@vbrunini, would you please try this in your setup?
I already know that works. That's why the title of the issue specifies that it happens when including the "-march=broadwell -mtune=broadwell" flags, but we do want to have those flags enabled.
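When comparing builds with and without the Broadwell settings, it can help to confirm what a given build tree was actually configured with. A minimal sketch, assuming a standard CMakeCache.txt in the build directory (show_arch_settings is a hypothetical helper name):

```shell
# List the architecture-related entries recorded in a CMake build tree's cache:
# any Kokkos_ARCH_* option plus the base C and C++ flag strings.
# $1: path to CMakeCache.txt (defaults to the current directory)
show_arch_settings() {
  grep -E '^(Kokkos_ARCH_[A-Za-z0-9_]+|CMAKE_C(XX)?_FLAGS)(:[A-Z]+)?=' "${1:-CMakeCache.txt}"
}

# Usage, from the build directory:
#   show_arch_settings
```

This shows at a glance whether -march/-mtune were passed as raw flags, whether a Kokkos arch option was set, or both.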
This issue was closed due to inactivity for 395 days.
Bug Report
@trilinos/Intrepid2
Description
When building with intel-2021.3 and "-march=broadwell -mtune=broadwell" we are seeing files that depend on Intrepid2 fail to compile with an internal compiler error:
It looks like this can be reproduced just with the Intrepid2 tests enabled.
Steps to Reproduce