sandialabs / Albany

Sandia National Laboratories' Albany multiphysics code

Internal error building omega_h on attaway #1003

Closed ikalash closed 1 year ago

ikalash commented 1 year ago

Building Omega_h on the attaway cluster fails with a cryptic internal compiler error:

[ 65%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_swap2d.cpp.o
/projects/albany/nightlyCDash/build/TrilinosSerialInstall/include/Serial/Kokkos_Serial_Parallel_Range.hpp(37) (col. 16): internal error: 04010002_12295

compilation aborted for /ceeprojects/albany/nightlyCDash/build/AlbBuildSerial/tpls/omegah/Omega_h-prefix/src/Omega_h/src/Omega_h_shape.cpp (code 4)
gmake[5]: *** [src/CMakeFiles/omega_h.dir/build.make:1266: src/CMakeFiles/omega_h.dir/Omega_h_shape.cpp.o] Error 4
gmake[5]: *** Waiting for unfinished jobs....
gmake[4]: *** [CMakeFiles/Makefile2:128: src/CMakeFiles/omega_h.dir/all] Error 2
gmake[3]: *** [Makefile:146: all] Error 2
gmake[2]: *** [CMakeFiles/Omega_h.dir/build.make:86: Omega_h-prefix/src/Omega_h-stamp/Omega_h-build] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/Omega_h.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2

CMake Error at cmake/GetOrInstallOmegah.cmake:115 (message):
  Die
Call Stack (most recent call first):
  CMakeLists.txt:753 (include)

The full output is attached.
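When reading a parallel build log like the excerpt above, note that the last "Building CXX object" line (here `Omega_h_swap2d.cpp.o`) is not necessarily the file that failed; the "compilation aborted for" line names `Omega_h_shape.cpp`. A minimal sketch for pulling both pieces out of a log follows. The regular expressions are assumptions based only on the two message shapes seen in this excerpt, and `summarize_ice` is a hypothetical helper name, not part of any build tool.

```python
import re

# Patterns inferred from the two lines in the excerpt above (an assumption,
# not a documented Intel compiler message format).
ICE_RE = re.compile(
    r"^(?P<path>\S+)\((?P<line>\d+)\)(?: \(col\. (?P<col>\d+)\))?"
    r": internal error: (?P<code>\S+)"
)
ABORT_RE = re.compile(
    r"^compilation aborted for (?P<source>\S+) \(code (?P<status>\d+)\)"
)


def summarize_ice(log_text):
    """Collect internal-error locations and the sources whose compilation aborted."""
    errors, aborted = [], []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = ICE_RE.match(line)
        if m:
            # Header/line where the compiler reported the internal error.
            errors.append((m.group("path"), int(m.group("line")), m.group("code")))
            continue
        m = ABORT_RE.match(line)
        if m:
            # Translation unit that was actually being compiled.
            aborted.append((m.group("source"), int(m.group("status"))))
    return {"errors": errors, "aborted": aborted}


SAMPLE = """\
[ 65%] Building CXX object src/CMakeFiles/omega_h.dir/Omega_h_swap2d.cpp.o
/projects/albany/nightlyCDash/build/TrilinosSerialInstall/include/Serial/Kokkos_Serial_Parallel_Range.hpp(37) (col. 16): internal error: 04010002_12295

compilation aborted for /ceeprojects/albany/nightlyCDash/build/AlbBuildSerial/tpls/omegah/Omega_h-prefix/src/Omega_h/src/Omega_h_shape.cpp (code 4)
"""

summary = summarize_ice(SAMPLE)
```

Running this on the attached log would give a short list of candidate source files to try compiling on their own.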

@cwsmith , @bartgol : have you seen an error like this before?

This is when building omega_h as a part of Albany. I suppose I could try building it stand-alone to see if the same error is encountered, if you think it would be useful.

ikalash commented 1 year ago

Forgot the attachment. Here it is: nightly_log_attawayAlbanyIntelSerialBuild.txt

cwsmith commented 1 year ago

@ikalash I haven't seen this one before. ~What compiler (and version) is this build using?~ Never mind. I see icc-19.0.5.281 in the build string.

I haven't built with Intel compilers (other than the new LLVM-based one) in some time. I'll see if I can reproduce this.

bartgol commented 1 year ago

Ugh, unfortunately internal compiler errors are often nonsense. The fix sometimes involves rearranging some code, splitting code among separate cpp files, or removing seemingly innocuous (and perfectly legit) const qualifiers. All of this requires time (first, to narrow down the exact line in our code where the issue is, and then to find a workaround). @cwsmith are you going to try to find the problematic line(s)? Usually bisection (commenting out code) works well.

Attempting to use a different compiler or compiler version, if possible, sometimes also helps (though not always).
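The bisection idea (commenting out code until the error disappears) can be sketched as a binary search. This is only an illustration under strong assumptions: exactly one chunk triggers the internal error, and the remaining chunks still compile when others are disabled. `find_culprit` and the `fails` callback are hypothetical names; in practice `fails` would write out the file with only the given chunks enabled, run the compiler, and check for "internal error" in its output.

```python
from typing import Callable, Sequence


def find_culprit(chunks: Sequence[str],
                 fails: Callable[[Sequence[str]], bool]) -> str:
    """Return the single chunk whose presence makes `fails` return True.

    Assumes exactly one chunk triggers the failure and that the failure
    depends only on that chunk being enabled.
    """
    if not fails(chunks):
        raise ValueError("the full set does not fail; nothing to bisect")
    lo, hi = 0, len(chunks)  # the culprit's index lies in [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Enable only the first half of the candidate range; everything
        # else is treated as commented out.
        if fails(chunks[lo:mid]):
            hi = mid
        else:
            lo = mid
    return chunks[lo]


# Stand-in for a real compile step: pretend "function_d" triggers the error.
chunks = ["function_a", "function_b", "function_c", "function_d", "function_e"]
culprit = find_culprit(chunks, lambda enabled: "function_d" in enabled)
```

Each round halves the candidate range, so even a large source file needs only a handful of compiler runs. Real code often has dependencies between chunks, so some manual stubbing is usually needed to keep the reduced file compiling.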

ikalash commented 1 year ago

> @ikalash I haven't seen this one before. ~What compiler (and version) is this build using?~ Never mind. I see icc-19.0.5.281 in the build string.
>
> I haven't built with Intel compilers (other than the new LLVM-based one) in some time. I'll see if I can reproduce this.

Thanks! We actually have another Intel build that uses icc-2021.3.0, and it builds Omega_h just fine. So the failure may be specific to the older compiler version.

mperego commented 1 year ago

It's probably not worth it to spend time on Intel 19. Why don't we simply disable Omega_h on that build?

ikalash commented 1 year ago

I am happy to do this. Will do it now.