sxs-collaboration / spectre

SpECTRE is a code for multi-scale, multi-physics problems in astrophysics and gravitational physics.
https://spectre-code.org

SolveXcts with debug symbols has high memory usage during compilation #6150

Open macedo22 opened 3 weeks ago

macedo22 commented 3 weeks ago

Related to #5472

When building SolveXcts on ocean with gcc 11.3.0 in Release mode, compilation succeeds, but the link step produces the following error, which repeats until the process is killed:

[100%] Linking CXX executable ../../../../bin/SolveXcts
/opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/conv-static.o(.debug_info+0xd): error: relocation overflow: reference to local symbol 10 in /opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/conv-static.o
/opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/conv-static.o(.debug_info+0x12): error: relocation overflow: reference to local symbol 10 in /opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/conv-static.o
/opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/conv-static.o(.debug_info+0x16): error: relocation overflow: reference to local symbol 10 in /opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/conv-static.o
/opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/conv-static.o(.debug_info+0x65): error: relocation overflow: reference to local symbol 10 in /opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/conv-static.o
/opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/conv-static.o(.debug_info+0x72): error: relocation overflow: reference to local symbol 10 in /opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/conv-static.o
/opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/libconverse.a(commitid.C.o)(.debug_info+0xd): error: relocation overflow: reference to local symbol 11 in /opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/libconverse.a(commitid.C.o)
/opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/libconverse.a(commitid.C.o)(.debug_info+0x12): error: relocation overflow: reference to local symbol 11 in /opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/libconverse.a(commitid.C.o)
/opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/libconverse.a(commitid.C.o)(.debug_info+0x16): error: relocation overflow: reference to local symbol 11 in /opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/libconverse.a(commitid.C.o)
/opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/libconverse.a(commitid.C.o)(.debug_info+0x1f): error: relocation overflow: reference to local symbol 11 in /opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/lib/libconverse.a(commitid.C.o)
CMakeFiles/SolveXcts.dir/SolveXcts.cpp.o(.debug_info+0x1a4e): error: relocation overflow: reference to local symbol 19509 in CMakeFiles/SolveXcts.dir/SolveXcts.cpp.o
CMakeFiles/SolveXcts.dir/SolveXcts.cpp.o(.debug_info+0x1d36): error: relocation overflow: reference to local symbol 19509 in CMakeFiles/SolveXcts.dir/SolveXcts.cpp.o
CMakeFiles/SolveXcts.dir/SolveXcts.cpp.o(.debug_info+0x1d4f): error: relocation overflow: reference to local symbol 19509 in CMakeFiles/SolveXcts.dir/SolveXcts.cpp.o
.
.
.

A temporary workaround that lets the build succeed is to pass -D DEBUG_SYMBOLS=OFF to the cmake command in spectre_run_cmake, which also reduces peak RAM during compilation.

Memory usage during compilation and linking is very high when debug symbols are on (the default setting for Release). Here are the peak memory usage and time for building with and without debug symbols:

Process   | Debug symbols | Peak RAM (GB) | Time (mm:ss)
----------|---------------|---------------|-------------
Compiling | On            | 46.7          | 22:32
Compiling | Off           | 11.2          | 11:37
Linking   | On (killed*)  | 15.1*         | 00:16*
Linking   | Off           | 2.0           | 00:09

*killed when linking error starts
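
The report does not say how the peak RAM and time figures were collected. As a rough sketch of one way to gather comparable numbers for a single build step, the snippet below runs a command as a child process and reads its peak resident set size. It assumes Linux (where `ru_maxrss` is reported in kilobytes); the helper name `run_and_measure` and the demo command are hypothetical and not part of SpECTRE.

```python
import resource
import subprocess
import sys
import time


def run_and_measure(command):
    """Run `command` and return (exit code, wall seconds, peak RSS in GB).

    Assumption: Linux, where ru_maxrss is given in kilobytes. For
    RUSAGE_CHILDREN the value is the largest peak of any single child
    waited for so far, not a sum over parallel jobs, so measure one
    build step per invocation of this script.
    """
    start = time.monotonic()
    completed = subprocess.run(command, check=False)
    elapsed = time.monotonic() - start
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    peak_gb = peak_kb * 1024 / 1e9
    return completed.returncode, elapsed, peak_gb


# Demo with a stand-in child process that allocates about 50 MB.
code, seconds, peak_gb = run_and_measure(
    [sys.executable, "-c", "x = bytearray(50_000_000)"]
)
minutes, secs = divmod(int(seconds), 60)
print(f"exit={code} time={minutes:02d}:{secs:02d} peak={peak_gb:.2f} GB")
```

To measure the real compile step, the stand-in command would be replaced by the g++ invocation from this report, passed as a list of arguments (for example via `shlex.split`).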

Size of SolveXcts.cpp.o with debug symbols:     4.69 GB
Size of SolveXcts.cpp.o without debug symbols:  0.45 GB
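
To put the figures above in proportion, here is a small sketch that computes the on/off ratios for compile-time peak RAM, compile time, and object file size. All inputs are copied from this report; the variable and function names are only illustrative.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Figures copied from the report (debug symbols on vs. off).
compile_ram_gb = {"on": 46.7, "off": 11.2}
compile_time = {"on": "22:32", "off": "11:37"}
object_size_gb = {"on": 4.69, "off": 0.45}

ram_ratio = compile_ram_gb["on"] / compile_ram_gb["off"]
time_ratio = to_seconds(compile_time["on"]) / to_seconds(compile_time["off"])
size_ratio = object_size_gb["on"] / object_size_gb["off"]

print(f"peak RAM while compiling: {ram_ratio:.1f}x")
print(f"compile time:             {time_ratio:.1f}x")
print(f"object file size:         {size_ratio:.1f}x")
```

With these inputs, debug symbols cost roughly 4.2x the peak RAM and 1.9x the compile time, and make the object file about 10.4x larger.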

Here is the full command that was used to compile SolveXcts with debug symbols (-g):

/opt/ohpc/pub/compiler/gcc/11.3.0/bin/g++ -DBLAZE_BLAS_INCLUDE_FILE="<gsl/gsl_cblas.h>" -DBLAZE_BLAS_MODE=1 -DBLAZE_DEFAULT_STORAGE_ORDER=blaze::columnMajor -DBLAZE_MPI_PARALLEL_MODE=0 -DBLAZE_USE_ALWAYS_INLINE=1 -DBLAZE_USE_DEFAULT_INITIALIZATON=0 -DBLAZE_USE_PADDING=0 -DBLAZE_USE_SHARED_MEMORY_PARALLELIZATION=0 -DBLAZE_USE_SLEEF=0 -DBLAZE_USE_STREAMING=1 -DBLAZE_USE_STRONG_INLINE=1 -DBOOST_ALLOW_DEPRECATED_HEADERS -DBOOST_MULTI_ARRAY_TYPES_HPP -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_PROGRAM_OPTIONS_NO_LIB -DBOOST_SP_DISABLE_THREADS -DDISABLE_OPENBLAS_MULTITHREADING -DH5_BUILT_AS_DYNAMIC_LIB -DSPECTRE_CHARM_HAS_MAIN -DSPECTRE_USE_ALWAYS_INLINE=1 -DSolveXcts_EXPORTS -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_POSIX_C_SOURCE=200809L -I/home/alexmacedo/spectre/src -isystem /home/alexmacedo/spectre/build_develop/src -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/jemalloc-5.2.1-r63hempegqym5ahitm2ykc3x7x4wvkci/include -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/blaze-3.8-y7sgzzcyehq7nosikgmqoi57pf5g2rpv/include -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/gsl-2.7.1-sey3z3ovrt3f4gvelm6dels7elkugo7o/include -isystem /home/alexmacedo/spectre/external/brigand/include -isystem /opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/include -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/hdf5-1.12.2-dly2yyupjmeqjt5qswpt6iwuun7gvml3/include -isystem /opt/ohpc/pub/mpi/openmpi-gnu11/4.1.4/include -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/boost-1.79.0-ck2cccntuidtdby5falg4dooac6c7ypj/include -isystem /opt/ohpc/pub/apps/libxsmm/1.16.1/include -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/yaml-cpp-0.7.0-a4rumor7dkzoxmila6ewami7fzvoggwl/include -O3 -DNDEBUG -fPIC -g -march=native -mno-avx512f 
-ftemplate-backtrace-limit=0 -fno-math-errno -freciprocal-math -W -Wall -Wcast-align -Wcast-qual -Wdisabled-optimization -Wextra -Wformat-nonliteral -Wformat-security -Wformat-y2k -Wformat=2 -Winvalid-pch -Wmissing-declarations -Wmissing-field-initializers -Wmissing-format-attribute -Wmissing-include-dirs -Wmissing-noreturn -Wno-mismatched-tags -Wno-non-template-friend -Wno-type-limits -Wnon-virtual-dtor -Wold-style-cast -Woverloaded-virtual -Wpacked -Wpedantic -Wpointer-arith -Wredundant-decls -Wshadow -Wsign-conversion -Wstack-protector -Wswitch-default -Wunreachable-code -Wwrite-strings -Wno-noexcept-type -fnon-call-exceptions -D "BLAZE_THROW(EXCEPTION)=" -pthread -std=c++20 -Winvalid-pch -include /home/alexmacedo/spectre/build_develop/CMakeFiles/SpectrePch.dir/cmake_pch.hxx -o CMakeFiles/SolveXcts.dir/SolveXcts.cpp.o -c /home/alexmacedo/spectre/src/Elliptic/Executables/Xcts/SolveXcts.cpp

And here is the full command that was used to compile SolveXcts without debug symbols (passing -D DEBUG_SYMBOLS=OFF to cmake removes -g):

/opt/ohpc/pub/compiler/gcc/11.3.0/bin/g++ -DBLAZE_BLAS_INCLUDE_FILE="<gsl/gsl_cblas.h>" -DBLAZE_BLAS_MODE=1 -DBLAZE_DEFAULT_STORAGE_ORDER=blaze::columnMajor -DBLAZE_MPI_PARALLEL_MODE=0 -DBLAZE_USE_ALWAYS_INLINE=1 -DBLAZE_USE_DEFAULT_INITIALIZATON=0 -DBLAZE_USE_PADDING=0 -DBLAZE_USE_SHARED_MEMORY_PARALLELIZATION=0 -DBLAZE_USE_SLEEF=0 -DBLAZE_USE_STREAMING=1 -DBLAZE_USE_STRONG_INLINE=1 -DBOOST_ALLOW_DEPRECATED_HEADERS -DBOOST_MULTI_ARRAY_TYPES_HPP -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_PROGRAM_OPTIONS_NO_LIB -DBOOST_SP_DISABLE_THREADS -DDISABLE_OPENBLAS_MULTITHREADING -DH5_BUILT_AS_DYNAMIC_LIB -DSPECTRE_CHARM_HAS_MAIN -DSPECTRE_USE_ALWAYS_INLINE=1 -DSolveXcts_EXPORTS -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_POSIX_C_SOURCE=200809L -I/home/alexmacedo/spectre/src -isystem /home/alexmacedo/spectre/build_develop_no_debug_symbols/src -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/jemalloc-5.2.1-r63hempegqym5ahitm2ykc3x7x4wvkci/include -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/blaze-3.8-y7sgzzcyehq7nosikgmqoi57pf5g2rpv/include -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/gsl-2.7.1-sey3z3ovrt3f4gvelm6dels7elkugo7o/include -isystem /home/alexmacedo/spectre/external/brigand/include -isystem /opt/ohpc/pub/apps/charm_7.0.0_gnu11_022324/verbs-linux-x86_64-smp-gcc/include -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/hdf5-1.12.2-dly2yyupjmeqjt5qswpt6iwuun7gvml3/include -isystem /opt/ohpc/pub/mpi/openmpi-gnu11/4.1.4/include -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/boost-1.79.0-ck2cccntuidtdby5falg4dooac6c7ypj/include -isystem /opt/ohpc/pub/apps/libxsmm/1.16.1/include -isystem /opt/ohpc/pub/apps/spack2022/opt/spack/linux-centos7-broadwell/gcc-11.3.0/yaml-cpp-0.7.0-a4rumor7dkzoxmila6ewami7fzvoggwl/include -O3 -DNDEBUG -fPIC -march=native -mno-avx512f 
-ftemplate-backtrace-limit=0 -fno-math-errno -freciprocal-math -W -Wall -Wcast-align -Wcast-qual -Wdisabled-optimization -Wextra -Wformat-nonliteral -Wformat-security -Wformat-y2k -Wformat=2 -Winvalid-pch -Wmissing-declarations -Wmissing-field-initializers -Wmissing-format-attribute -Wmissing-include-dirs -Wmissing-noreturn -Wno-mismatched-tags -Wno-non-template-friend -Wno-type-limits -Wnon-virtual-dtor -Wold-style-cast -Woverloaded-virtual -Wpacked -Wpedantic -Wpointer-arith -Wredundant-decls -Wshadow -Wsign-conversion -Wstack-protector -Wswitch-default -Wunreachable-code -Wwrite-strings -Wno-noexcept-type -fnon-call-exceptions -D "BLAZE_THROW(EXCEPTION)=" -pthread -std=c++20 -Winvalid-pch -include /home/alexmacedo/spectre/build_develop_no_debug_symbols/CMakeFiles/SpectrePch.dir/cmake_pch.hxx -o CMakeFiles/SolveXcts.dir/SolveXcts.cpp.o -c /home/alexmacedo/spectre/src/Elliptic/Executables/Xcts/SolveXcts.cpp

nilsvu commented 3 weeks ago

Yes, I've had this problem on CaltechHPC as well and had to set DEBUG_SYMBOLS=OFF. I don't think the SolveXcts executable is doing anything unreasonable or going beyond what we intend to support with the DataBox, so, as you mentioned, the underlying issue is #5472, and IMO it's fairly high priority to fix.