nwchemgit / nwchem

NWChem: Open Source High-Performance Computational Chemistry
http://nwchemgit.github.io

NWChem build takes 10+ hours to complete, and it ignores parallelization options #959

Closed yurivict closed 1 month ago

yurivict commented 6 months ago

Describe the bug

The build time for the FreeBSD port regressed from ~1.5 hours to many hours for unknown reasons. It appears to spend a lot of time in bash scripts and in perl.

Here is the relevant GNU Make issue that I've created: https://savannah.gnu.org/bugs/?65533

It also doesn't build in parallel - make ignores the flag -j8.

The makefiles have some issues that cause the gmake slowdown. Also, -jN is ignored: how can I make it build in parallel?

Could you please try to build it with gmake-4.4 and see if there's a slowdown?

Version: 7.2.0

Describe settings used

Environment:

NWCHEM_TOP=/usr/ports/science/nwchem/work/nwchem-7.2.0-release/src/..
NWCHEM_MODULES=all
NWCHEM_LONG_PATHS=Y
NWCHEM_TARGET=LINUX64
USE_INTERNALBLAS=Y
EXTERNAL_GA_PATH=/usr/local
BLAS_SIZE=4
USE_64TO32=y
USE_LIBXC=Y
USE_MPI=Y
PYTHONVERSION=3.9
NWCHEM_MODULES="all python"
F77="gfortran13"
F90="gfortran13"
FC="gfortran13"
FFLAGS="-O -Wl,-rpath=/usr/local/lib/gcc13"
F90FLAGS="-O -Wl,-rpath=/usr/local/lib/gcc13"
FCFLAGS="-Wl,-rpath=/usr/local/lib/gcc13"
PERL_USE_UNSAFE_INC=1
XDG_DATA_HOME=/usr/ports/science/nwchem/work
XDG_CONFIG_HOME=/usr/ports/science/nwchem/work
XDG_CACHE_HOME=/usr/ports/science/nwchem/work/.cache
HOME=/usr/ports/science/nwchem/work
PATH=/usr/local/libexec/ccache:/usr/ports/science/nwchem/work/.bin:/home/yuri/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
PKG_CONFIG_LIBDIR=/usr/ports/science/nwchem/work/.pkgconfig:/usr/local/libdata/pkgconfig:/usr/local/share/pkgconfig:/usr/libdata/pkgconfig
MK_DEBUG_FILES=no
MK_KERNEL_SYMBOLS=no
SHELL=/bin/sh
NO_LINT=YES
ADDR2LINE="/usr/local/bin/addr2line"
AR="/usr/local/bin/ar"
AS="/usr/local/bin/as"
CPPFILT="/usr/local/bin/c++filt"
GPROF="/usr/local/bin/gprof"
LD="/usr/local/bin/ld"
NM="/usr/local/bin/nm"
OBJCOPY="/usr/local/bin/objcopy"
OBJDUMP="/usr/local/bin/objdump"
RANLIB="/usr/local/bin/ranlib"
READELF="/usr/local/bin/readelf"
SIZE="/usr/local/bin/size"
STRINGS="/usr/local/bin/strings"
PREFIX=/usr/local
LOCALBASE=/usr/local
CC="cc"
CFLAGS="-O2 -pipe -fstack-protector-strong -fno-strict-aliasing "
CPP="cpp"
CPPFLAGS=""
LDFLAGS=" -Wl,-rpath=/usr/local/lib/gcc13 -L/usr/local/lib/gcc13 -fstack-protector-strong "
LIBS=""
CXX="c++"
CXXFLAGS="-O2 -pipe -fstack-protector-strong -fno-strict-aliasing "
CCACHE_DIR="/tmp/.ccache"
BSD_INSTALL_PROGRAM="install -s -m 555"
BSD_INSTALL_LIB="install -s -m 0644"
BSD_INSTALL_SCRIPT="install -m 555"
BSD_INSTALL_DATA="install -m 0644"
BSD_INSTALL_MAN="install -m 444"

FreeBSD 14

Attach log files

Compilation proceeds very slowly and never finishes. Here is the log before it was killed: https://freebsd.org/~yuri/nwchem-7.2.0_3.log

To Reproduce

  1. Steps to reproduce the behavior: Build with gmake-4.4.1
  2. Attach all the input files required to run: n/a

Expected behavior

Build in a reasonable time.

edoapra commented 6 months ago

Any chance of having the log files posted?

yurivict commented 6 months ago

I attached the log to the original message. But the log doesn't show any problem: GNU make now handles variable expansions differently and fires up exponentially more sub-processes.

See the explanation from GNU make people here: https://savannah.gnu.org/bugs/?65533
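
For illustration only, the general cost of re-expanding a variable that contains a $(shell ...) call can be reproduced with a toy makefile. This is a minimal sketch, not taken from the NWChem makefiles; the exact mechanism behind the slowdown is described in the Savannah bug. The names write_demo_makefile, LAZY, EAGER, and LOG are made up for the example.

```shell
#!/bin/sh
# Write a tiny demo makefile (hypothetical; not part of NWChem) to the path in $1.
write_demo_makefile() {
    cat > "$1" <<'EOF'
LOG ?= calls.log
# Recursive (=): the $(shell ...) runs again on every reference.
LAZY = $(shell echo lazy >> $(LOG); echo value)
# Simple (:=): the $(shell ...) runs once, when this line is parsed.
EAGER := $(shell echo eager >> $(LOG); echo value)
# Reference each variable three times while the makefile is read.
$(info $(LAZY) $(LAZY) $(LAZY))
$(info $(EAGER) $(EAGER) $(EAGER))
all: ; @:
EOF
}

# Usage (GNU make; called gmake on FreeBSD):
#   write_demo_makefile demo.mk
#   gmake -f demo.mk LOG=/tmp/calls.log
#   sort /tmp/calls.log | uniq -c
```

With GNU make, the log should end up with three `lazy` lines and one `eager` line: each reference to the recursive variable spawns a new shell, while the simple variable spawns only one.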

edoapra commented 6 months ago

No sign of slowdowns on Fedora 39, which uses make 4.4.1, with these settings:

export USE_MPI=y
export USE_INTERNALBLAS=1
export BLAS_SIZE=8
export NWCHEM_MODULES=all

edoapra commented 6 months ago

What version of NWChem is used here? Is it really 7.2.0? Why aren't you using 7.2.2?

yurivict commented 6 months ago

> Why aren't you using 7.2.2?

Because of some sort of conflict with GA. There is some error message. GNU Make maintainers identified some problems in the NWChem makefiles.

edoapra commented 6 months ago

> > Why aren't you using 7.2.2?
>
> Because of some sort of conflict with GA. There is some error message. GNU Make maintainers identified some problems in the NWChem makefiles.

Could you post the details of this issue?

edoapra commented 6 months ago

FreeBSD 14.0 seems to have gmake 4.3. How did you install gmake 4.4?

yurivict commented 6 months ago

> FreeBSD 14.0 seems to have gmake 4.3. How did you install gmake 4.4?

gmake-4.4.1 is the current gmake version on FreeBSD 14.0.

If you installed from the 'quarterly' packages (set in /etc/pkg/FreeBSD.conf), you need to change this to 'latest' and run 'pkg upgrade -f'.
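
The switch described above can be sketched as follows. This assumes the stock FreeBSD.conf layout, where the repository url ends in /quarterly", and uses the standard override directory /usr/local/etc/pkg/repos instead of editing the file in /etc; the function name switch_to_latest is made up for the example.

```shell
#!/bin/sh
# Print a copy of a pkg repository config with the 'quarterly' branch
# replaced by 'latest'. The input file is left untouched.
switch_to_latest() {
    sed 's|/quarterly"|/latest"|' "$1"
}

# Usage on FreeBSD, as root (assumes the stock file layout):
#   mkdir -p /usr/local/etc/pkg/repos
#   switch_to_latest /etc/pkg/FreeBSD.conf > /usr/local/etc/pkg/repos/FreeBSD.conf
#   pkg upgrade -f
```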

yurivict commented 6 months ago

The FreeBSD port builds with BLAS_SIZE=4 (this is probably worse than BLAS_SIZE=8, but anyway).

Maybe I am mistaken, but some sort of size conversions appear to be done extensively during the build (perl scripts and various shell commands are run a lot).

Wild guess, but maybe there is no slowdown for BLAS_SIZE=8, only for BLAS_SIZE=4?

jeffhammond commented 6 months ago

BLAS_SIZE=4 requires a Perl script transformation of every source file that contains a BLAS call. It takes forever.

Use BLAS_SIZE=8 and a compatible library to compile faster.

yurivict commented 6 months ago

It calls grep, awk, cut, wc, bash, etc. a lot with BLAS_SIZE=8 too. The build is still very slow with gmake-4.4.1.

In fact, it spends most of the time running grep, awk, cut, wc, bash, etc., and Fortran compilation takes only a small fraction of the time, which makes the build very slow.
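
One way to put numbers on this is to count helper invocations with PATH shims. A minimal sketch, assuming a POSIX shell; make_shims, the shim directory, and the log path are hypothetical names, not part of NWChem or the FreeBSD port.

```shell
#!/bin/sh
# make_shims DIR LOG cmd...
# For each command, create a wrapper in DIR that appends the command name
# to LOG and then runs the real binary found on the current PATH.
make_shims() {
    shim_dir=$1
    log=$2
    shift 2
    mkdir -p "$shim_dir"
    for cmd in "$@"; do
        real=$(command -v "$cmd") || continue
        # Skip builtins, functions, and aliases: only wrap real files.
        case $real in
            /*) ;;
            *) continue ;;
        esac
        printf '#!/bin/sh\necho %s >> "%s"\nexec "%s" "$@"\n' \
            "$cmd" "$log" "$real" > "$shim_dir/$cmd"
        chmod +x "$shim_dir/$cmd"
    done
}

# Usage: wrap the helpers, build with the shim directory first in PATH,
# then tally the calls.
#   make_shims /tmp/shims /tmp/calls.log grep awk cut wc perl
#   PATH=/tmp/shims:$PATH gmake
#   sort /tmp/calls.log | uniq -c | sort -rn
```

The shims only see commands that are looked up through PATH, so calls made with an absolute path are not counted.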

edoapra commented 6 months ago

@yurivict Are you using the release tarballs for the FreeBSD builds? If so, there is no need for the make 64_to_32 step, since the release source tarball has already been processed through make 64_to_32.

yurivict commented 6 months ago

@edoapra No, GitHub tarball is used.

edoapra commented 6 months ago

Why?

edoapra commented 6 months ago

The current hotfix/release-7-2-0 branch compiles in a reasonable amount of time with FreeBSD 14.0 and make 4.4 (around 20 minutes for the 64_to_32 step and 30 minutes for the actual compilation). Issue https://github.com/nwchemgit/nwchem/issues/960 should be fixed in it too. Could you give it a try and test it? https://github.com/nwchemgit/nwchem/tree/hotfix/release-7-2-0

curl -LJO https://github.com/nwchemgit/nwchem/tarball/hotfix/release-7-2-0/
tar xzf nwchemgit-nwchem-v7.2.2-release*gz
rm nwchemgit-nwchem-v7.2.2-release*gz
ln -sf nwchemgit-nwchem-* nwchem-7.2.2
cd nwchem-7.2.2

Keep in mind that this is not a release tarball with the 64_to_32 processing already applied, just an automated GitHub tarball.
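
To compare timings for the two phases mentioned above, each step can be wrapped in a small timer. A minimal sketch; timed_step is a made-up helper, and running plain gmake for the compile phase is an assumption about how the build is invoked.

```shell
#!/bin/sh
# timed_step LABEL command [args...]
# Run a command, print how many whole seconds it took, and pass on its exit status.
timed_step() {
    label=$1
    shift
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "$label: $((end - start)) s (exit $status)"
    return "$status"
}

# Usage from the NWChem src directory (gmake is GNU make on FreeBSD):
#   timed_step "64_to_32" gmake 64_to_32
#   timed_step "compile"  gmake
```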

yurivict commented 6 months ago

I confirm that nwchem now builds faster. On my slow system the build has succeeded in 80 minutes.

Thank you!

edoapra commented 6 months ago

Thank you very much for the feedback. I might be able to get another patch release going some time in the not so distant future.