nwchemgit / nwchem

NWChem: Open Source High-Performance Computational Chemistry
http://nwchemgit.github.io

Many tests fail #1015

Open yurivict opened 2 months ago

yurivict commented 2 months ago

Describe the bug

In the test log there are many failures:

 Running tests/esp_uhf/esp_uhf

     cleaning scratch
     copying input and verified output files
     running nwchem (/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/../bin/LINUX64/nwchem)  with 1 processors

     verifying output ... 0:ga_iter_lsolve: dgesv failed:Received an Error in Communication
Abort(0) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 0) - process 0
failed
@@@     Comparison of Output Files
@@ -1,2 +1 @@                                                    
 Effective nuclear repulsion energy (a.u.) 107.60 
-Total SCF energy = -476.73491

Failed
 Running tests/bsse_tce_mult/bsse_tce_mult

     cleaning scratch
     copying input and verified output files
     running nwchem (/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/../bin/LINUX64/nwchem)  with 1 processors

     verifying output ... 0:tce_diis: LU decomposition failed:Received an Error in Communication
Abort(0) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 0) - process 0
failed
@@@     Comparison of Output Files
@@ -1,13 +1,2 @@
 Effective nuclear repulsion energy (a.u.) 20.88
 Total SCF energy = -39.76410
-CCSD total energy / hartree = -39.9580035
-Effective nuclear repulsion energy (a.u.) 0.00
-Effective nuclear repulsion energy (a.u.) 0.00
-Total SCF energy = -39.34734
-CCSD total energy / hartree = -39.5301286
-Total SCF energy = -39.34986
-CCSD total energy / hartree = -39.5342256
-Total SCF energy = -0.49928
-CCSD total energy / hartree = -0.4992784
-Total SCF energy = -0.49931
-CCSD total energy / hartree = -0.4993071

Failed
 Running tests/sad_ch3hf/sad_ch3hf

     cleaning scratch
     copying input and verified output files
     running nwchem (/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/../bin/LINUX64/nwchem)  with 1 processors

     verifying output ... 0:ga_iter_lsolve: dgesv failed:Received an Error in Communication
Abort(0) on node 0 (rank 0 in comm 496): application called MPI_Abort(comm=0x84000000, 0) - process 0
failed
@@@     Comparison of Output Files
@@ -1,10 +1,2 @@
 Effective nuclear repulsion energy (a.u.) 33.31
 Effective nuclear repulsion energy (a.u.) 33.31
-Effective nuclear repulsion energy (a.u.) 33.34
-Effective nuclear repulsion energy (a.u.) 33.60
-Effective nuclear repulsion energy (a.u.) 33.66
-Effective nuclear repulsion energy (a.u.) 33.65
-Effective nuclear repulsion energy (a.u.) 33.66
-Effective nuclear repulsion energy (a.u.) 33.66
-Effective nuclear repulsion energy (a.u.) 33.66
-Effective nuclear repulsion energy (a.u.) 33.66

Failed
 Running tests/pspw_blyp_h2o/pspw_blyp_h2o 

     cleaning scratch
     copying input and verified output files
     running nwchem (/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/../bin/LINUX64/nwchem)  with 1 processors

     verifying output ... failed
@@@     Comparison of Output Files
@@ -1,4 +1,4 @@
 Effective nuclear repulsion energy (a.u.) 9.08
-Total PSPW energy : -16.52427
+Total PSPW energy : -8.36881
 Total PSPW energy : -17.09442
 Total PSPW energy : -17.11908

Failed 
 Running tests/pspw_pbesol_h2o/pspw_pbesol_h2o

     cleaning scratch
     copying input and verified output files
     running nwchem (/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/../bin/LINUX64/nwchem)  with 1 processors

     verifying output ... failed
@@@     Comparison of Output Files
@@ -1,10 +1,10 @@
 Effective nuclear repulsion energy (a.u.) 9.08
-Total PSPW energy : -16.55097
-Total PSPW energy : -17.10629
+Total PSPW energy : -8.53817
+Total PSPW energy : -8.53992
 Total PSPW energy : -17.11942
 Total PSPW energy : -17.13072
 Effective nuclear repulsion energy (a.u.) 9.08
 Total PSPW energy : -17.13072
-Total PSPW energy : -17.13106
+Total PSPW energy : -17.13105
 Effective nuclear repulsion energy (a.u.) 9.12
 Total PSPW energy : -17.13272

Failed

Describe settings used

USE_LIBXC=Y
USE_MPI=Y
PYTHONVERSION=3.11
NWCHEM_MODULES="all python"
F77="gfortran13"
F90="gfortran13"
FC="gfortran13"
FFLAGS="-O -Wl,-rpath=/usr/local/lib/gcc13"
F90FLAGS="-O -Wl,-rpath=/usr/local/lib/gcc13"
FCFLAGS="-Wl,-rpath=/usr/local/lib/gcc13"
PERL_USE_UNSAFE_INC=1
XDG_DATA_HOME=/usr/ports/science/nwchem/work
XDG_CONFIG_HOME=/usr/ports/science/nwchem/work
XDG_CACHE_HOME=/usr/ports/science/nwchem/work/.cache
HOME=/usr/ports/science/nwchem/work
PATH=/usr/local/libexec/ccache:/usr/ports/science/nwchem/work/.bin:/home/yuri/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
PKG_CONFIG_LIBDIR=/usr/ports/science/nwchem/work/.pkgconfig:/usr/local/libdata/pkgconfig:/usr/local/share/pkgconfig:/usr/libdata/pkgconfig
MK_DEBUG_FILES=no
MK_KERNEL_SYMBOLS=no
SHELL=/bin/sh
NO_LINT=YES
ADDR2LINE="/usr/local/bin/addr2line"
AR="/usr/local/bin/ar"
AS="/usr/local/bin/as"
CPPFILT="/usr/local/bin/c++filt"
GPROF="/usr/local/bin/gprof"
LD="/usr/local/bin/ld"
NM="/usr/local/bin/nm"
OBJCOPY="/usr/local/bin/objcopy"
OBJDUMP="/usr/local/bin/objdump"
RANLIB="/usr/local/bin/ranlib"
READELF="/usr/local/bin/readelf"
SIZE="/usr/local/bin/size"
STRINGS="/usr/local/bin/strings"
PREFIX=/usr/local
LOCALBASE=/usr/local
CC="cc"
CFLAGS="-O2 -pipe -fstack-protector-strong -fno-strict-aliasing "
CPP="cpp"
CPPFLAGS=""
LDFLAGS=" -Wl,-rpath=/usr/local/lib/gcc13 -L/usr/local/lib/gcc13 -fstack-protector-strong "
LIBS=""
CXX="c++"
CXXFLAGS="-O2 -pipe -fstack-protector-strong -fno-strict-aliasing "
CCACHE_DIR="/tmp/.ccache"
BSD_INSTALL_PROGRAM="install -s -m 555"
BSD_INSTALL_LIB="install -s -m 0644"
BSD_INSTALL_SCRIPT="install -m 555"
BSD_INSTALL_DATA="install -m 0644"
BSD_INSTALL_MAN="install -m 444"

Attach log files

Attach as many log files as possible.

To Reproduce

Run tests.

clang-18 OS: FreeBSD 14.1

edoapra commented 2 months ago

Your linear algebra settings are likely to be the culprit. Did you set BLAS_SIZE consistently between Global Arrays and NWChem? I don't see BLAS_SIZE mentioned in this issue. Please also post the options you passed to configure when building Global Arrays.

edoapra commented 2 months ago

What's the URL associated with nwchemgit-nwchem-v7.2.3-release_GH0.tar.gz on the https://github.com/nwchemgit/nwchem release page?

yurivict commented 2 months ago

BLAS_SIZE is equal to 4 in both cases.

ga is configured with BLAS_SIZE=4 through configure arguments:

--enable-peigs --enable-shared --disable-static --with-scalapack --with-blas4 --prefix=/usr/local ${_LATE_CONFIGURE_ARGS}

nwchem is also configured with BLAS_SIZE=4 through make arguments:

NWCHEM_TOP=/usr/ports/science/nwchem/work/nwchem-7.2.3-release/src/.. NWCHEM_MODULES=all NWCHEM_LONG_PATHS=Y NWCHEM_TARGET=LINUX64 USE_INTERNALBLAS=Y EXTERNAL_GA_PATH=/usr/local USE_64TO32=y BLAS_SIZE=4 DESTDIR=/usr/ports/science/nwchem/work/stage

The 64_to_32 target wasn't run because the GitHub release tarball already has this done.
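
The consistency question above can be checked mechanically. The following is a minimal sketch, assuming the Global Arrays integer size is selected by a `--with-blas4` or `--with-blas8` configure flag and the NWChem one by a `BLAS_SIZE=N` make argument, as in the two argument strings quoted in this comment. The function names are hypothetical helpers, not part of either build system.

```python
import re


def ga_blas_size(configure_args):
    """Return 4 or 8 if a --with-blas4 / --with-blas8 flag is present, else None."""
    match = re.search(r"--with-blas(4|8)\b", configure_args)
    return int(match.group(1)) if match else None


def nwchem_blas_size(make_args):
    """Return the integer given as BLAS_SIZE=N in the make arguments, else None."""
    match = re.search(r"\bBLAS_SIZE=(\d+)\b", make_args)
    return int(match.group(1)) if match else None


def blas_sizes_match(configure_args, make_args):
    """True only when both sizes are stated explicitly and agree."""
    ga = ga_blas_size(configure_args)
    nw = nwchem_blas_size(make_args)
    return ga is not None and ga == nw


# Argument strings shortened from the ones quoted in this comment.
ga_args = "--enable-peigs --with-scalapack --with-blas4 --prefix=/usr/local"
nw_args = "NWCHEM_TARGET=LINUX64 USE_INTERNALBLAS=Y USE_64TO32=y BLAS_SIZE=4"
print(blas_sizes_match(ga_args, nw_args))
```

A match only shows that the two builds were asked for the same integer size. It says nothing about whether the source tree itself was converted by the 64_to_32 step.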

yurivict commented 2 months ago

The tarball URL is: https://codeload.github.com/nwchemgit/nwchem/tar.gz/v7.2.3-release?dummy=/nwchemgit-nwchem-v7.2.3-release_GH0.tar.gz

edoapra commented 2 months ago

The tarball URL is: https://codeload.github.com/nwchemgit/nwchem/tar.gz/v7.2.3-release?dummy=/nwchemgit-nwchem-v7.2.3-release_GH0.tar.gz

This is an automatically generated tarball that has NOT gone through the make 64_to_32 step. The one you need to fetch is this:

https://github.com/nwchemgit/nwchem/releases/download/v7.2.3-release/nwchem-7.2.3-release.revision-d690e065-src.2024-08-27.tar.bz2
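
The two URLs in this thread differ in a way a packaging script could test for. Here is a minimal sketch, assuming the URL layout seen above: uploaded release assets live under `/OWNER/REPO/releases/download/TAG/FILE` on `github.com`, while automatically generated source archives are served from `codeload.github.com`. The function name is a hypothetical helper, and the layout is an observation from these two URLs rather than a documented guarantee.

```python
from urllib.parse import urlparse


def is_release_asset(url):
    """True if the URL looks like an uploaded GitHub release asset.

    Assumed layout: https://github.com/OWNER/REPO/releases/download/TAG/FILE
    """
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    return (
        parts.netloc == "github.com"
        and len(segments) >= 6
        and segments[2:4] == ["releases", "download"]
    )


auto_generated = (
    "https://codeload.github.com/nwchemgit/nwchem/tar.gz/v7.2.3-release"
    "?dummy=/nwchemgit-nwchem-v7.2.3-release_GH0.tar.gz"
)
uploaded = (
    "https://github.com/nwchemgit/nwchem/releases/download/v7.2.3-release/"
    "nwchem-7.2.3-release.revision-d690e065-src.2024-08-27.tar.bz2"
)
print(is_release_asset(auto_generated), is_release_asset(uploaded))
```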

yurivict commented 2 months ago

I see. I will change this.

Does it in general make more sense to use BLAS_SIZE=8 on amd64 systems?

edoapra commented 2 months ago

I see. I will change this.

Does it in general make more sense to use BLAS_SIZE=8 on amd64 systems?

It makes sense on all 64-bit architectures, so that you can skip the make 64_to_32 step.
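
To illustrate why mismatched integer widths can make routines such as dgesv fail, here is a small sketch. It is not taken from the NWChem or BLAS sources. It assumes a little-endian machine and a hypothetical pivot array that a library built with 32-bit integers writes and a caller built with 64-bit integers then reads.

```python
import struct


def read_as_int64(int32_values):
    """Pack values as little-endian 32-bit integers, then reinterpret
    the same bytes as 64-bit integers, as a width-mismatched caller would.

    The number of values must be even so the bytes divide into 8-byte words.
    """
    raw = struct.pack(f"<{len(int32_values)}i", *int32_values)
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


# Hypothetical pivot indices written by a 32-bit-integer library.
pivots = [1, 2, 3, 4]
print(read_as_int64(pivots))  # [8589934593, 17179869187]
```

Each pair of 32-bit entries is fused into one meaningless 64-bit value, which is the kind of corruption that a consistent BLAS_SIZE (or the 64_to_32 conversion) is meant to prevent.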