nwchemgit / nwchem

NWChem: Open Source High-Performance Computational Chemistry
http://nwchemgit.github.io

pyqa3: Segmentation fault on Python 3.12 #892

Closed marcindulak closed 8 months ago

marcindulak commented 9 months ago

Describe the bug

The pyqa3 test results in "Segmentation fault".

Describe settings used

Encountered on several platforms supported by Fedora 39 (x86_64, aarch64, ppc64le), see https://koji.fedoraproject.org/koji/taskinfo?taskID=107865604 , in both the openmpi and mpich builds. The issue does not appear on CentOS 9.

#7 [3/6] RUN set -x     && uname -a
#7 0.516 Linux buildkitsandbox 5.15.0-84-generic #93~20.04.1-Ubuntu SMP Wed Sep 6 16:15:40 UTC 2023 x86_64 GNU/Linux

#6 [4/6] RUN set -x     && cat /etc/*release | grep PRETTY
#6 0.608 PRETTY_NAME="Fedora Linux 39 (Container Image Prerelease)"

Attach log files

...
#9 [6/6] RUN cd /tmp && . /etc/profile.d/modules.sh&& module use /usr/share/modulefiles&& module load mpi/openmpi && mpiexec --allow-run-as-root -np 1 nwchem_openmpi pyqa3.nw
#9 sha256:883bc7b6291241d1dd7e25b54fb2839546f80a98fb4d60d3c6c85850584e4ebc
#9 1.251 buildkitsandbox:rank0.nwchem_binary_openmpi: Failed to get eth0 (unit 0) cpu set
#9 1.251 buildkitsandbox:rank0: PSM3 can't open nic unit: 0 (err=23)
#9 1.252 buildkitsandbox:rank0.nwchem_binary_openmpi: Failed to get eth0 (unit 0) cpu set
#9 1.252 buildkitsandbox:rank0: PSM3 can't open nic unit: 0 (err=23)
#9 1.253 buildkitsandbox:rank0.nwchem_binary_openmpi: Failed to get eth0 (unit 0) cpu set
#9 1.253 buildkitsandbox:rank0: PSM3 can't open nic unit: 0 (err=23)
#9 1.285 buildkitsandbox:rank0: PSM3 can't open nic unit: 0 (err=23)
#9 1.285 buildkitsandbox:rank0.nwchem_binary_openmpi: Failed to get eth0 (unit 0) cpu set
#9 1.286 --------------------------------------------------------------------------
#9 1.286 Open MPI failed an OFI Libfabric library call (fi_endpoint).  This is highly
#9 1.286 unusual; your job may behave unpredictably (and/or abort) after this.
#9 1.286 
#9 1.286   Local host: buildkitsandbox
#9 1.286   Location: mtl_ofi_component.c:509
#9 1.286   Error: Invalid argument (22)
#9 1.286 --------------------------------------------------------------------------
#9 1.289  argument  1 = pyqa3.nw
#9 1.327 
#9 1.327 
#9 1.327 
#9 1.327 ============================== echo of input deck ==============================
#9 1.327 echo
#9 1.327 start testpy3
#9 1.327 # test some basic python wrappers.
#9 1.327 # if it did not abort, it worked.
#9 1.327 print none
#9 1.327 
#9 1.327 driver
#9 1.327 clear
#9 1.327 end
#9 1.327 basis
#9 1.327   h library 3-21g
#9 1.327 end
#9 1.327 
#9 1.327 python
#9 1.327 print ("value check:")
#9 1.327 print ("INT     = ", INT)
#9 1.327 print ("DBL     = ", DBL)
#9 1.327 print ("CHAR    = ", CHAR)
#9 1.327 print ("LOGICAL = ", LOGICAL)
#9 1.327 
#9 1.327 rtdb_put("test_int2", 22)
#9 1.327 print (' Done 1')
#9 1.327 rtdb_put("test_int", [22, 10, 3],    INT)
#9 1.327 print (' Done 2')
#9 1.327 rtdb_put("test_dbl", [22.9, 12.4, 23.908],  DBL)
#9 1.327 print (' Done 3')
#9 1.327 rtdb_put("test_str", "hello", CHAR)
#9 1.327 print (' Done 4')
#9 1.327 rtdb_put("test_logic", [0,1,0,1,0,1], LOGICAL)
#9 1.327 print (' Done 5')
#9 1.327 rtdb_put("test_logic2", 0, LOGICAL)
#9 1.328 print (' Done 6')
#9 1.328 
#9 1.328 rtdb_print(1)
#9 1.328 
#9 1.328 print ("test_str    = "), rtdb_get("test_str")
#9 1.328 print ("test_int    = "), rtdb_get("test_int")
#9 1.328 print ("test_in2    = "), rtdb_get("test_int2")
#9 1.328 print ("test_dbl    = "), rtdb_get("test_dbl")
#9 1.328 print ("test_logic  = "), rtdb_get("test_logic")
#9 1.328 print ("test_logic2 = "), rtdb_get("test_logic2")
#9 1.328 
#9 1.328 def energy(r):
#9 1.328   input_parse('''
#9 1.328     geometry noprint noautoz
#9 1.328       h 0 0 0
#9 1.328       h 0 0 %f
#9 1.328    end
#9 1.328   ''' % r)
#9 1.328   return task_energy('scf')
#9 1.328 
#9 1.328 for r in (0.4, 0.5, 0.6):
#9 1.328   print (r, energy(r))
#9 1.328 
#9 1.328 print (task_optimize('scf'))
#9 1.328 
#9 1.328 end
#9 1.328 
#9 1.328 task python
#9 1.328 ================================================================================
#9 1.328 
#9 1.328 
#9 1.328                                          
#9 1.328                                          
#9 1.328 
#9 1.328 
#9 1.328              Northwest Computational Chemistry Package (NWChem) 7.2.0
#9 1.328              --------------------------------------------------------
#9 1.328 
#9 1.328 
#9 1.328                     Environmental Molecular Sciences Laboratory
#9 1.328                        Pacific Northwest National Laboratory
#9 1.328                                 Richland, WA 99352
#9 1.328 
#9 1.328                               Copyright (c) 1994-2022
#9 1.328                        Pacific Northwest National Laboratory
#9 1.328                             Battelle Memorial Institute
#9 1.328 
#9 1.328              NWChem is an open-source computational chemistry package
#9 1.328                         distributed under the terms of the
#9 1.328                       Educational Community License (ECL) 2.0
#9 1.328              A copy of the license is included with this distribution
#9 1.328                               in the LICENSE.TXT file
#9 1.328 
#9 1.328                                   ACKNOWLEDGMENT
#9 1.328                                   --------------
#9 1.328 
#9 1.328             This software and its documentation were developed at the
#9 1.328             EMSL at Pacific Northwest National Laboratory, a multiprogram
#9 1.328             national laboratory, operated for the U.S. Department of Energy
#9 1.328             by Battelle under Contract Number DE-AC05-76RL01830. Support
#9 1.328             for this work was provided by the Department of Energy Office
#9 1.328             of Biological and Environmental Research, Office of Basic
#9 1.328             Energy Sciences, and the Office of Advanced Scientific Computing.
#9 1.328 
#9 1.328 
#9 1.328            Job information
#9 1.328            ---------------
#9 1.328 
#9 1.328     hostname        = buildkitsandbox
#9 1.328     program         = nwchem_binary_openmpi
#9 1.328     date            = Sat Oct 21 16:53:25 2023
#9 1.328 
#9 1.328     compiled        = Sat_Oct_21_11:04:33_2023
#9 1.328     source          = /builddir/build/BUILD/nwchem-487f8b945fbe9cedf02757dacddc66d40dc74ed9
#9 1.328     nwchem branch   = 7.2.0
#9 1.328     nwchem revision = N/A
#9 1.328     ga revision     = 5.8.0
#9 1.328     use scalapack   = T
#9 1.328     input           = pyqa3.nw
#9 1.328     prefix          = testpy3.
#9 1.328     data base       = ./testpy3.db
#9 1.328     status          = startup
#9 1.328     nproc           =        1
#9 1.329     time left       =     -1s
#9 1.329 
#9 1.329 
#9 1.329 
#9 1.329            Memory information
#9 1.329            ------------------
#9 1.329 
#9 1.329     heap     =   26214400 doubles =    200.0 Mbytes
#9 1.329     stack    =   26214397 doubles =    200.0 Mbytes
#9 1.329     global   =   52428800 doubles =    400.0 Mbytes (distinct from heap & stack)
#9 1.329     total    =  104857597 doubles =    800.0 Mbytes
#9 1.329     verify   = yes
#9 1.329     hardfail = no 
#9 1.329 
#9 1.329 
#9 1.329            Directory information
#9 1.329            ---------------------
#9 1.329 
#9 1.329   0 permanent = .
#9 1.329   0 scratch   = .
#9 1.329 
#9 1.329 
#9 1.329 
#9 1.329 
#9 1.329                                 NWChem Input Module
#9 1.329                                 -------------------
#9 1.329 
#9 1.329 
#9 1.329   library name resolved from: .nwchemrc
#9 1.329   library file name is: </usr/share/nwchem/libraries/>
#9 1.329   
#9 1.329 mpiexec: Forwarding signal 23 to job
#9 1.333                       Basis "ao basis" -> "" (cartesian)
#9 1.333                       -----
#9 1.333   h (Hydrogen)
#9 1.333   ------------
#9 1.333             Exponent  Coefficients 
#9 1.333        -------------- ---------------------------------------------------------
#9 1.333   1 S  5.44717800E+00  0.156285
#9 1.333   1 S  8.24547000E-01  0.904691
#9 1.333 
#9 1.333   2 S  1.83192000E-01  1.000000
#9 1.333 
#9 1.333 
#9 1.333 
#9 1.333  Summary of "ao basis" -> "" (cartesian)
#9 1.333  ------------------------------------------------------------------------------
#9 1.333        Tag                 Description            Shells   Functions and Types
#9 1.333  ---------------- ------------------------------  ------  ---------------------
#9 1.333  h                           3-21g                   2        2   2s
#9 1.333 
#9 1.333 
#9 1.335 
#9 1.335                                NWChem Python program
#9 1.335                                ---------------------
#9 1.335 
#9 1.335 print ("value check:")
#9 1.335 print ("INT     = ", INT)
#9 1.335 print ("DBL     = ", DBL)
#9 1.335 print ("CHAR    = ", CHAR)
#9 1.335 print ("LOGICAL = ", LOGICAL)
#9 1.335 
#9 1.335 rtdb_put("test_int2", 22)
#9 1.335 print (' Done 1')
#9 1.335 rtdb_put("test_int", [22, 10, 3],    INT)
#9 1.335 print (' Done 2')
#9 1.335 rtdb_put("test_dbl", [22.9, 12.4, 23.908],  DBL)
#9 1.335 print (' Done 3')
#9 1.335 rtdb_put("test_str", "hello", CHAR)
#9 1.335 print (' Done 4')
#9 1.335 rtdb_put("test_logic", [0,1,0,1,0,1], LOGICAL)
#9 1.335 print (' Done 5')
#9 1.335 rtdb_put("test_logic2", 0, LOGICAL)
#9 1.335 print (' Done 6')
#9 1.335 
#9 1.335 rtdb_print(1)
#9 1.335 
#9 1.335 print ("test_str    = "), rtdb_get("test_str")
#9 1.335 print ("test_int    = "), rtdb_get("test_int")
#9 1.335 print ("test_in2    = "), rtdb_get("test_int2")
#9 1.335 print ("test_dbl    = "), rtdb_get("test_dbl")
#9 1.335 print ("test_logic  = "), rtdb_get("test_logic")
#9 1.335 print ("test_logic2 = "), rtdb_get("test_logic2")
#9 1.335 
#9 1.335 def energy(r):
#9 1.335   input_parse('''
#9 1.335     geometry noprint noautoz
#9 1.335       h 0 0 0
#9 1.335       h 0 0 %f
#9 1.335    end
#9 1.335   ''' % r)
#9 1.335   return task_energy('scf')
#9 1.335 
#9 1.335 for r in (0.4, 0.5, 0.6):
#9 1.335   print (r, energy(r))
#9 1.335 
#9 1.335 print (task_optimize('scf'))
#9 1.335 
#9 1.335 
#9 1.349 value check:
#9 1.349 INT     =  1010
#9 1.349 DBL     =  1013
#9 1.349 CHAR    =  1000
#9 1.349 LOGICAL =  1011
#9 1.349  Done 1
#9 1.350  Done 2
#9 1.350  Done 3
#9 1.350  Done 4
#9 1.350  Done 5
#9 1.350  Done 6
#9 1.350 
#9 1.350  Contents of RTDB ./testpy3.db
#9 1.350  -----------------------------
#9 1.350 
#9 1.350  Entry                                   Type[nelem]           Date
#9 1.350  ---------------------------  ----------------------  ------------------------
#9 1.350  test_int2                                int[1]      Sat Oct 21 16:53:25 2023 
#9 1.350 22 
#9 1.350  basis:names                             char[9]      Sat Oct 21 16:53:25 2023 
#9 1.350 ao basis
#9 1.350  :print                                  char[5]      Sat Oct 21 16:53:25 2023 
#9 1.350 none
#9 1.350  task:theory                             char[7]      Sat Oct 21 16:53:25 2023 
#9 1.350 python
#9 1.350  test_dbl                              double[3]      Sat Oct 21 16:53:25 2023 
#9 1.350 2.29000000000000e+01 1.24000000000000e+01 2.39080000000000e+01 
#9 1.350  hello                                   char[7]      Sat Oct 21 16:53:25 2023 
#9 1.350 hello
#9 1.350 
#9 1.350  basis:ao basis:tags info                 int[280]    Sat Oct 21 16:53:25 2023 
#9 1.350 2 3 3 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
#9 1.350 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
...
#9 1.350 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
#9 1.350  test_int                                 int[3]      Sat Oct 21 16:53:25 2023 
#9 1.350 22 10 3 
#9 1.350  basis:ao basis:star nr tags              int[1]      Sat Oct 21 16:53:25 2023 
#9 1.350 0 
#9 1.350  basis:ao basis:contraction info            int[90000]  Sat Oct 21 16:53:25 2023 
#9 1.351 0 2 1 1 3 0 1 0 0 0 1 1 5 6 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
#9 1.351 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
...
#9 1.360 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
#9 1.360 mpiexec: Forwarding signal 23 to job
#9 1.389 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
..
#9 1.395 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
#9 1.395  basis:ao basis:bs_nr_tags                int[1]      Sat Oct 21 16:53:25 2023 
#9 1.395 1 
#9 1.395  test_logic                           logical[6]      Sat Oct 21 16:53:25 2023 
#9 1.395 f t f t f t 
#9 1.395  basis:ao basis:bs_stdname               char[6]      Sat Oct 21 16:53:25 2023 
#9 1.396 3-21g
#9 1.396  basis:ao basis:header                    int[7]      Sat Oct 21 16:53:25 2023 
#9 1.396 1 2 3 3 6 0 0 
#9 1.396  test_logic2                          logical[1]      Sat Oct 21 16:53:25 2023 
#9 1.396 f 
#9 1.396  basisprint:ao basis                  logical[1]      Sat Oct 21 16:53:25 2023 
#9 1.396 f 
#9 1.396  opt:driver                           logical[1]      Sat Oct 21 16:53:25 2023 
#9 1.396 t 
#9 1.396  basis:ao basis:number of exps and coeffs            int[1]      Sat Oct 21 16:53:25 2023 
#9 1.396 6 
#9 1.396  basis:ao basis:exps and coeffs         double[6]      Sat Oct 21 16:53:25 2023 
#9 1.396 5.44717800000000e+00 8.24547000000001e-01 1.56285000000000e-01 9.04691000000001e-01 
#9 1.396 1.83192000000000e-01 1.00000000000000e+00 
#9 1.396  basis:nbasis                             int[1]      Sat Oct 21 16:53:25 2023 
#9 1.396 1 
#9 1.396  file_prefix                             char[8]      Sat Oct 21 16:53:25 2023 
#9 1.396 testpy3
#9 1.396  basis:ao basis:assoc ecp name           char[4]      Sat Oct 21 16:53:25 2023 
#9 1.396  
#9 1.396  
#9 1.396  basis:ao basis:bs_tags                  char[2]      Sat Oct 21 16:53:25 2023 
#9 1.396 h
#9 1.396  rtdb:stored:state                    logical[3]      Sat Oct 21 16:53:25 2023 
#9 1.396 t f f 
#9 1.396 
#9 1.396 test_str    = 
#9 1.396 [buildkitsandbox:29   :0:29] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
#9 1.396 ==== backtrace (tid:     29) ====
#9 1.396  0  /lib64/libucs.so.0(ucs_handle_error+0x2ec) [0x7f00cc1f110c]
#9 1.396  1  /lib64/libucs.so.0(+0x2a7fd) [0x7f00cc1f27fd]
#9 1.396  2  /lib64/libucs.so.0(+0x2a9cd) [0x7f00cc1f29cd]
#9 1.396  3  /lib64/libpython3.12.so.1.0(PyObject_CallOneArg+0x38) [0x7f00e3d022b8]
#9 1.396  4  /lib64/libpython3.12.so.1.0(+0x21a1be) [0x7f00e3d021be]
#9 1.396  5  /lib64/libpython3.12.so.1.0(_PyErr_SetObject+0x14b) [0x7f00e3d01fab]
#9 1.396  6  /lib64/libpython3.12.so.1.0(PyErr_SetString+0x5a) [0x7f00e3d2c46a]
#9 1.396  7  nwchem_binary_openmpi() [0x182a4ec]
#9 1.396  8  /lib64/libpython3.12.so.1.0(+0x210918) [0x7f00e3cf8918]
#9 1.396  9  /lib64/libpython3.12.so.1.0(_PyObject_MakeTpCall+0x76) [0x7f00e3cdcd16]
#9 1.396 10  /lib64/libpython3.12.so.1.0(+0x10f5f0) [0x7f00e3bf75f0]
#9 1.396 11  /lib64/libpython3.12.so.1.0(PyEval_EvalCode+0xb6) [0x7f00e3d6f8d6]
#9 1.396 12  /lib64/libpython3.12.so.1.0(+0x2aad9a) [0x7f00e3d92d9a]
#9 1.396 13  /lib64/libpython3.12.so.1.0(+0x2a5ece) [0x7f00e3d8dece]
#9 1.396 14  /lib64/libpython3.12.so.1.0(+0x2c64d3) [0x7f00e3dae4d3]
#9 1.396 15  /lib64/libpython3.12.so.1.0(_PyRun_SimpleFileObject+0x1ca) [0x7f00e3dad9da]
#9 1.396 16  /lib64/libpython3.12.so.1.0(PyRun_SimpleFileExFlags+0x41) [0x7f00e3cc777d]
#9 1.396 17  nwchem_binary_openmpi() [0x1829f0f]
#9 1.396 18  nwchem_binary_openmpi() [0x416a5c]
#9 1.396 19  nwchem_binary_openmpi() [0x40e2eb]
#9 1.396 20  nwchem_binary_openmpi() [0x40e831]
#9 1.396 21  /lib64/libc.so.6(+0x2814a) [0x7f00da23b14a]
#9 1.396 22  /lib64/libc.so.6(__libc_start_main+0x8b) [0x7f00da23b20b]
#9 1.396 23  nwchem_binary_openmpi() [0x40c625]
#9 1.396 =================================
#9 1.397 
#9 1.397 Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
#9 1.397 
#9 1.397 Backtrace for this error:
#9 1.573 #0  0x7f00e48b6872 in ???
#9 1.573 #1  0x7f00e48b5a05 in ???
#9 1.573 #2  0x7f00da25199f in ???
#9 1.573 #3  0x7f00e3d022b8 in ???
#9 1.573 #4  0x7f00e3d021bd in ???
#9 1.573 #5  0x7f00e3d01faa in ???
#9 1.573 #6  0x7f00e3d2c469 in ???
#9 1.573 #7  0x182a4eb in ???
#9 1.573 #8  0x7f00e3cf8917 in ???
#9 1.573 #9  0x7f00e3cdcd15 in ???
#9 1.573 #10  0x7f00e3bf75ef in ???
#9 1.573 #11  0x7f00e3d6f8d5 in ???
#9 1.573 #12  0x7f00e3d92d99 in ???
#9 1.573 #13  0x7f00e3d8decd in ???
#9 1.573 #14  0x7f00e3dae4d2 in ???
#9 1.573 #15  0x7f00e3dad9d9 in ???
#9 1.573 #16  0x7f00e3cc777c in ???
#9 1.573 #17  0x1829f0e in ???
#9 1.573 #18  0x416a5b in ???
#9 1.573 #19  0x40e2ea in ???
#9 1.573 #20  0x40e830 in ???
#9 1.573 #21  0x7f00da23b149 in ???
#9 1.573 #22  0x7f00da23b20a in ???
#9 1.573 #23  0x40c624 in ???
#9 1.573 #24  0xffffffffffffffff in ???
#9 1.742 /usr/lib64/openmpi/bin/nwchem_openmpi: line 4:    29 Segmentation fault      (core dumped) nwchem_binary_openmpi "$@"
#9 1.743 --------------------------------------------------------------------------
#9 1.743 Primary job  terminated normally, but 1 process returned
#9 1.743 a non-zero exit code. Per user-direction, the job has been aborted.
#9 1.743 --------------------------------------------------------------------------
#9 3.744 --------------------------------------------------------------------------
#9 3.744 mpiexec detected that one or more processes exited with non-zero status, thus causing
#9 3.744 the job to be terminated. The first process to do so was:
#9 3.744 
#9 3.744   Process name: [[57486,1],0]
#9 3.744   Exit code:    139
#9 3.744 --------------------------------------------------------------------------
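
For readers of the log above: the exit code 139 reported by mpiexec follows the common shell convention of 128 plus the signal number, and signal 11 is SIGSEGV on Linux, which matches the "Caught signal 11" line. A minimal Python sketch of that decoding follows; the helper name describe_exit_code is made up for illustration and is not part of NWChem or Open MPI.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if code > 128:
        signum = code - 128
        try:
            # Map the number to a symbolic name such as SIGSEGV.
            name = signal.Signals(signum).name
        except ValueError:
            # Unknown signal number on this platform; report it numerically.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(139))
```

On a Linux host this should report termination by SIGSEGV for 139.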

To Reproduce

To reproduce, build the following container image using docker build --progress=plain -t test:latest .

FROM fedora:39@sha256:95c88cea36312cfd73613f1c56d8a8db8e63be5cd884cfc97ba9475d0a45eac5

RUN set -x \
    && dnf install -y https://kojipkgs.fedoraproject.org//work/tasks/5675/107865675/nwchem-7.2.1-1.fc39.x86_64.rpm \
                      https://kojipkgs.fedoraproject.org//work/tasks/5675/107865675/nwchem-common-7.2.1-1.fc39.noarch.rpm \
                      https://kojipkgs.fedoraproject.org//work/tasks/5675/107865675/nwchem-openmpi-7.2.1-1.fc39.x86_64.rpm \
    && dnf clean all

RUN set -x \
    && uname -a

RUN set -x \
    && cat /etc/*release | grep PRETTY

RUN set -x \
    && curl -sL https://github.com/nwchemgit/nwchem/raw/master/QA/tests/pyqa3/pyqa3.nw -o /tmp/pyqa3.nw

RUN cd /tmp && . /etc/profile.d/modules.sh&& module use /usr/share/modulefiles&& module load mpi/openmpi && mpiexec --allow-run-as-root -np 1 nwchem_openmpi pyqa3.nw

CMD ["/bin/bash"]

edoapra commented 9 months ago

The following commit should fix this Python 3.12 segfault: https://github.com/nwchemgit/nwchem/commit/48fac057df694267c2422adc2b394a0ac0815c02

The fix has also been applied to the hotfix/release-7-2-0 branch: https://github.com/nwchemgit/nwchem/commit/48fac057df694267c2422adc2b394a0ac0815c02

See the use of this patch in the NWChem conda-forge recipe: https://github.com/conda-forge/nwchem-feedstock/commit/11d3bf8e8e0db1732a6120307a358cf527431bcf

edoapra commented 9 months ago

@marcindulak Have you noticed that the mpich build shows more failures than the openmpi one? https://kojipkgs.fedoraproject.org//packages/nwchem/7.2.1/1.fc40/data/logs/x86_64/build.log

marcindulak commented 9 months ago

I noticed some more errors, but I didn't remember whether past Fedora builds had all tests passing, so I assumed a few failed tests were OK. Checking some past builds, it seems that 7.0.2 passed essentially all the tests, while 7.2.0 already had a few failing ones, mostly for mpich.

The Fedora build only runs ./doafewqmtests.mpi and ./dolibxctests.mpi. A failed test does not stop the build, so the log has to be inspected visually. Would you like me to open an issue if any of these tests fail? Are you interested mostly in rawhide or in the latest Fedora release? As for architectures, I build for x86_64, aarch64, and ppc64le (https://fedoraproject.org/wiki/Architectures); would you like to see the errors from all of them? I could include a Dockerfile that runs one of the failing tests, as above, but only for x86_64. Are you more interested in pre-release tests (more work), or are post-release tests enough?
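
Since a failed test does not stop the build, the log check could be partly scripted. The following is only a sketch under assumptions: it supposes that each test block in the build log starts with a "Running <test>" line and that failures are marked by "NWChem execution failed" or "verifying output ... failed". Those patterns, the function name failed_tests, and the sample lines are illustrative and would need to be checked against a real build.log.

```python
import re

# Assumed markers; adjust to whatever the actual QA log prints.
RUNNING = re.compile(r"Running\s+(\S+)")
FAILED = re.compile(r"NWChem execution failed|verifying output \.\.\. failed")


def failed_tests(lines):
    """Return the names of tests whose log block contains a failure marker."""
    failures = []
    current = None
    for line in lines:
        match = RUNNING.search(line)
        if match:
            # A new test block begins; remember its name.
            current = match.group(1)
            continue
        if current and FAILED.search(line) and current not in failures:
            failures.append(current)
    return failures


# Made-up sample lines, only to show the call.
sample = [
    " Running tests/pyqa3/pyqa3",
    "     NWChem execution failed",
    " Running tests/h2o_opt/h2o_opt",
    "     verifying output ... OK",
]
print(failed_tests(sample))
```

The function accepts any iterable of lines, so an open log file can be passed directly instead of the sample list.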

edoapra commented 9 months ago

All the mpich failures are due to missing output lines. This could be caused either by the QA results parser not working properly (for example, when extra output arrives from stderr) or by some other source of output truncation.
Is it possible to cat the full output files?
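
One way to tell the two causes apart is to capture stdout and stderr in separate files, so that stray stderr text cannot interleave with the output the parser reads. A minimal sketch, assuming the test can be launched as an ordinary command; the helper name run_split and the file names are hypothetical, and the commented mpiexec invocation simply mirrors the openmpi command line from the issue.

```python
import subprocess


def run_split(cmd, out_path, err_path):
    """Run cmd, writing stdout and stderr to separate files; return the exit code."""
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        proc = subprocess.run(cmd, stdout=out, stderr=err, check=False)
    return proc.returncode


# Hypothetical usage, not executed here:
# rc = run_split(
#     ["mpiexec", "--allow-run-as-root", "-np", "1", "nwchem_openmpi", "pyqa3.nw"],
#     "pyqa3.out",
#     "pyqa3.err",
# )
```

If the stdout file is complete while the combined output was not, interleaved stderr is the more likely culprit; if the stdout file is itself truncated, the problem lies elsewhere.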

marcindulak commented 9 months ago

I opened a separate issue about the mpich failures: https://github.com/nwchemgit/nwchem/issues/895

marcindulak commented 8 months ago

Should we close the issue? 7.2.2 (https://github.com/nwchemgit/nwchem/commit/74936fb92aec6990ce48bf334747215813684576) does not fail on pyqa3.

edoapra commented 8 months ago

> Should we close the issue? 7.2.2 74936fb does not fail on pyqa3

Sure.