qsimulate-open / bagel

Brilliantly Advanced General Electronic-structure Library
GNU General Public License v3.0

Segfault while running testsuite #196

Closed mscho527 closed 4 years ago

mscho527 commented 4 years ago

I am running into errors in the TestSuite while preparing BAGEL for our computing cluster, a CentOS 7 machine. I have tried both the latest source code from your Git repository and the latest release (1.2.2). I built BAGEL with mvapich/2.3a and each of boost/[1.68.0, 1.57, 1.62.0], and every combination runs into problems at the same step.

../../src/testimpl/test_scf.cc(86): Leaving test case "DF_HF"; testing time: 38210ms
../../src/testimpl/test_scf.cc(84): Leaving test suite "TEST_SCF"; testing time: 38210ms
../../src/testimpl/test_molden.cc(96): Entering test suite "TEST_MOLDEN"
../../src/testimpl/test_molden.cc(98): Entering test case "MOLDEN"
../../src/testimpl/test_molden.cc(99): info: check compare(molden_out_energy("hf_write_mol_sph", "hf_read_mol_sph"), -99.84772354 ) has passed

These are the last lines printed by the TestSuite before the job is killed by the time limit (4 hours). Progress hangs after the last line shown (test_molden.cc, line 99).
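For longer logs, a small script can report which test suites and cases were entered but never left, which narrows down where a run stalled. This is only a minimal sketch: it assumes every log entry follows the "Entering/Leaving test case|suite" format shown above, and the function name `open_tests` and the sample string are illustrative, not part of BAGEL or Boost.

```python
import re

# Matches log entries such as:
#   ../../src/testimpl/test_molden.cc(98): Entering test case "MOLDEN"
# Assumption: the log uses exactly this Entering/Leaving wording.
EVENT = re.compile(r'\((\d+)\): (Entering|Leaving) test (case|suite) "([^"]+)"')


def open_tests(log_text):
    """Return (kind, name) pairs that were entered but never left, outermost first."""
    stack = []
    for match in EVENT.finditer(log_text):
        _line, action, kind, name = match.groups()
        if action == "Entering":
            stack.append((kind, name))
        elif stack and stack[-1] == (kind, name):
            # A matching "Leaving" closes the most recently entered item.
            stack.pop()
        # A "Leaving" with no matching "Entering" (truncated log) is ignored.
    return stack


# Hypothetical sample: an excerpt of the log quoted in this issue.
sample = (
    '../../src/testimpl/test_scf.cc(84): Leaving test suite "TEST_SCF"; testing time: 38210ms\n'
    '../../src/testimpl/test_molden.cc(96): Entering test suite "TEST_MOLDEN"\n'
    '../../src/testimpl/test_molden.cc(98): Entering test case "MOLDEN"\n'
)

print(open_tests(sample))  # [('suite', 'TEST_MOLDEN'), ('case', 'MOLDEN')]
```

For the excerpt above, the result points at test case "MOLDEN" inside suite "TEST_MOLDEN" as the place where the run stopped making progress.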

If I comment out the tests from test_scf.cc and test_molden.cc, I get the following:

./../src/testimpl/test_prop.cc(67): Entering test suite "TEST_PROP"
../../src/testimpl/test_prop.cc(69): Entering test case "MULTIPOLE"
../../src/testimpl/test_prop.cc(70): info: check compare<std::vector<double>>(multipole("hf_svp_dfhf"), hf_svp_dfhf_multipole_ref(), 1.0e-6) has passed
../../src/testimpl/test_prop.cc(69): Leaving test case "MULTIPOLE"; testing time: 2240ms
../../src/testimpl/test_prop.cc(67): Leaving test suite "TEST_PROP"; testing time: 2240ms
../../src/testimpl/test_rel.cc(66): Entering test suite "TEST_REL"
../../src/testimpl/test_rel.cc(68): Entering test case "DIRAC_FOCK"
TestSuite: ../../../src/df/dfdistt.cc:60: bagel::DFDistT::DFDistT(std::shared_ptr<const bagel::ParallelDF>, std::shared_ptr<const bagel::StaticDist>): Assertion `source->asize()*dist_->size(i) == adist->size(i)*bsize_' failed.

After this, a segfault occurs on each MPI node.

I have tried to pinpoint which part of the code or which dependency is causing the issue, but without any luck. Could anyone suggest what I could do to either resolve the errors or diagnose where they are coming from?

Thank you.