qsimulate-open / bagel

Brilliantly Advanced General Electronic-structure Library
GNU General Public License v3.0

Running TestSuite in Parallel? #198

Closed yangliu2009 closed 2 years ago

yangliu2009 commented 4 years ago

The installation guide says to run './TestSuite --log_level=all' as a test run. The test run passes without any errors on my installation (Red Hat 7, Slurm 17, boost/1.72, mvapich2.3.3 and gcc7.2). Does the test run on only one core? If so, how can I test that the installation works with MPI? Should we run "srun --mpi=pmi2 ./TestSuite --log_level=all" to test that?

shiozaki commented 4 years ago

You can probably run

mpirun -n 2 ./TestSuite --log_level=all
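
To see at which process count the suite starts to fail, the same command can be repeated for several counts. The following is only a rough sketch, not part of BAGEL: the helper names `build_command` and `run_sweep` are hypothetical, and it assumes `mpirun` or `srun` is on the PATH and that it is run from the directory containing `TestSuite`.

```python
import subprocess


def build_command(launcher, nprocs, binary="./TestSuite"):
    """Return the argv list for launching the test binary on nprocs ranks."""
    if launcher == "mpirun":
        prefix = ["mpirun", "-n", str(nprocs)]
    elif launcher == "srun":
        # --mpi=pmi2 selects Slurm's PMI2 plugin, as in the question above.
        prefix = ["srun", "--mpi=pmi2", "-n", str(nprocs)]
    else:
        raise ValueError(f"unknown launcher: {launcher}")
    return prefix + [binary, "--log_level=all"]


def run_sweep(launcher, counts, runner=subprocess.run):
    """Run the suite once per process count and map count -> exit code."""
    results = {}
    for n in counts:
        completed = runner(build_command(launcher, n))
        results[n] = completed.returncode
    return results
```

Calling `run_sweep("mpirun", [1, 2, 3, 4])` would return a dictionary of exit codes; a non-zero entry marks a process count at which the launcher reported a failure.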

yangliu2009 commented 4 years ago

Does TestSuite work with 3 or more cores? My installation gets a segmentation fault with 3 or more cores from 'srun --mpi=pmi2 ./TestSuite --log_level=all', but it runs fine with 2 cores.

I also ran BAGEL directly with 4 cores on test/hf_svp_coulomb.json, but received the following message: Warning Since the number of auxiliary shells is too small, we do not parallelize the Fock builder.

I am not sure whether the problem comes from the Slurm and MVAPICH integration, or whether TestSuite is simply not intended to run in parallel on 3 or more cores.

shiozaki commented 4 years ago

That's most likely because those tests are too small to be parallelized with 4 processes. Can you try a similar input with a larger molecule? You could check the accuracy against serial runs.

(Yes, BAGEL should throw a reasonable exception... but unfortunately in some cases it doesn't...)
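
One possible way to do the serial-versus-parallel check is to pull the final energy out of each log and compare the two within a tolerance. This is a minimal sketch under stated assumptions: the `Total energy:` pattern is a placeholder and not BAGEL's actual output format, the tolerance of 1e-8 is an arbitrary choice, and the function names are hypothetical.

```python
import math
import re

# Hypothetical pattern: adjust it to match how the energy appears in your logs.
ENERGY_PATTERN = r"Total energy:\s*(-?\d+\.\d+)"


def last_energy(log_text, pattern=ENERGY_PATTERN):
    """Return the last energy matched in log_text as a float."""
    matches = re.findall(pattern, log_text)
    if not matches:
        raise ValueError("no energy found; check the pattern")
    return float(matches[-1])


def energies_agree(serial_log, parallel_log, tol=1e-8):
    """True if the final energies differ by at most tol (absolute)."""
    e_serial = last_energy(serial_log)
    e_parallel = last_energy(parallel_log)
    return math.isclose(e_serial, e_parallel, rel_tol=0.0, abs_tol=tol)
```

The two arguments would be the captured outputs of a one-process run and a multi-process run of the same input.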

yangliu2009 commented 4 years ago

One of our users tested BAGEL with 16 cores on two nodes and it worked well. So TestSuite probably does not work with 3 or more cores.

shiozaki commented 2 years ago

Let me close this for now. I'd say that the test system needs an upgrade, but that should be done in a separate thread.