mmorale3 / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids.
http://www.qmcpack.org

Multi-node / multi-core tests + Python testing #7

Open fdmalone opened 4 years ago

fdmalone commented 4 years ago

This might be easier to set up in-house, given our limited access and the fact that the CI machines only run on single nodes.

Running with ncores > 1 on CPU is suspicious.

mmorale3 commented 4 years ago

This is very easy to break. Without testing we are flying blind.

fdmalone commented 4 years ago

Regarding this, it seems LLNL allows CI if the repo is mirrored to GitLab. We could then trigger LC builds with a comment (as they do at Oak Ridge). Figuring this out is potentially the most consistent option, since we could add multi-node tests to the main qmcpack repo, where they would live permanently. I imagine these would only need to be triggered for our PRs.

Alternatively, I can cook something up that we run ourselves, possibly through a cron job.

mmorale3 commented 4 years ago

We should at least start with something in-house at LLNL. We can modify your build scripts to run the unit tests on 2 nodes.
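As a rough illustration of the 2-node change, a build script could construct the launcher command for each unit-test binary along these lines. This is a minimal sketch assuming a Slurm-based machine that uses `srun` (`-N` sets the node count, `-n` the total number of tasks); other schedulers use different launchers. The binary name `./test_afqmc_wfn_factory` and the helper `srun_command` are hypothetical, not part of the existing build scripts.

```python
import shlex
from typing import List, Sequence


def srun_command(test_exe: str, nodes: int = 2, tasks_per_node: int = 1,
                 extra_args: Sequence[str] = ()) -> List[str]:
    """Build a Slurm srun command line that launches one unit-test binary
    across `nodes` nodes with `tasks_per_node` MPI ranks on each node.

    Assumption: the target machine uses Slurm's srun as its MPI launcher.
    """
    if nodes < 1 or tasks_per_node < 1:
        raise ValueError("nodes and tasks_per_node must be positive")
    return [
        "srun",
        "-N", str(nodes),                    # number of nodes
        "-n", str(nodes * tasks_per_node),   # total number of MPI tasks
        test_exe,
        *extra_args,
    ]


if __name__ == "__main__":
    # Hypothetical test binary name; substitute the actual AFQMC unit test.
    cmd = srun_command("./test_afqmc_wfn_factory", nodes=2, tasks_per_node=2)
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps it safe to pass straight to `subprocess.run` without shell quoting issues; `shlex.join` is only used to print it readably.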

fdmalone commented 4 years ago

Isn't the distribution over cores/nodes controlled by the input file?

mmorale3 commented 4 years ago

Some unit tests are set up to run with nnodes>1 if they are run with more than 1 node. Look at the unit test wfn_fac_distributed, for example. There is also another one in Propagator. Passing these unit tests catches most of the issues. We can extend these tests or set up longer runs later on. Getting these tested regularly would be a big first step.

fdmalone commented 4 years ago

Ok. I need to add a cmake function and split some of the files.

mmorale3 commented 4 years ago

Actually, all the unit tests will run with multiple nodes. Tests that are not distributed are simply repeated serially, so you'll get a lot of repeated chatter.

fdmalone commented 4 years ago

I've added a multi-node/multi-core testing script in /usr/gapps/afqmc/codes/testing, with benchmark data in /usr/workspace/afqmc/testing. I'll set these up to submit on lassen/quartz, maybe twice a week or nightly. Currently, the complex double (kpoint) build appears to be failing on lassen.
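One way a script like this might check a run against the stored benchmark data is sketched below. It assumes, hypothetically, that both the run output and the benchmark have already been parsed into dictionaries mapping quantity names to floats; the function name, the dictionary layout, and the default tolerances are illustrative and not taken from the script in /usr/gapps/afqmc/codes/testing.

```python
import math
from typing import Dict, List


def compare_to_benchmark(results: Dict[str, float],
                         benchmark: Dict[str, float],
                         rel_tol: float = 1e-6,
                         abs_tol: float = 1e-8) -> List[str]:
    """Return human-readable failure messages; an empty list means every
    benchmarked quantity was reproduced within tolerance.

    Assumption: results and benchmark are flat {name: value} dictionaries.
    """
    failures = []
    for key, ref in benchmark.items():
        if key not in results:
            # A quantity the benchmark expects but the run did not produce.
            failures.append(f"{key}: missing from results")
            continue
        val = results[key]
        if not math.isclose(val, ref, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append(f"{key}: got {val!r}, expected {ref!r}")
    return failures


if __name__ == "__main__":
    # Hypothetical benchmark entry for illustration only.
    bench = {"energy": -1.2345}
    print(compare_to_benchmark({"energy": -1.2346}, bench))
    # prints ['energy: got -1.2346, expected -1.2345']
```

A fixed tolerance suits deterministic quantities from seeded runs; stochastic estimates would instead need to be compared against their error bars.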

fdmalone commented 4 years ago

They will also track the timing of the runs.
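Timing could be tracked by wrapping each test job, measuring its wall-clock time, and appending one JSON line per run to a log so that nightly results can be compared over time. This is a minimal sketch under that assumption: the log format, the labels, and the helpers `run_and_record` and `load_timings` are hypothetical, and a trivial Python command stands in for a real test job.

```python
import json
import subprocess
import sys
import tempfile
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def run_and_record(cmd: List[str], log_path: Path, label: str) -> Dict:
    """Run `cmd`, measure its wall-clock time, and append one JSON record
    per run to `log_path` (JSON Lines format)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    record = {
        "label": label,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "returncode": proc.returncode,
        "passed": proc.returncode == 0,
        "wall_seconds": round(elapsed, 3),
    }
    with log_path.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_timings(log_path: Path, label: str) -> List[float]:
    """Return the recorded wall-clock times for one label, oldest first."""
    times = []
    with log_path.open() as f:
        for line in f:
            rec = json.loads(line)
            if rec["label"] == label:
                times.append(rec["wall_seconds"])
    return times


if __name__ == "__main__":
    # Hypothetical label; a no-op Python process stands in for a test job.
    log = Path(tempfile.mkdtemp()) / "timings.jsonl"
    run_and_record([sys.executable, "-c", "pass"], log, "demo")
    print(load_timings(log, "demo"))
```

Appending to a line-oriented log keeps each run independent, so a failed or interrupted job cannot corrupt earlier records.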