mphowardlab / relentless

Computational materials design, with less code.
https://relentless.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Document how to use mpi4py to avoid deadlock #221

Open mphoward opened 1 year ago

mphoward commented 1 year ago

I had a bunch of simulations get stuck, and I suspect the problem was in generating the initial particle configuration because the volume fraction was too high. _pack_particles should have thrown an error that killed all the processes, but it is only called by the root rank. When it throws there, the root rank exits while the other ranks keep running, so they probably deadlocked at a later barrier that the root rank never reached.

The issue would probably also be present for HOOMD, but I haven't tested.

I think this is a known behavior of mpi4py, described in the mpi4py.run documentation:

https://mpi4py.readthedocs.io/en/stable/mpi4py.run.html
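
A minimal sketch of the failure mode, assuming a root-only setup step followed by a collective call. The function pack_on_root, its max_fraction threshold, and the error message are hypothetical stand-ins for illustration, not the relentless implementation; only Get_rank and Barrier are real mpi4py communicator methods.

```python
def pack_on_root(comm, volume_fraction, max_fraction=0.6):
    """Hypothetical root-only setup step followed by a collective barrier.

    `comm` is expected to behave like an mpi4py communicator
    (it needs Get_rank() and Barrier()). `max_fraction` is an
    illustrative threshold, not a value taken from relentless.
    """
    if comm.Get_rank() == 0:
        # Only the root rank does the packing, so only the root rank can raise.
        if volume_fraction > max_fraction:
            raise RuntimeError("could not pack particles: volume fraction too high")

    # Every rank is expected here. If the root raised above, it never arrives,
    # and the remaining ranks wait at this barrier indefinitely.
    comm.Barrier()


# Under MPI one would call, for example:
#   from mpi4py import MPI
#   pack_on_root(MPI.COMM_WORLD, volume_fraction=0.9)
```

Launched as `mpiexec -n 2 python script.py`, a script like this would be expected to hang after the root rank prints its traceback. Launched as `mpiexec -n 2 python -m mpi4py script.py`, the uncaught exception should instead abort every rank, which is the remedy the linked page describes.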

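We should document this behavior and the solution given there, which is to launch the script with python -m mpi4py so that an uncaught exception aborts all ranks.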

mphoward commented 1 year ago

Or we could install our own exception hook (sys.excepthook) that forces an abort when we are running under MPI:

https://github.com/glotzerlab/hoomd-blue/blob/41cd19d63a572e061f109bf2aedee38ee69a35f8/hoomd/__init__.py
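
A rough sketch of what such a hook could look like with mpi4py. This is an assumption about one possible approach, not code from HOOMD-blue or relentless; make_abort_hook and install_abort_hook are hypothetical names, and the choice to skip installation on a single rank is a design guess.

```python
import sys
import traceback


def make_abort_hook(comm, errorcode=1):
    """Build an excepthook that prints the traceback, then aborts all ranks.

    `comm` needs an Abort(errorcode) method, as mpi4py communicators have.
    """
    def hook(exc_type, exc_value, exc_tb):
        # Report the error first, since Abort terminates the job immediately.
        traceback.print_exception(exc_type, exc_value, exc_tb)
        sys.stderr.flush()
        comm.Abort(errorcode)

    return hook


def install_abort_hook():
    """Install the hook only when mpi4py is available and several ranks run.

    Returns True if the hook was installed, False otherwise.
    """
    try:
        from mpi4py import MPI
    except ImportError:
        return False
    if MPI.COMM_WORLD.Get_size() == 1:
        # Serial run: keep the default Python behavior.
        return False
    sys.excepthook = make_abort_hook(MPI.COMM_WORLD)
    return True
```

Calling install_abort_hook() once at import time would make an uncaught exception on any rank bring down the whole job, without requiring users to remember the -m mpi4py flag. The tradeoff is that the package changes a global interpreter setting on the user's behalf.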