materialsproject / custodian

A simple, robust and flexible just-in-time job management framework in Python.
MIT License

VASP fix: ZHEGV error occurs when small # of atoms and too many cores #192

Closed: Andrew-S-Rosen closed this issue 2 years ago

Andrew-S-Rosen commented 2 years ago

Currently, when the ZHEGV error appears in VASP, Custodian switches ALGO to All. This is generally the right approach. However, for small systems (e.g. elemental systems with only 1 or 2 atoms/cell), ZHEGV often appears because too many cores are requested. Supercomputers now have more and more cores per node, so this comes up increasingly often. The fix is simply to decrease the number of cores for the job. Rather than having Custodian try to adjust that, it could warn the user if len(structure) is below some cutoff. Finding a good cutoff would require some testing; my gut feeling is about 5 atoms/cell.
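A minimal sketch of the proposed warning follows. The function name `warn_if_few_atoms`, its arguments, and the default cutoff of 5 sites (the untested "gut feeling" value above) are hypothetical and not part of Custodian's API. With a pymatgen `Structure`, `len(structure)` gives the number of sites, so that value could be passed as `n_sites`.

```python
import warnings

# Hypothetical default taken from the "gut feeling" value in this issue; not tuned by testing.
DEFAULT_MIN_SITES = 5


def warn_if_few_atoms(n_sites: int, n_cores: int, min_sites: int = DEFAULT_MIN_SITES) -> bool:
    """Warn when a ZHEGV error may come from spreading a tiny cell over too many cores.

    n_sites would be len(structure) for a pymatgen Structure.
    Returns True if a warning was issued, False otherwise.
    """
    if n_sites < min_sites:
        warnings.warn(
            f"ZHEGV error with only {n_sites} atom(s)/cell on {n_cores} cores; "
            "the job may be spread over too many cores. Consider requesting fewer cores.",
            RuntimeWarning,
        )
        return True
    return False


# Example: a 2-atom cell on 128 cores triggers the warning.
warn_if_few_atoms(n_sites=2, n_cores=128)
```

Issuing a warning instead of changing the core count keeps the decision with the user, who controls the job submission.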

Andrew-S-Rosen commented 2 years ago

After iterative testing, the easiest solution is to simply raise NPAR or NCORE so that the work isn't so spread out.
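The sketch below shows one way to express "raise NCORE" as an INCAR change. It only handles NCORE, not NPAR. It assumes NCORE should evenly divide the total number of cores and that an unset NCORE means VASP's default of 1. The helper names `next_ncore` and `ncore_action` are hypothetical, and the returned dictionary is only shaped like the `{"dict": "INCAR", "action": {"_set": ...}}` actions used by Custodian's VASP handlers; this is not the fix that was merged.

```python
from typing import Optional


def next_ncore(current_ncore: int, total_cores: int) -> Optional[int]:
    """Return the smallest divisor of total_cores larger than current_ncore, or None."""
    for candidate in range(current_ncore + 1, total_cores + 1):
        if total_cores % candidate == 0:
            return candidate
    return None


def ncore_action(incar: dict, total_cores: int) -> Optional[dict]:
    """Build a Custodian-style INCAR modification that raises NCORE (hypothetical helper)."""
    current = int(incar.get("NCORE", 1))  # assume VASP's default NCORE of 1 when unset
    new_ncore = next_ncore(current, total_cores)
    if new_ncore is None:
        return None  # NCORE is already at the total core count; nothing left to raise
    return {"dict": "INCAR", "action": {"_set": {"NCORE": new_ncore}}}


# Example: NCORE = 4 on 128 cores is raised to the next divisor, 8.
print(ncore_action({"ALGO": "All", "NCORE": 4}, total_cores=128))
```

Stepping to the next divisor increases NCORE gradually, so repeated ZHEGV errors would raise it one step at a time instead of jumping straight to the maximum.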