materialsproject / custodian

A simple, robust and flexible just-in-time job management framework in Python.
MIT License
135 stars 102 forks source link

[Bug]: Don't kill a VASP job if NBANDS is very high with no other warning #344

Open Andrew-S-Rosen opened 1 month ago

Andrew-S-Rosen commented 1 month ago

Code snippet

No response

What happened?

In the following block of code, if VASP updates the number of bands automatically and it's > 2x the number of electrons, Custodian will kill the job even if VASP completes with no errors.

https://github.com/materialsproject/custodian/blob/880ca8b96ec3adfe57497b4cba1b72242db8be83/src/custodian/vasp/handlers.py#L727-L736

In practice, having such a large number of bands is a sign of a problem by the user and can lead to spurious energetic states. I have observed this for gas-phase CO in a box. But we ultimately should not kill the job. We should just warn the user (even though it's unlikely they'll read the logs).

This was indirectly noted by @esoteric-ephemera in https://github.com/materialsproject/custodian/pull/342.

I can patch this, but it admittedly won't be for a little while. We should rethink how the bands stuff is being handled because in practice, it doesn't actually seem ideal...

Version

v2024.6.24

Which OS?

Log output

No response

Andrew-S-Rosen commented 1 month ago

My proposal is as follows.

  1. Remove "auto_nbands" from the error_msgs list. It is not an error but rather just a useful warning from VASP. Keeping it here could cause some mistakes later on by developers.

  2. Instead of relying on the if "auto_nbands" check, have the following block of checks be done in a "bespoke" manner where it is not tied to a specific VASP warning or error message:

https://github.com/materialsproject/custodian/blob/880ca8b96ec3adfe57497b4cba1b72242db8be83/src/custodian/vasp/handlers.py#L727-L751

  1. For the case of the UserWarning "NBANDS seems to be too high...", don't kill the job. Just warn the user.

This probably means we should be treating this as a new handler. Just like how we have an UnconvergedErrorHandler that checks for convergence, we can have an NbandsErrorHandler that checks for the appropriateness of nbands independent of any specific VASP error message.