materialsproject / custodian

A simple, robust and flexible just-in-time job management framework in Python.
MIT License

[VASP] Add handler for PSMAXN warnings and associated failures #133

Open rkingsbury opened 4 years ago

rkingsbury commented 4 years ago

Summary

Details

With certain combinations of ENCUT, LREAL, and pseudopotentials, VASP issues the warning

WARNING: PSMAXN for non-local potential too small

In some cases VASP still runs successfully; in other cases it fails, e.g.

WARNING: PSMAXN for non-local potential too small
 LDA part: xc-table for Pade appr. of Perdew
 POSCAR, INCAR and KPOINTS ok, starting setup
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
...

It appears that Custodian does not recognize the above type of failure as an error. As a result, _run_job() attempts to validate the output of the calculation (which never ran in the first place and therefore never generated output) and raises a ValidationError.

Traceback (most recent call last):
  File "/global/u2/r/rsking84/.conda/envs/cms/code/fireworks/fireworks/core/rocket.py", line 262, in run
    m_action = t.run_task(my_spec)
  File "/global/u2/r/rsking84/.conda/envs/cms/code/atomate/atomate/vasp/firetasks/run_calc.py", line 211, in run_task
    c.run()
  File "/global/u2/r/rsking84/.conda/envs/cms/code/custodian/custodian/custodian.py", line 378, in run
    self._run_job(job_n, job)
  File "/global/u2/r/rsking84/.conda/envs/cms/code/custodian/custodian/custodian.py", line 502, in _run_job
    raise ValidationError(s, True, v)
custodian.custodian.ValidationError: Validation failed: VasprunXMLValidator

The ValidationError is very difficult to troubleshoot without running VASP manually. In this situation, vasp.out is empty and std_err.txt contains only

srun: fatal: Can not execute vasp_std

Suggested solution

Information about the PSMAXN warning and associated failures is scarce, but there appear to be several possible fixes:

  1. Set LREAL=FALSE (expand the basis set in reciprocal space instead of real space)
  2. Sort the pseudopotentials such that the one with the highest ENMAX appears first in the POTCAR
  3. Lower the ENCUT value

I have had the most success with Option 1.

Option 2 has not solved the issue for me and, I think, is only applicable if the user does not specify ENCUT in the INCAR file (see the docs). It is also not clear whether this fix is still relevant to the latest versions of VASP.

I'm not yet familiar enough with the architecture of Custodian to know the best way to address this, but it seems to me that, at a minimum, an error handler to catch this type of failure would be valuable. Even better would be to modify LREAL to FALSE on the fly.
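A minimal sketch of what such a handler could look like, written as plain Python so that it runs without Custodian installed. It mirrors the check()/correct() shape of Custodian's ErrorHandler interface, and the returned action dictionary is modelled on the style Custodian's VASP handlers use, but the class name RealOptErrorSketch and the default vasp.out file name are assumptions for illustration. A real handler would subclass Custodian's ErrorHandler and apply the change to the INCAR file rather than just returning it.

```python
import os
import tempfile

# Text that VASP prints when the real-space setup aborts (taken from the log above).
REAL_OPT_MARKER = "REAL_OPT: internal ERROR"


class RealOptErrorSketch:
    """Hypothetical handler-shaped class; not part of Custodian."""

    def __init__(self, output_filename="vasp.out"):
        self.output_filename = output_filename

    def check(self):
        """Return True only if the output file contains the REAL_OPT abort."""
        if not os.path.exists(self.output_filename):
            return False
        with open(self.output_filename, errors="ignore") as f:
            return any(REAL_OPT_MARKER in line for line in f)

    def correct(self):
        """Describe the proposed fix (LREAL = False) without touching any files."""
        return {
            "errors": ["real_opt"],
            "actions": [{"dict": "INCAR", "action": {"_set": {"LREAL": False}}}],
        }


def demo():
    """Run the sketch against a fake output file in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vasp.out")
        with open(path, "w") as f:
            f.write(" WARNING: PSMAXN for non-local potential too small\n")
            f.write(" REAL_OPT: internal ERROR:  -32  -32  -32  0\n")
            f.write(" VASP aborting ...\n")
        handler = RealOptErrorSketch(path)
        return handler.check(), handler.correct()
```

Note that a handler of this kind only helps if the error text actually reaches the file it reads; in the failing runs described above, vasp.out was empty.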

Further reading on troubleshooting the VASP PSMAXN warnings:

https://cms.mpi.univie.ac.at/vasp-forum/viewtopic.php?f=3&t=8370

the reason most probably is that you join 2 potentials with very different cutoff, with the POTCAR with the SMALL cutoff (U) being the first in the list. This potentials is used to determine PSMAXN.
please
1) switch the 2 atoms in POSCAR and POTCAR (ie give the atoms such that those with the hardest potentials are first
2) OR use O_s (soft O, low cutoff)

https://cms.mpi.univie.ac.at/vasp-forum/viewtopic.php?t=14811

The warning means that PSMAXN is too small for the required cutoff energy (ENMAX) the first of the atoms given in POTCAR. Either use a harder potential or decrease ENMAX.

Solved it by setting LREAL=FALSE

https://www.researchgate.net/post/Relaxation_in_metal_using_vasp2

"PSMAXN for non-local potential too small" Try lowering your ENCUT parameter (how large is it, and what are the defaults in your POTCAR?), this error indicates that you go out of bounds for an array related to the potential, which is related to the cutoff energy.

http://materials.duke.edu/AFLOW/README_AFLOW.TXT

PSMAXN PSMAXN errors. By default aflow tries to go around PSMAXN warnings by restarting VASP with reducingly lower ENMAX until everything is set. This can be done by tuning the INCAR schemes.
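The AFLOW strategy quoted above, restarting with a progressively lower cutoff until the run is accepted, can be sketched as a simple loop. This is not AFLOW's code: the function name, the 10% step, the floor of 200, and the runs_ok callback standing in for an actual VASP run are all assumptions for illustration.

```python
def lower_encut_until_ok(encut, runs_ok, factor=0.9, floor=200.0):
    """Try decreasing cutoff values until runs_ok(encut) returns True.

    Returns the first accepted cutoff, or None if the floor is passed first.
    All numeric defaults are illustrative, not recommendations.
    """
    if not 0 < factor < 1:
        raise ValueError("factor must be between 0 and 1")
    while encut >= floor:
        if runs_ok(encut):
            return encut
        encut *= factor  # shrink the cutoff and try again
    return None
```

Here runs_ok would wrap whatever launches the calculation and reports whether it got past setup.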

mkhorton commented 4 years ago

The order of the POTCARs matters? That's crazy. Is this true even though we set e.g. ENCUT manually?

rkingsbury commented 4 years ago

After further reading, I think those forum posts are out of date. According to the current docs:

rkingsbury commented 4 years ago

As an additional note, @mkhorton and I noticed that when this failure occurs, the stdout output is somehow not written to vasp.out or std_err.txt. The full terminal output for an example failing calculation is:


 OOO  PPPP  EEEEE N   N M   M PPPP
O   O P   P E     NN  N MM MM P   P
O   O PPPP  EEEEE N N N M M M PPPP   -- VERSION
O   O P     E     N  NN M   M P
 OOO  P     EEEEE N   N M   M P

 running 16 mpi-ranks, with 4 threads/rank
 distrk: each k-point on 16 cores, 1 groups
 distr: one band on 1 cores, 16 groups
 using from now: INCAR
 vasp.6.0.8 29Jun18 (build Jun 13 2019 12:54:44) complex

 POSCAR found type information on POSCAR O Ba Be Si
 POSCAR found : 4 types and 7 ions
 scaLAPACK will be used

 -----------------------------------------------------------------------------
|                                                                             |
|  ADVICE TO THIS USER RUNNING 'VASP/VAMP'   (HEAR YOUR MASTER'S VOICE ...):  |
|                                                                             |
|      You have a (more or less) 'small supercell' and for smaller cells      |
|      it is recommended  to use the reciprocal-space projection scheme!      |
|      The real space optimization is not  efficient for small cells and it   |
|      is also less accurate ...                                              |
|      Therefore set LREAL=.FALSE. in the  INCAR file                         |
|                                                                             |
 -----------------------------------------------------------------------------

 WARNING: PSMAXN for non-local potential too small
 LDA part: xc-table for Pade appr. of Perdew
 found WAVECAR, reading the header
 POSCAR, INCAR and KPOINTS ok, starting setup
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
 REAL_OPT: internal ERROR:         -32         -32         -32           0
 VASP aborting ...
mkhorton commented 4 years ago

Yes, vasp.out was empty when running from the workflow, even though the standard output appeared when running interactively. Very strange.

rkingsbury commented 4 years ago

After further testing, it seems that LREAL=False is not always a reliable fix for this. The other challenge is that the calculation can often complete successfully despite a PSMAXN warning, so a Custodian handler keyed to the warning is problematic: we don't want to keep restarting the calculation just because that warning is present, only when the calculation actually fails. I've updated the Handler to respond only to the REAL_OPT error, not the PSMAXN warning.

The real question is - how can we make sure that vasp.out gets populated correctly when these failures occur?
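One way to rule out lost output on the Python side is to send both stdout and stderr of the child process to the same file. The sketch below uses only the standard library and makes no claim about how Custodian itself launches VASP; the helper name run_and_capture is hypothetical, and the demo uses a small Python child process as a stand-in for VASP. It also cannot help if the launcher (e.g. srun) never starts the executable at all.

```python
import os
import subprocess
import sys
import tempfile


def run_and_capture(cmd, output_path):
    """Run cmd, writing stdout and stderr to output_path; return the exit code."""
    with open(output_path, "w") as out:
        # stderr=subprocess.STDOUT merges both streams into the same file.
        proc = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT)
    return proc.returncode


def demo():
    """Stand-in child process that writes to both streams and exits non-zero."""
    child = (
        "import sys; print('to stdout'); "
        "print('to stderr', file=sys.stderr); sys.exit(3)"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vasp.out")
        code = run_and_capture([sys.executable, "-c", child], path)
        with open(path) as f:
            return code, f.read()
```

Merging the two streams means an abort message is kept regardless of which stream the program wrote it to, at the cost of no longer being able to tell them apart.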

mkhorton commented 4 years ago

I’d open a separate issue for the vasp.out problem; I don’t think it’s specific to this particular warning.



shyuep commented 4 years ago

I would say we want to know what is a reliable way to resolve the problem first. If there is a reliable way, then fixing even runs that theoretically could finish is fine. You can also set a counter, e.g., if you have tried to fix PSMAXN a few times, the job will be flagged as unrecoverable.
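The counter idea could look like the following sketch. It is standalone Python rather than Custodian code; the class name CorrectionCounter and the default limit of three attempts are assumptions for illustration.

```python
class CorrectionCounter:
    """Hypothetical helper that caps how often one fix may be attempted."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.attempts = {}  # error name -> number of recorded attempts

    def record(self, error_name):
        """Count one attempt; return True while the limit is not exceeded."""
        self.attempts[error_name] = self.attempts.get(error_name, 0) + 1
        return self.attempts[error_name] <= self.max_attempts

    def unrecoverable(self, error_name):
        """True once more attempts were recorded than the limit allows."""
        return self.attempts.get(error_name, 0) > self.max_attempts
```

A handler would call record("psmaxn") each time it applies a fix and stop restarting the job once unrecoverable("psmaxn") becomes True.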