materialsproject / custodian

A simple, robust and flexible just-in-time job management framework in Python.

Bug: Command line argument 'vasp_gam' was not understood. #265

Open ryotatomioka opened 1 year ago

ryotatomioka commented 1 year ago

Summary

I believe a bug was introduced in this commit around these lines https://github.com/materialsproject/custodian/blob/f7dc11a15aee43e57cc8b52bf4a39ed7c5fd1e63/custodian/vasp/jobs.py#L689-L696

The problem is that both self.vasp_cmd and self.gamma_vasp_cmd can be lists! In that case, self.gamma_vasp_cmd is appended to self.vasp_cmd every time the terminate method is called. This is what happens when custodian is called from atomate2. See: https://github.com/materialsproject/atomate2/blob/02e44c038903d2c935c82b31afd8ab82a69c039e/src/atomate2/vasp/run.py#L86-L170
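
For illustration, here is a minimal sketch of the in-place list mutation described above (this is not the actual custodian source; the class and commands are made up):

class FakeVaspJob:
    """Stand-in for custodian.vasp.jobs.VaspJob, only to show the mutation."""

    def __init__(self, vasp_cmd, gamma_vasp_cmd):
        self.vasp_cmd = vasp_cmd              # e.g. ["srun", "vasp_std"]
        self.gamma_vasp_cmd = gamma_vasp_cmd  # e.g. ["srun", "vasp_gam"]

    def terminate(self):
        # If both attributes are lists, this grows self.vasp_cmd in place,
        # so every termination appends another copy of the gamma command.
        self.vasp_cmd += self.gamma_vasp_cmd

job = FakeVaspJob(["srun", "vasp_std"], ["srun", "vasp_gam"])
job.terminate()
job.terminate()
print(job.vasp_cmd)
# ['srun', 'vasp_std', 'srun', 'vasp_gam', 'srun', 'vasp_gam']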

This results in the following message in vasp.out:

Command line argument 'vasp_gam' was not understood.

For some reason, custodian does not treat this as an error and keeps applying the same correction until the maximum number of corrections is reached, which produces a confusing error message. It would be good to improve the error message as well.

Example code

from atomate2.vasp.jobs.core import RelaxMaker
from jobflow import run_locally
from pymatgen.core import Structure
structure = Structure(
    lattice=[[0, 2.13, 2.13], [2.13, 0, 2.13], [2.13, 2.13, 0]],
    species=["Ba", "O"],
    coords=[[0, 0, 0], [0.5, 0.5, 0.5]],
)
relax_job = RelaxMaker().make(structure)
run_locally(relax_job, create_folders=True)

Error message

2023-06-19 05:56:50,913 INFO Started executing jobs locally
2023-06-19 05:56:50,917 INFO Starting job - relax (d1196769-4075-47b3-8a2f-6087a8c96db5)
ERROR:custodian.custodian:LargeSigmaHandler
WARNING:custodian.vasp.jobs:killing vasp processes in work dir /scratch/job_2023-06-19-05-56-50-916083-15798 failed. Resorting to 'killall'.
vasp_std: no process found
vasp_gam: no process found
ERROR:custodian.custodian:LargeSigmaHandler
WARNING:custodian.vasp.jobs:killing vasp processes in work dir /scratch/job_2023-06-19-05-56-50-916083-15798 failed. Resorting to 'killall'.
vasp_std: no process found
vasp_gam: no process found
vasp_gam: no process found
ERROR:custodian.custodian:LargeSigmaHandler
WARNING:custodian.vasp.jobs:killing vasp processes in work dir /scratch/job_2023-06-19-05-56-50-916083-15798 failed. Resorting to 'killall'.
vasp_std: no process found
vasp_gam: no process found
vasp_gam: no process found
vasp_gam: no process found
ERROR:custodian.custodian:LargeSigmaHandler
WARNING:custodian.vasp.jobs:killing vasp processes in work dir /scratch/job_2023-06-19-05-56-50-916083-15798 failed. Resorting to 'killall'.
vasp_std: no process found
vasp_gam: no process found
vasp_gam: no process found
vasp_gam: no process found
vasp_gam: no process found
ERROR:custodian.custodian:Unrecoverable error for handler: <custodian.vasp.handlers.LargeSigmaHandler object at 0x7f9f367fbfd0>
2023-06-19 06:02:35,068 INFO relax failed with exception:
Traceback (most recent call last):
  File "/opt/miniconda/lib/python3.9/site-packages/jobflow/managers/local.py", line 98, in _run_job
    response = job.run(store=store)
  File "/opt/miniconda/lib/python3.9/site-packages/jobflow/core/job.py", line 544, in run
    response = function(*self.function_args, **self.function_kwargs)
  File "/opt/miniconda/lib/python3.9/site-packages/atomate2/vasp/jobs/base.py", line 147, in make
    run_vasp(**self.run_vasp_kwargs)
  File "/opt/miniconda/lib/python3.9/site-packages/atomate2/vasp/run.py", line 167, in run_vasp
    c.run()
  File "/opt/miniconda/lib/python3.9/site-packages/custodian/custodian.py", line 383, in run
    self._run_job(job_n, job)
  File "/opt/miniconda/lib/python3.9/site-packages/custodian/custodian.py", line 521, in _run_job
    raise NonRecoverableError(s, True, x["handler"])
custodian.custodian.NonRecoverableError: Unrecoverable error for handler: <custodian.vasp.handlers.LargeSigmaHandler object at 0x7f9f367fbfd0>

INFO:jobflow.managers.local:relax failed with exception:
Traceback (most recent call last):
  File "/opt/miniconda/lib/python3.9/site-packages/jobflow/managers/local.py", line 98, in _run_job
    response = job.run(store=store)
  File "/opt/miniconda/lib/python3.9/site-packages/jobflow/core/job.py", line 544, in run
    response = function(*self.function_args, **self.function_kwargs)
  File "/opt/miniconda/lib/python3.9/site-packages/atomate2/vasp/jobs/base.py", line 147, in make
    run_vasp(**self.run_vasp_kwargs)
  File "/opt/miniconda/lib/python3.9/site-packages/atomate2/vasp/run.py", line 167, in run_vasp
    c.run()
  File "/opt/miniconda/lib/python3.9/site-packages/custodian/custodian.py", line 383, in run
    self._run_job(job_n, job)
  File "/opt/miniconda/lib/python3.9/site-packages/custodian/custodian.py", line 521, in _run_job
    raise NonRecoverableError(s, True, x["handler"])
custodian.custodian.NonRecoverableError: Unrecoverable error for handler: <custodian.vasp.handlers.LargeSigmaHandler object at 0x7f9f367fbfd0>

2023-06-19 06:02:35,069 INFO Finished executing jobs locally
INFO:jobflow.managers.local:Finished executing jobs locally
{}

janosh commented 1 year ago

@ryotatomioka Thanks for the repro and great analysis!

+1 for making self.vasp_cmd and self.gamma_vasp_cmd immutable.
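
A possible (hypothetical) way to do that would be to normalize the commands inside VaspJob.__init__, e.g.:

# Hypothetical sketch of the suggested change, not the merged fix:
self.vasp_cmd = tuple(vasp_cmd)
self.gamma_vasp_cmd = tuple(gamma_vasp_cmd) if gamma_vasp_cmd else None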

MichaelWolloch commented 1 year ago

Good catch @ryotatomioka, and sorry for messing this up. I tried to stick to the old terminate functionality but messed up at least one indentation, maybe more.

@janosh, I am happy to make the commands immutable in a PR, but this could probably be included in #264, since it also has to do with termination. What do you think?

janosh commented 1 year ago

@MichaelWolloch Yes, if @fyalcin would like to include a fix for this in #264, that'd be great!

Andrew-S-Rosen commented 1 year ago

@shyuep I think this was closed in #264.