Closed pmrv closed 7 months ago
@samwaseda It seems you made the original change to support recollecting when jobs timed out. I've added somewhat more general tools to contrib to do this for any job.
Totals | |
---|---|
Change from base Build 8507305606: | 0.05% |
Covered Lines: | 14239 |
Relevant Lines: | 15260 |
Shouldn't it at least say something when the collecting fails?
I'm looking at it. But this fix should only be applied if `job.status.collect`, right? If the job times out before that, the actual calculation will likely be aborted as well, or am I missing something?
LGTM,
there's some weirdness going on in the code below with silent toggling of the magnetic flags on a restart... just a note for myself in the future
Once this is merged, we should also release a new pyiron_atomistics version.
Add test cases with "broken" files. Apparently @ahmedabdelkawy will provide the files.
@ahmedabdelkawy Do you have some outputs of broken VASP runs ready? If not, I would move this to an issue and merge this anyway.
If I remember correctly, and please correct me if I am wrong @pmrv, this change by @samwaseda was based on the fact that when a job is ABORTED (for some reason), it is not collected. So when one tries to restart another job from this aborted job, this fix automatically collects it?
Essentially there are two cases:
(1) implies that it would be possible to parse the output. This was done in the fix by @samwaseda. But that fix broke `restart` in case (2), which this PR fixes in turn. So we would need two test cases: one whose output can be parsed, so any finished calculation would do, likely even the ones that we already ship with the test suite; and another one that contains any aborted VASP run, as long as our parser cannot parse it. This second test case should then check that it is possible to call `restart` on such a job.
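A rough sketch of what these two test cases could check. It uses a hypothetical `FakeJob` stand-in rather than the real pyiron classes; `FakeJob`, `OutputParseError`, the `parsable` flag and the status strings are all assumptions made up for illustration, and the actual tests would load real (non-POTCAR) output files instead.

```python
import logging

logger = logging.getLogger(__name__)


class OutputParseError(Exception):
    """Hypothetical error for output files that cannot be parsed."""


class FakeJob:
    """Hypothetical stand-in for a VASP job; NOT the pyiron API."""

    def __init__(self, status, parsable):
        self.status = status      # assumed values: "aborted", "initialized"
        self.parsable = parsable  # whether the output files are intact
        self.collected = False

    def collect_output(self):
        if not self.parsable:
            raise OutputParseError("output is incomplete")
        self.collected = True

    def restart(self):
        if self.status == "aborted" and not self.collected:
            try:
                # Case (1): the output is parsable, so collect it first.
                self.collect_output()
            except OutputParseError as err:
                # Case (2): report the failure, but do not block the restart.
                logger.warning("Could not collect aborted job: %s", err)
        return FakeJob(status="initialized", parsable=True)


def test_restart_collects_parsable_job():
    job = FakeJob(status="aborted", parsable=True)
    new_job = job.restart()
    assert job.collected
    assert new_job.status == "initialized"


def test_restart_survives_unparsable_job():
    job = FakeJob(status="aborted", parsable=False)
    new_job = job.restart()  # must not raise
    assert not job.collected
    assert new_job.status == "initialized"
```

The warning in the `except` branch is one way to make a failed collect visible instead of silent; whether to narrow the caught exception further depends on which error the real parser raises.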
Please never upload `POTCAR` files; those are licensed by VASP and we are not allowed to share them.
I've rebased to rewrite the commit that introduced a `POTCAR`; this way it won't even be accessible in the git history.
@samwaseda I've noticed that your original fix doesn't change the job status after the collect. Is that intended? I don't care either way, I just want to confirm.
@jan-janssen Imo this is done now and we can move ahead with the next version.
Currently this breaks restarting any VASP job that did not time out but aborted for some other reason.
I still need to double check that this catches the correct error.