prisms-center / CASMcode

First-principles statistical mechanical software for the study of multi-component crystalline solids

VASP Freeze error #327

Open sjtuzhanglei opened 10 months ago

sjtuzhanglei commented 10 months ago

Dear CASM developers,

CASM appears to track how long each relaxation loop takes and, if the time since the last output exceeds a certain threshold, decides that VASP is frozen and kills the job. Is that correct?

Is there anything I can do about it?


> Begin vasp run:
>  jobdir: /pscratch/sd/v/voz5005/RuO2/casm/211/srun64/casm/training_data/SCEL1_1_1_1_0_0_0/1/calctype.default/run.0/
>  exec: srun -n 256 -c 1 --cpu-bind=cores vasp_std
> Most recent file output (std.out): 9.042484521865845 seconds ago.
> Most recent file output (OUTCAR): 11.093905210494995 seconds ago.
> Most recent file output (OUTCAR): 9.144774436950684 seconds ago.
> Most recent file output (OUTCAR): 14.193707704544067 seconds ago.
> Most recent file output (OUTCAR): 13.24203872680664 seconds ago.
> Most recent file output (OUTCAR): 46.290996074676514 seconds ago.
> slowest_loop: 8.0395
> 5.0*slowest_loop: 40.197500000000005
> most_recent: 46.290996074676514
>  VASP is frozen, killing job
> Run complete
> Traceback (most recent call last):
>  File "<string>", line 1, in <module>
>  File "/global/homes/v/voz5005/.conda/envs/casm/lib/python3.9/site-packages/casm/vaspwrapper/relax.py", line 122, in run
>   super(Relax, self).run()
>  File "/global/homes/v/voz5005/.conda/envs/casm/lib/python3.9/site-packages/casm/vaspwrapper/vasp_calculator_base.py", line 542, in run
>   (status, task) = calculation.run()
>  File "/global/homes/v/voz5005/.conda/envs/casm/lib/python3.9/site-packages/casm/vasp/relax.py", line 334, in run
>   self.add_errdir()
>  File "/global/homes/v/voz5005/.conda/envs/casm/lib/python3.9/site-packages/casm/vasp/relax.py", line 138, in add_errdir
>   os.rename(self.rundir[-1],
> OSError: [Errno 22] Invalid argument: '/pscratch/sd/v/voz5005/RuO2/casm/211/srun64/casm/training_data/SCEL1_1_1_1_0_0_0/1/calctype.default/run.0/' -> '/pscratch/sd/v/voz5005/RuO2/casm/211/srun64/casm/training_data/SCEL1_1_1_1_0_0_0/1/calctype.default/run.0/_err.0'
>  Found errors: FreezeError

bpuchala commented 10 months ago

I don't think there is a configuration option for this in the calc.json file, but you can edit the conditions under which the error is detected in the function implemented here. It's in the .../site-packages/casm/vasp/error.py file.
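
For reference, the log in the original post suggests the check compares the time since the most recent file output with 5.0 times the slowest loop. Below is a minimal sketch of that comparison only. It is not the actual code in error.py, and the function name `is_frozen` and the `factor` parameter are hypothetical, chosen for illustration.

```python
def is_frozen(most_recent, slowest_loop, factor=5.0):
    """Return True when the time since the last output exceeds factor * slowest_loop.

    most_recent:  seconds since the most recent file output
    slowest_loop: duration in seconds of the slowest loop seen so far
    factor:       hypothetical tolerance multiplier (the log shows 5.0)
    """
    return most_recent > factor * slowest_loop


# Values taken from the log above: 46.29 s > 5.0 * 8.0395 s = 40.1975 s
print(is_frozen(46.290996074676514, 8.0395))               # True
# With a larger multiplier the same run would not be flagged: 46.29 s < 80.395 s
print(is_frozen(46.290996074676514, 8.0395, factor=10.0))  # False
```

Under this reading, raising the multiplier is one way to make the check more tolerant of occasional slow steps.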

xivh commented 10 months ago

I made this change in my branch if you want to copy it: https://github.com/xivh/CASMpython/commit/e6505fcf4746c4074a5de670824855690b63c138
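
Separately, the `OSError: [Errno 22]` in the traceback looks like a consequence of the trailing slash on the run directory. The rename target is `.../run.0/_err.0`, which lies inside the directory being renamed, and the operating system rejects that as an invalid argument. A minimal sketch of building a sibling path instead follows. It assumes the intended name is `run.0_err.0`; the helper `errdir_paths` is hypothetical and not part of CASM.

```python
import os
import tempfile


def errdir_paths(rundir, suffix="_err.0"):
    """Return (source, target) for renaming a run directory to an error directory.

    Hypothetical helper: normpath removes any trailing slash, so the suffix
    extends the directory name instead of pointing inside the directory.
    """
    src = os.path.normpath(rundir)
    return src, src + suffix


with tempfile.TemporaryDirectory() as tmp:
    rundir = os.path.join(tmp, "run.0") + "/"   # trailing slash, as in the log
    os.mkdir(os.path.normpath(rundir))
    src, dst = errdir_paths(rundir)
    os.rename(src, dst)                          # sibling path, so the rename succeeds
    print(os.path.basename(dst))                 # run.0_err.0
```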