❓ [QUESTION] Restart run

Hello,

I have a very large dataset: even with multiprocessing, preprocessing it takes one and a half to two days. The node crashed unexpectedly during training, so I would like to continue training from the best_model.pth weights. However, I would really like to avoid processing this huge dataset again.

I tried both initial_model_state / initialize_from_state and load_model_state / load_model_state.

However, when I started the initial training, the append key was set to false, so now when I set it to false again I get this error:

Traceback (most recent call last):
  File "/home/user/.conda/envs/nequip_stress/bin/nequip-train", line 8, in <module>
    sys.exit(main())
  File "/home/user/.conda/envs/nequip_stress/lib/python3.10/site-packages/nequip/scripts/train.py", line 65, in main
    raise RuntimeError(
RuntimeError: Training instance exists at /path_to_traning_dir; either set append to True or use a different root or runname

However, when I start it with append set to true, I get the following error:

Traceback (most recent call last):
  File "/home/user/.conda/envs/nequip_stress/bin/nequip-train", line 8, in <module>
    sys.exit(main())
  File "/home/user/.conda/envs/nequip_stress/lib/python3.10/site-packages/nequip/scripts/train.py", line 74, in main
    trainer = restart(config)
  File "/home/user/.conda/envs/nequip_stress/lib/python3.10/site-packages/nequip/scripts/train.py", line 220, in restart
    raise ValueError(
ValueError: Key "append" is different in config and the result trainer.pth file. Please double check
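
Taken together, the two tracebacks seem to point to roughly the following logic. This is only a sketch inferred from the error messages, not the actual nequip source: `check_restart`, its arguments, and its return values are hypothetical names used for illustration, and the assumption that every overlapping key is compared is mine.

```python
from typing import Optional


def check_restart(config: dict, saved_config: Optional[dict], run_dir_exists: bool) -> str:
    """Hypothetical stand-in for the checks implied by the two tracebacks."""
    if not run_dir_exists:
        # Nothing on disk yet: a fresh run can start.
        return "fresh"

    if not config.get("append", False):
        # First error: the run directory exists but append is false.
        raise RuntimeError(
            "Training instance exists; either set append to True "
            "or use a different root or runname"
        )

    # Second error: on restart, the new config is compared with the one
    # saved alongside the trainer state (assumed: key by key).
    for key, saved_value in (saved_config or {}).items():
        if key in config and config[key] != saved_value:
            raise ValueError(
                f'Key "{key}" is different in config and the result trainer.pth file.'
            )
    return "restart"
```

Under this reading, a run that was started with append set to false cannot be restarted in place with either value of append, which matches what I observe.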

I guess the question is: is there a way to pass the already processed dataset along with the model state?

Thanks in advance for any advice,
Ivan

mir-group / nequip

❓ [QUESTION] Restart run #343