Hi @Gaelium! There are two ways to resume training from a checkpoint:
If you wish to initialize the encoder weights from a previous checkpoint (such as the official checkpoints), you can do so by specifying the --checkpoint_path flag and pointing it to the pretrained checkpoint.
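For example, a command along these lines starts a fresh run from pretrained encoder weights (the checkpoint path is a placeholder, and whatever dataset/experiment options your run normally uses still need to be passed):

    python scripts/train.py --checkpoint_path=/path/to/pretrained_e4e.pt [your usual training options]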
In order to resume training from a specific training step, the optimizer, global step, discriminators, and best loss value need to be kept as part of the checkpoint, which is done by providing the --save_training_data flag.
By default, this behaviour is disabled due to the large size of each resulting checkpoint.
However, in case you do run a new training session with the --save_training_data flag, you can continue from a saved checkpoint using the --resume_training_from_ckpt flag.
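For example (the paths below are placeholders, and I'm assuming --resume_training_from_ckpt takes the path of the saved checkpoint):

    # first run: save the full training state alongside the model weights
    python scripts/train.py --save_training_data [your usual training options]

    # later: pick up from one of the checkpoints saved by that run
    python scripts/train.py --resume_training_from_ckpt=/path/to/experiment/checkpoints/saved_checkpoint.pt [your usual training options]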
Hope it helps,
Best,
Omer
Thank you! Adding the --save_training_data flag let me resume from the checkpoint.
I'm having a similar problem to https://github.com/omertov/encoder4editing/issues/22#issue-869018928. That issue is closed, but no solution has been posted.
Traceback (most recent call last):
  File "scripts/train.py", line 88, in <module>
    main()
  File "scripts/train.py", line 28, in main
    coach = Coach(opts, previous_train_ckpt)
  File "./training/coach.py", line 87, in __init__
    self.load_from_train_checkpoint(prev_train_checkpoint)
  File "./training/coach.py", line 93, in load_from_train_checkpoint
    self.best_val_loss = ckpt['best_val_loss']
KeyError: 'best_val_loss'
Can anyone let me know if there is a way to resolve this issue? Thanks!
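In case it helps anyone hitting the same trace: the KeyError suggests the checkpoint being loaded does not contain the training-state entries, which typically means it was saved without --save_training_data (the official pretrained checkpoints fall into this category and are meant to be loaded through --checkpoint_path instead). If you really need to resume from such a checkpoint, one possible workaround, a sketch rather than an official fix, is to make the lookup in training/coach.py tolerant of the missing key:

    # training/coach.py, inside load_from_train_checkpoint (sketch; the fallback value is an assumption)
    self.best_val_loss = ckpt.get('best_val_loss', None)  # None = "no best validation loss recorded yet"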