Closed: steindor closed this issue 4 years ago
@steindor if you re-run the script with more epochs, it will automatically restore the last weights. Always use the custom training loop.
Yes I noticed that was possible. I presume that is only possible when the stored weights are in memory?
I'm wondering if it's possible with a new session, that is, training for an arbitrary number of epochs, restarting the session, and loading the weights from the saved checkpoint files?
E.g. if the script crashes while pretraining from scratch, so that one doesn't have to start from the beginning?
@steindor I hope this is what you're suggesting: save the weights after every fixed number of steps rather than only at the end of each epoch, so that if the training script crashes, it can continue from the latest step checkpoint instead of the last epoch checkpoint.
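The step-based checkpointing described above could look roughly like the sketch below. It is framework-agnostic and uses only the Python standard library; the names `save_checkpoint`, `latest_checkpoint`, and `train`, the `ckpt-<step>.json` file naming, and the JSON format are all assumptions made for illustration, not what the script in this thread does. A real training script would swap in its framework's own checkpoint save and restore calls.

```python
import json
import os
import tempfile


def latest_checkpoint(ckpt_dir):
    """Return the path of the highest-step checkpoint in ckpt_dir, or None."""
    if not os.path.isdir(ckpt_dir):
        return None
    names = [n for n in os.listdir(ckpt_dir)
             if n.startswith("ckpt-") and n.endswith(".json")]
    if not names:
        return None
    # Hypothetical naming scheme: ckpt-<step>.json
    newest = max(names, key=lambda n: int(n[len("ckpt-"):-len(".json")]))
    return os.path.join(ckpt_dir, newest)


def save_checkpoint(ckpt_dir, step, weights):
    """Write step and weights to disk; rename so a crash leaves no partial file."""
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"ckpt-{step}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "weights": weights}, f)
    os.replace(tmp, path)
    return path


def train(ckpt_dir, total_steps, save_every):
    """Train up to total_steps, resuming from the latest checkpoint if one exists."""
    state = {"step": 0, "weights": [0.0]}
    path = latest_checkpoint(ckpt_dir)
    if path is not None:
        with open(path) as f:
            state = json.load(f)
    step, weights = state["step"], state["weights"]
    while step < total_steps:
        weights = [w + 1.0 for w in weights]  # stand-in for one optimizer update
        step += 1
        if step % save_every == 0 or step == total_steps:
            save_checkpoint(ckpt_dir, step, weights)
    return step, weights


if __name__ == "__main__":
    ckpt_dir = tempfile.mkdtemp()
    train(ckpt_dir, total_steps=5, save_every=2)          # first session
    print(train(ckpt_dir, total_steps=12, save_every=5))  # new session resumes at step 5
```

Because the step counter is stored alongside the weights, a fresh process does not need anything in memory: it only needs the checkpoint directory to pick up where the previous run stopped.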
Yes, I guess that would solve it. It would be great to be able to use the checkpoints, though, since they are already generated. Thanks!
After training a model for some epochs, how can I restore it and continue training from the checkpoints it outputs, given that they are not in the HDF5 format?