Long00River closed this issue 3 years ago.
Let's say you want to train your model for 100 epochs, and because of some issue, training crashes at the 50th epoch. We don't want to start training from scratch (epoch 0) again. Instead, we can use the last saved checkpoint to resume training, so that when we restart after the crash, training continues from the 50th epoch.
We use the `resume` flag for this purpose: when it is set, training loads the last checkpoint instead of starting over.
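To make the checkpoint/resume idea concrete, here is a minimal sketch using only the Python standard library. It assumes a training loop that saves a checkpoint after every epoch; the functions `save_checkpoint`, `load_checkpoint` and `train`, the JSON file, and the `resume` parameter are all hypothetical stand-ins, not this repository's actual code. A real project would usually save model and optimizer state (for example with `torch.save`) rather than a JSON dict.

```python
import json
import os
import tempfile

# Hypothetical checkpoint location; a JSON file stands in for real model/optimizer state.
CKPT_PATH = os.path.join(tempfile.gettempdir(), "last_checkpoint.json")


def save_checkpoint(path, epoch, state):
    """Write the last completed epoch and the training state to disk."""
    with open(path, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)


def load_checkpoint(path):
    """Return (next_epoch, state) read from the last checkpoint."""
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["epoch"] + 1, ckpt["state"]


def train(num_epochs, resume=False, crash_at=None, path=CKPT_PATH):
    """Toy training loop; `state` stands in for model/optimizer weights."""
    start_epoch, state = 0, {"weight": 0.0}
    if resume and os.path.exists(path):
        # Resume: continue from the epoch after the last completed one.
        start_epoch, state = load_checkpoint(path)
    epochs_run = []
    for epoch in range(start_epoch, num_epochs):
        if crash_at is not None and epoch == crash_at:
            raise RuntimeError(f"simulated crash at epoch {epoch}")
        state["weight"] += 1.0  # stand-in for one epoch of parameter updates
        epochs_run.append(epoch)
        save_checkpoint(path, epoch, state)  # checkpoint after every epoch
    return epochs_run, state


if __name__ == "__main__":
    if os.path.exists(CKPT_PATH):
        os.remove(CKPT_PATH)  # start the demo from a clean slate
    try:
        train(100, crash_at=50)  # first run crashes during epoch 50
    except RuntimeError as err:
        print(err)
    epochs, state = train(100, resume=True)  # picks up at epoch 50, not 0
    print(epochs[0], state["weight"])
```

Because the checkpoint is written at the end of each epoch, a crash loses at most the epoch that was in progress; resuming reruns that epoch and continues to epoch 99 with the restored state.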
Many thanks for your reply.
Hi, I don't quite understand what the `resume` parameter does. Its help note says "Use this flag to load the last checkpoint for training". Can you explain what role checkpoints and resuming play in training? I don't understand either of them. Thank you very much.