hnguyen0428 closed this issue 10 months ago.
The CLI trainer supports this in two different ways:

- `"checkpoint_path"` key at the top level of the model config JSON. This will cause training to restart using the weights contained in the checkpoint. This calls the `LightningModule.load_from_checkpoint()` class method to initialize the model.
- `"ckpt_path"` key at the top level of the learning config JSON. This will cause training to resume from the state in the checkpoint (model weights and optimizer state variables) using the Lightning Trainer's checkpointing functionality.

These are essentially thin wrappers around Lightning's checkpointing functionality, so have a look there for more info.
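As a rough sketch of what the two options look like (the key names `"checkpoint_path"` and `"ckpt_path"` come from the comment above; every other field and path here is a placeholder, not the project's actual schema), the weights-only restart goes in the model config:

```json
{
  "checkpoint_path": "path/to/checkpoints/last.ckpt",
  "net": { "placeholder": "rest of your model config" }
}
```

while the full-state resume (weights plus optimizer state) goes in the learning config:

```json
{
  "ckpt_path": "path/to/checkpoints/last.ckpt",
  "trainer": { "max_epochs": 200 }
}
```

The practical difference: the first initializes a fresh training run from the saved weights, while the second hands the path to Lightning's `Trainer.fit(ckpt_path=...)`, which also restores the optimizer and epoch counter so training continues where it stopped.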
I'm going to close this since the solution is already implemented; otherwise you may want to check out #210.
I would love to be able to continue training where the last run left off. I always find it frustrating when I train for a certain number of epochs, look at the logs, and realize that the model has not yet converged. It would be great to be able to continue training a model by loading a checkpoint file or a nam model, resuming from the last best epoch instead of having to restart training from epoch 0.