Open qAp opened 2 years ago
Normally, validation is carried out after each epoch of training, and model checkpoints are saved based on a monitored validation metric. This means validation runs once per epoch, and at most one checkpoint is saved per epoch.
In pytorch lightning, Trainer.val_check_interval
controls how often validation is carried out. If it is set to 0.25, validation runs after each quarter of an epoch of training, i.e. 4 times per epoch. This makes it possible to check how well the model is doing more frequently.
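Roughly, the scheduling described above can be sketched in plain Python (illustrative only; Lightning's internal logic is more involved, and the batch count here is made up):

```python
# Sketch: which training-batch counts trigger validation for a given
# fractional val_check_interval. Not Lightning code, just the arithmetic.

def validation_batches(num_batches: int, val_check_interval: float) -> list[int]:
    """Return the 1-indexed batch counts after which validation would run."""
    every_n = int(num_batches * val_check_interval)
    return [b for b in range(1, num_batches + 1) if b % every_n == 0]

# With 100 training batches per epoch and val_check_interval=0.25,
# validation runs after batches 25, 50, 75, 100 -> 4 times per epoch.
print(validation_batches(100, 0.25))  # [25, 50, 75, 100]
```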
And, for the version of pytorch lightning currently in use, a ModelCheckpoint callback with every_n_val_epochs
set should check the monitored validation metric after each validation run and decide whether or not to save a checkpoint.
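Putting the two together, the intended setup looks roughly like this (a sketch only; the metric name `val_loss` and the checkpoint directory are placeholders, not taken from this repo):

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    monitor="val_loss",     # validation metric to monitor (assumed name)
    mode="min",             # checkpoint when the monitored metric improves (decreases)
    every_n_val_epochs=1,   # check after every validation run
    save_top_k=1,           # keep only the best checkpoint
    dirpath="checkpoints/", # placeholder output directory
)

trainer = pl.Trainer(
    val_check_interval=0.25,          # validate 4 times per epoch
    callbacks=[checkpoint_callback],
)
# trainer.fit(model, datamodule=dm)  # model / datamodule defined elsewhere
```

Note that with val_check_interval=0.25, each validation run counts as one "val epoch" for this callback, so every_n_val_epochs=1 gives up to 4 checkpoint opportunities per training epoch. In later pytorch lightning releases this argument was renamed every_n_epochs.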
At the moment, a single epoch takes well over 9 hours (~14 hr) to train, so not a single checkpoint is saved before the Kaggle session times out. This makes resuming training impossible.