I was training my model and it was going well, but after a while I stopped training and saved the checkpoint to my Google Drive. I then restarted training from that checkpoint with overwrite=True, and it started behaving strangely: the loss suddenly dropped by 0.1 to 0.2, then after a while the loss began oscillating, and the model's samples turned into repetitive gibberish (random characters).
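For context, here is roughly the resume step I'm running. This is a minimal sketch assuming the gpt-2-simple library (whose finetune() accepts restore_from='latest' and overwrite=True); the run name 'run1' and the dataset file 'my_corpus.txt' are placeholders for my actual values:

```python
import gpt_2_simple as gpt2

# Mount Google Drive and copy the saved checkpoint back into the
# local checkpoint/ directory ('run1' is a placeholder run name).
gpt2.mount_gdrive()
gpt2.copy_checkpoint_from_gdrive(run_name='run1')

# Resume fine-tuning from the restored checkpoint.
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset='my_corpus.txt',  # placeholder for my dataset
              model_name='124M',
              run_name='run1',
              restore_from='latest',    # continue from the saved checkpoint
              overwrite=True,           # overwrite checkpoints in the run dir
              steps=1000,
              save_every=500,
              sample_every=200)
```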
Is restoring from 'latest' with overwrite=True the proper way to continue training a model on the same dataset?