Fix a bug where the progress bar shows forward progress even though training is stuck in an infinite crash loop caused by a NanException thrown before Checkpoint 1.
Example: with save_every = 1000, a NanException is thrown around iteration 700. The step count keeps climbing into the thousands and the progress bar shows training approaching completion, but the model never gets close to converging because it keeps crashing and restarting before Checkpoint 1 is saved.
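
The failure mode can be reproduced with a small simulation. This is only a sketch under assumptions, not the project's training loop: `train_once`, `simulate`, `displayed_steps`, and `reset_on_crash` are hypothetical names, the `NanException` class here is a stand-in, and resetting the counter is one possible fix that may differ from the actual change.

```python
class NanException(Exception):
    """Hypothetical stand-in for the exception raised on a NaN loss."""


def train_once(state, save_every, nan_at):
    """Train from scratch until Checkpoint 1 is saved or the loss turns NaN."""
    model_iters = 0  # iterations the current weights have actually trained
    while True:
        model_iters += 1
        state["displayed_steps"] += 1  # counter that drives the progress bar
        if model_iters == nan_at:
            raise NanException(f"NaN at model iteration {model_iters}")
        if model_iters == save_every:
            state["checkpoints"] += 1
            return


def simulate(save_every, nan_at, attempts, reset_on_crash):
    """Retry training `attempts` times and return the final counters."""
    state = {"displayed_steps": 0, "checkpoints": 0}
    for _ in range(attempts):
        try:
            train_once(state, save_every, nan_at)
            break  # a checkpoint was saved, stop retrying
        except NanException:
            if reset_on_crash:
                # Assumed fix: no checkpoint exists yet, so the weights restart
                # from scratch and the displayed counter is rolled back with them.
                state["displayed_steps"] = 0
    return state


buggy = simulate(save_every=1000, nan_at=700, attempts=5, reset_on_crash=False)
fixed = simulate(save_every=1000, nan_at=700, attempts=5, reset_on_crash=True)
print(buggy)  # displayed_steps grows with every crash, checkpoints stays 0
print(fixed)  # displayed_steps stays in sync with the (reset) model
```

Without the reset, five crashes at iteration 700 push the displayed counter to 3500 even though no checkpoint was ever written. With the reset, the counter reflects that the model has made no lasting progress.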