Avoid recompiling after each epoch, which resets the optimizer and discards its accumulated state.
Since the Adam optimizer uses momentum (running estimates of the first and second moments of the gradients) to adjust its updates, every call to model.compile re-initializes the optimizer and throws away this momentum information, which leads to suboptimal training.
When this updated training loop is used on the basic Fermat Quintic example in the tutorial notebooks, training is much smoother, and both the sigma loss and the transition loss are lower after 50 training epochs.
Updated code:
Old code (taken from cymetric/tree/main/notebooks/2.TensorFlow_models.ipynb):
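As a minimal, self-contained sketch of the underlying issue (a hand-rolled scalar Adam, not the cymetric or Keras API; all names here are illustrative), re-creating the optimizer every epoch discards the first- and second-moment estimates it has built up:

```python
import math

class Adam:
    """Minimal scalar Adam optimizer, for illustration only."""
    def __init__(self, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        # Momentum state that Adam accumulates across steps:
        self.m, self.v, self.t = 0.0, 0.0, 0

    def step(self, x, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2
        m_hat = self.m / (1 - self.b1 ** self.t)  # bias-corrected moments
        v_hat = self.v / (1 - self.b2 ** self.t)
        return x - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

def train(reset_each_epoch, epochs=50, steps_per_epoch=20):
    """Minimize f(x) = x^2 with Adam; returns |x| after training."""
    x, opt = 5.0, Adam()
    for _ in range(epochs):
        if reset_each_epoch:
            opt = Adam()  # analogous to re-calling model.compile each epoch:
                          # m, v, t are all reset to zero
        for _ in range(steps_per_epoch):
            x = opt.step(x, 2.0 * x)  # gradient of x^2
    return abs(x)
```

The fix in the Keras setting is the same idea: call model.compile once before the epoch loop, so the optimizer's moment estimates persist across epochs (for example by calling model.fit repeatedly, or by passing the full number of epochs to a single fit call).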