Background
I was investigating why some of our relatively simple Keras models (mostly EfficientNet-like) fail to converge after being upgraded from Keras 2 to Keras 3. Some minor tweaks (e.g. lowering the learning rate) made them converge, but I wanted to track down the underlying cause, since I found no documentation or release notes that would explain the training discrepancy.
Minimal reproducible example
I set up two environments in Google Colab and trained exactly the same model with all random generators seeded. What I observed: a training discrepancy appears as soon as I add a BatchNormalization layer.

Keras 2 / TF 2.14.1 notebook: https://colab.research.google.com/drive/1f7q-VcW7ugRPNxbCLkuE-q0O1WUJ8q-R?usp=sharing
Keras 3 / TF 2.17.0 notebook: https://colab.research.google.com/drive/1ONPJ_WXM6WQoJ8ze9bJJ9tjS94aNJ4KB?usp=sharing
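For reference, here is a minimal sketch of the kind of comparison the notebooks run; the toy architecture, data, and hyperparameters below are illustrative assumptions, and the linked notebooks are authoritative:

```python
import numpy as np
import keras

# Seed every random generator involved in training so both
# environments start from identical weights and data order.
keras.utils.set_random_seed(42)

# Toy data standing in for the real dataset (assumption).
x = np.random.RandomState(0).rand(256, 32).astype("float32")
y = (x.sum(axis=1) > 16.0).astype("float32")

# Small model; the discrepancy shows up once BatchNormalization
# is present (without it, the two versions train the same).
model = keras.Sequential([
    keras.layers.Input(shape=(32,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Run this unchanged under Keras 2 / TF 2.14.1 and Keras 3 / TF 2.17.0,
# then compare the per-epoch losses between the two runs.
history = model.fit(x, y, epochs=5, batch_size=32, verbose=2)
print(history.history["loss"])
```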
Please let me know if I can run any additional experiments to help track down the issue.