Could there be some issue with running 3 texts per training example through the BatchNormalization layers? If my batch size is 128, could Keras/Tensorflow be summing the BatchNormalization inputs from (3 * 128) samples but only dividing by the 128 training examples?
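To make the setup concrete, here is a minimal sketch (not the original model, all layer sizes are placeholders) of an encoder with a shared BatchNormalization layer applied to three text inputs per example. Because the layer instance is shared, its batch statistics in each step are computed over all 3 * batch_size activations rather than over the 128 training examples:

```python
# Minimal sketch of a shared encoder reused on three inputs per example.
# Sizes and names are illustrative, not the original architecture.
from keras.layers import Input, Embedding, GlobalAveragePooling1D, Dense, BatchNormalization, concatenate
from keras.models import Model

def build_encoder(vocab_size=10000, seq_len=100, dim=64):
    inp = Input(shape=(seq_len,))
    x = Embedding(vocab_size, dim)(inp)
    x = GlobalAveragePooling1D()(x)
    x = BatchNormalization()(x)   # this single BN instance is reused for all three inputs
    x = Dense(dim, activation='relu')(x)
    return Model(inp, x)

encoder = build_encoder()
anchor, positive, negative = (Input(shape=(100,)) for _ in range(3))
embeddings = [encoder(t) for t in (anchor, positive, negative)]  # same BN weights each time
model = Model([anchor, positive, negative], concatenate(embeddings))
```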
I'm facing a similar issue with multi-stream image classification when my branches contain trainable BN layers.
@bharris47 Did you find a solution to your issue?
I did not find a solution to this particular issue. I ended up using a different loss function and only needed to run one batch through the model per step, which worked around the issue.
@AZweifels @bharris47 @fchollet I have the same issue. When using multi-label outputs, the trainable BN layers end up with NaN moving_mean values. I have tried MobileNetV2, Xception, and Inception-ResNet-v2. Is this a Keras bug?
Try using tf.layers.batch_normalization(x, training=self.is_training, renorm=True), i.e. with renorm=True.
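A rough sketch of that suggestion using the TF 1.x API (the placeholder names and layer sizes here are illustrative): batch renormalization constrains how far the normalization can deviate from the moving statistics, which tends to keep them from drifting when batch statistics are noisy.

```python
# Sketch only (TF 1.x graph-mode API); shapes and names are placeholders.
import tensorflow as tf

is_training = tf.placeholder(tf.bool, name='is_training')
x = tf.placeholder(tf.float32, shape=(None, 128))

h = tf.layers.dense(x, 64, activation=tf.nn.relu)
h = tf.layers.batch_normalization(h, training=is_training, renorm=True)

# The moving_mean / moving_variance updates live in UPDATE_OPS and must be
# run alongside the training op for the statistics to be refreshed.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
```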
We'll close this issue and track the bug with #11927.
[x] Check that you are up-to-date with the master branch of Keras. You can update with: pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps
[x] If running on TensorFlow, check that you are up-to-date with the latest version. The installation instructions can be found here.
[x] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with: pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
[x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).
Hi, I am running into an issue where my BatchNormalization moving_mean parameters quickly explode to infinity following Conv1D layers. I have checked the output of the Conv1D layers, and the moving mean seems to be simply incorrect. I am using the Adam optimizer, and clipnorm and clipvalue seem to have no effect. I am able to reproduce the issue on two separate machines, on a Titan V and a Titan Xp.
I have tried moving BatchNormalization after activation, before pooling, and after pooling. Nothing seems to help.
The output of this model is fed into a triplet model for training.
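For context, a minimal sketch of the kind of Conv1D encoder being described (layer sizes and shapes are placeholders, not the original architecture), with comments marking the BatchNormalization placements that were tried:

```python
# Illustrative sketch of a Conv1D encoder whose output feeds a triplet model.
from keras.layers import Input, Conv1D, Activation, MaxPooling1D, BatchNormalization, GlobalMaxPooling1D, Dense
from keras.models import Model

inp = Input(shape=(200, 32))            # (timesteps, channels) are placeholders
x = Conv1D(64, 3, padding='same')(inp)
x = Activation('relu')(x)
x = BatchNormalization()(x)             # variant tried: BN after activation
x = MaxPooling1D(2)(x)
# x = BatchNormalization()(x)           # variant tried: BN after pooling instead
x = GlobalMaxPooling1D()(x)
embedding = Dense(64)(x)
encoder = Model(inp, embedding)         # output is fed into the triplet model
```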
I added a Callback which logs the BatchNormalization statistics after each batch. Here are some sample logs of the BatchNormalization weights.
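The callback itself was not included; a rough sketch of how such logging could be done (class and variable names are illustrative, not the original code):

```python
# Sketch of a callback that logs BatchNormalization moving statistics per batch.
import numpy as np
from keras.callbacks import Callback
from keras.layers import BatchNormalization

class BatchNormStatsLogger(Callback):
    """Print the magnitude of each BN layer's moving statistics after every batch."""
    def on_batch_end(self, batch, logs=None):
        # (If the BN layers live inside a nested sub-model, iterate its layers as well.)
        for layer in self.model.layers:
            if isinstance(layer, BatchNormalization):
                # Default weight order for BN: gamma, beta, moving_mean, moving_variance
                gamma, beta, moving_mean, moving_var = layer.get_weights()
                print(layer.name,
                      'moving_mean max:', np.abs(moving_mean).max(),
                      'moving_var max:', moving_var.max())
```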