Closed. TengliEd closed this issue 6 years ago.
Hello. If you are talking about the means used during data normalization - those are separate parameters that are computed only once. If we are considering the mean and std inside batch norm - they are updated at every training step, depending on the training flag. So everything should happen under the hood. Of course, it's possible that the TensorFlow API has changed a little bit, so the code should be rechecked.
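For context, here is a minimal NumPy sketch (not code from this repo; the helper name and epsilon are assumptions) of the first case, where the per-channel mean and std are computed once on the training set and then reused unchanged:

```python
import numpy as np

# Hypothetical helper: dataset-level normalization. The statistics are
# computed a single time on the training set and reused for every batch.
def normalize_images(images, mean=None, std=None):
    """images: float array of shape (N, H, W, C)."""
    if mean is None or std is None:
        mean = images.mean(axis=(0, 1, 2))  # computed only once
        std = images.std(axis=(0, 1, 2))
    return (images - mean) / (std + 1e-8), mean, std
```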
@ikhlestov No, I mean that you have to explicitly update the moving mean and moving variance of batch normalization before applying gradients to the trainable variables, something like below:
```python
# collect the batch norm update ops and run them before each train step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
```
Thank you for your attention, I'll rewrite the code.
@TengliEd Oh, actually I remember this case - I just set updates_collections=None to force the updates in place, so there is no need to wrap the train op in control_dependencies.
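For reference, a minimal sketch of that setup (the tensor shapes, layer sizes, and optimizer settings below are assumptions, not the repo's exact code): with updates_collections=None the moving statistics are updated in place during the forward pass, so no control_dependencies wrapper is required.

```python
import tensorflow as tf

# Illustrative graph, not taken from this repository.
images = tf.placeholder(tf.float32, [None, 32, 32, 3])
labels = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

net = tf.contrib.layers.conv2d(images, 16, 3, activation_fn=None)
# updates_collections=None makes batch_norm update the moving mean/variance
# as part of the forward pass, so the train op needs no extra dependencies.
net = tf.contrib.layers.batch_norm(
    net, scale=True, is_training=is_training, updates_collections=None)
net = tf.nn.relu(net)
logits = tf.contrib.layers.fully_connected(
    tf.reduce_mean(net, axis=[1, 2]), 10, activation_fn=None)

loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=logits, labels=labels))
train_op = tf.train.MomentumOptimizer(0.1, 0.9).minimize(loss)
```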
@ikhlestov Yes, thanks for giving me another workaround to update them. But I am still confused about putting weight decay on all trainable variables: l2_loss = tf.add_n([tf.nn.l2_loss(var) for var in tf.trainable_variables()]). You know we need to decay only the weights, not the biases.
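For illustration, one possible way to restrict the decay term is to filter trainable variables by name; the helper name and keyword list here are assumptions, not code from this repo:

```python
import tensorflow as tf

# Hypothetical filter: apply L2 decay only to variables whose names do not
# look like biases or the batch norm beta/gamma parameters.
def weight_decay_loss(exclude=('bias', 'beta', 'gamma')):
    weights = [var for var in tf.trainable_variables()
               if not any(key in var.name.lower() for key in exclude)]
    return tf.add_n([tf.nn.l2_loss(var) for var in weights])
```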
As far as I know, the tensorflow.nn.conv2d layer doesn't contain a bias, and the whole network is built from such layers, so there are actually no biases. In any case, the training results show nearly the same performance as the authors' implementation and the paper, so there is no reason to tweak anything. If you want, you can make a fork and implement any updates.
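As a sketch of that point (the helper name and initializer below are illustrative, not the repo's code): a convolution built directly on tf.nn.conv2d creates only a kernel variable, so no bias from this layer appears among the trainable variables that the L2 term sums over.

```python
import tensorflow as tf

# Sketch of a bias-free convolution: only a kernel variable is created,
# so tf.trainable_variables() picks up no bias for this layer.
def conv2d_no_bias(inputs, out_channels, kernel_size=3):
    in_channels = int(inputs.get_shape()[-1])
    kernel = tf.get_variable(
        'kernel',
        shape=[kernel_size, kernel_size, in_channels, out_channels],
        initializer=tf.contrib.layers.variance_scaling_initializer())
    return tf.nn.conv2d(inputs, kernel, strides=[1, 1, 1, 1], padding='SAME')
```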
When using batchnorm, we need to update the moving mean and moving variance, as the TensorFlow documentation says, but I have not found that in your code. Is something wrong?