harsanyika opened this issue 6 years ago
Hi,
Batch normalization was initialized from ImageNet where possible, and with 0s (mean) and 1s (variance) for the new layers. Momentum was 0.9 and epsilon 1e-5, so standard values.
I also think there could be a problem with how batch normalization behaves in training versus testing. As I've mentioned before, we only converted the model to TensorFlow for inference. I want to replace the batch norm with tf.layers.batch_normalization and see whether that version trains properly.
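Roughly what I have in mind, as a sketch (the helper name and the way weights are restored are just placeholders for illustration):

```python
import tensorflow as tf

def bn(x, training, name):
    # Pre-trained layers would load their ImageNet statistics when restoring
    # the checkpoint; the new layers start with moving mean 0 and moving
    # variance 1, which these initializers provide.
    return tf.layers.batch_normalization(
        x,
        momentum=0.9,                                        # training value
        epsilon=1e-5,                                        # training value
        moving_mean_initializer=tf.zeros_initializer(),
        moving_variance_initializer=tf.ones_initializer(),
        training=training,
        name=name)
```

The `training` flag is what switches between batch statistics (training) and the stored moving averages (inference).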
Hi @harsanyika, have you trained the network yourself? I am also working on this, but my training did not converge. Would you mind putting your code on GitHub or somewhere else? I would like to use it as a reference for training it again. Thank you very much!
Hello,
I am trying to build my own network in TF, based on your results. I know for sure that I built the correct model, because I can load the weights you provided and it runs perfectly for testing. However, I have problems both with training the network initialized with ImageNet weights and with fine-tuning from the weights you provided. The problem seems to be the way I handle batch normalization: the training and testing errors differ whenever the batch normalization layers are part of the network (maybe there is a problem with how the population mean and variance are handled?). I use the built-in batch_normalization from TensorFlow (https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization).
What parameters did you use for batch normalization during training? Did you initialize all the means and variances with 0s and 1s in the new layers and use the pre-trained ImageNet weights for the rest? How did you choose the momentum and epsilon (and possibly other parameters) for the BN layers?
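For reference, this is roughly how I wire it up on my side; the `is_training` placeholder, the dummy loss, and the dependency on `UPDATE_OPS` are my own guesses at the correct usage, so please point out anything that looks wrong:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 224, 224, 3])
is_training = tf.placeholder(tf.bool, [])

# Built-in batch norm: batch statistics during training, population
# (moving) mean/variance at test time.
h = tf.layers.batch_normalization(x, training=is_training, name="bn1")
loss = tf.reduce_mean(tf.square(h))  # dummy loss, just for the sketch

# Without this dependency the moving mean/variance are never updated,
# which would explain a discrepancy between training and testing error.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.MomentumOptimizer(1e-3, 0.9).minimize(loss)
```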