qiulesun opened this issue 7 years ago
I want to train the VGG-VD-16 model from scratch. Which parameter initialization method should I choose: gaussian, xavier, or xavierimproved?

If you use batch normalization, then training is not too sensitive to the initialization; i.e., with batch normalization, any of the above methods should work.

Thank you for your answer. I do use batch normalization, but the top-1 error decreased only from 0.99 to 0.91 after 5 epochs, and convergence was slow. Is this normal?
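For reference, the three schemes differ only in how the standard deviation of the weight distribution is chosen. A minimal NumPy sketch, assuming the common definitions (fixed-sigma Gaussian; Xavier/Glorot scaling by fan-in plus fan-out; "xavierimproved" as the He-et-al. variant scaling by fan-out, which is how MatConvNet names it) — the function name and exact constants here are illustrative, not MatConvNet's actual code:

```python
import numpy as np

def init_conv_weights(shape, method="xavierimproved", sigma=0.01, rng=None):
    """Initialize conv filter weights of shape (h, w, c_in, c_out).

    Illustrative sketch of three common schemes; scaling constants
    follow the usual formulations, which MatConvNet may implement
    with slightly different conventions.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c_in, c_out = shape
    fan_in = h * w * c_in
    fan_out = h * w * c_out
    if method == "gaussian":
        # Fixed-std Gaussian, e.g. sigma = 0.01 (sensitive to depth).
        return rng.normal(0.0, sigma, shape)
    if method == "xavier":
        # Glorot & Bengio (2010): variance 2 / (fan_in + fan_out).
        return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), shape)
    if method == "xavierimproved":
        # He et al. (2015) variant for ReLU nets: variance 2 / fan_out.
        return rng.normal(0.0, np.sqrt(2.0 / fan_out), shape)
    raise ValueError(f"unknown init method: {method}")
```

For a ReLU network as deep as VGG-16 trained without batch normalization, the He-style scaling is usually the safest of the three, since a fixed-sigma Gaussian tends to shrink or blow up activations layer by layer.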