Closed by mrgloom 8 years ago
Not sure it can. A BatchNorm layer gives you several advantages: it acts as a regularizer, and it mitigates internal covariate shift and vanishing gradients, both of which matter more the deeper you go. It also lets you train faster, with larger learning rates, which helps a lot when training such deep networks. More details are in the BatchNorm paper.
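To make the depth argument a bit more concrete, here is a rough forward-pass sketch in plain Python. It is not taken from any ResNet implementation. The helper names (`batch_norm`, `residual_stack_rms`), the width, depth, batch size, and the He-style random initialization are all assumptions chosen for illustration, and the normalization has no learnable scale or shift. It only compares how large the activations get in a stack of toy residual blocks with and without normalization on the branch; it says nothing about whether either variant can actually be trained.

```python
import math
import random


def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) to zero mean and unit variance over the batch.

    Simplified on purpose: no learnable scale/shift and no running statistics.
    """
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # biased batch variance
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = (col[i] - mean) * inv_std
    return out


def residual_stack_rms(depth, width=16, batch_size=32, use_bn=False, seed=0):
    """Run `depth` toy residual blocks x <- x + relu(norm(x)) @ W and
    return the root-mean-square of the final activations.

    Hypothetical setup: random Gaussian inputs, fresh random weights per block.
    """
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(batch_size)]
    std = math.sqrt(2.0 / width)  # He-style initialization (an assumption)
    for _ in range(depth):
        w = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        h = batch_norm(x) if use_bn else x            # optional normalization
        h = [[max(0.0, v) for v in row] for row in h]  # ReLU
        # Residual connection: identity path plus the linear branch.
        x = [
            [x[i][k] + sum(h[i][j] * w[j][k] for j in range(width))
             for k in range(width)]
            for i in range(batch_size)
        ]
    total = sum(v * v for row in x for v in row)
    return math.sqrt(total / (batch_size * width))


plain_rms = residual_stack_rms(depth=20, use_bn=False)
bn_rms = residual_stack_rms(depth=20, use_bn=True)
print(f"RMS after 20 blocks without normalization: {plain_rms:.3g}")
print(f"RMS after 20 blocks with normalization:    {bn_rms:.3g}")
```

Under these assumptions the unnormalized stack's activations grow far faster with depth than the normalized one's, because each unnormalized branch adds a signal proportional to its already-growing input. That is one plausible reason very deep residual networks are hard to train without normalization unless the initialization or the branch scaling is adjusted.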
Should ResNet work without a Batch Norm layer?