juliandewit / kaggle_ndsb2017

Kaggle datascience bowl 2017
MIT License

batchnorm order in CNN #12

Open PiaoLiangHXD opened 7 years ago

PiaoLiangHXD commented 7 years ago

Thank you so much for sharing your work. I have a question about the batchnorm layers. In step2_train_mass_segmenter.py, starting at line 314, the architecture of the 2D U-Net is defined. In each block the layers go like this:

input -> batchnorm -> conv1 -> relu -> conv2 -> relu -> pooling -> output

In other papers, batch norm layers are traditionally put between the conv layers and the ReLU, in order to avoid exploding gradients. So I wonder why you put batchnorm before the conv: do you have some theory to support this order, or is it a new trick/tip for CNNs? Of course I know there is no "correct" position for every layer, and your work performs quite well. Congratulations on the challenge!
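
For reference, here is a minimal sketch of the two orderings being compared. This is not the repository's actual code; it assumes tf.keras-style layers, and the filter counts and kernel sizes are illustrative only.

```python
# Illustrative sketch only (assumes tf.keras); not code from step2_train_mass_segmenter.py.
from tensorflow.keras import layers

def block_bn_first(x, filters):
    # Ordering described in the question:
    # input -> batchnorm -> conv -> relu -> conv -> relu -> pooling
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.Activation("relu")(x)
    return layers.MaxPooling2D()(x)

def block_conv_bn_relu(x, filters):
    # The more common ordering from the literature: conv -> batchnorm -> relu
    # (bias is usually disabled because batch norm cancels it; see the answer below)
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    return layers.MaxPooling2D()(x)
```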

juliandewit commented 7 years ago

Hello. First of all, it was a hasty job :) Indeed, I use batchnorm to stabilize the NN computation; especially with a U-Net it's sometimes hard to get training going. Basically, with (very) good initialization you should not need batch normalization. But as you can see in that example, I did not even subtract the mean / std from the input data. I just put a batchnorm after the input layer, which also works fine.
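
A minimal sketch of that idea, assuming tf.keras; the input shape and filter count are illustrative, not taken from the repository:

```python
# Sketch: a BatchNormalization layer right after the input can stand in for
# explicitly subtracting the mean / dividing by the std of the raw data.
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(320, 320, 1))        # illustrative input shape (grayscale slices)
x = layers.BatchNormalization()(inputs)           # normalizes the raw inputs per batch
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
# ... the rest of the U-Net would follow here
model = Model(inputs, x)
```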

As for the location: I have something in my head that I read somewhere and that seems to make sense.

  1. Batch norm after a conv/FC layer negates the bias term, so what is the point of a bias term then?
  2. After the batch norm you have zero mean and unit std, BUT when you put a ReLU behind it the mean will not be zero anymore. It will be positive, since negative values become zero after the ReLU (see the quick check after this list).
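
Both points can be checked with a few lines of NumPy; this is just an illustration of the argument, not code from the repository:

```python
# Quick numerical check of the two points above (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=100_000)   # pre-activations with arbitrary mean/scale

def batchnorm(v):
    # Plain batch normalization without learned scale/shift.
    return (v - v.mean()) / v.std()

# Point 1: a constant bias added before batch norm is subtracted out again,
# so the normalized output is identical with or without it.
print(np.allclose(batchnorm(x), batchnorm(x + 5.0)))   # True

# Point 2: after batch norm the mean is ~0, but a ReLU shifts it positive.
z = batchnorm(x)
print(round(z.mean(), 4))                    # ~0.0
print(round(np.maximum(z, 0.0).mean(), 4))   # ~0.4, i.e. no longer zero mean
```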

This makes sense in my head, but I guess in practice it's not a big deal as long as the network is stable and converges.

But don't take my word for anything. It's just something I did without investigating it fully.