juliandewit / kaggle_ndsb2017

Kaggle datascience bowl 2017
MIT License

batchnorm order in CNN #12

Open PiaoLiangHXD opened 7 years ago

PiaoLiangHXD commented 7 years ago

Thank you so much for sharing your work. I have a question about the batchnorm layers. In step2_train_mass_segmenter.py, starting at line 314, the architecture of the 2D U-Net is defined. In each block the layers go like this:

input -> batchnorm -> conv1 -> relu -> conv2 -> relu -> pooling -> output

In other papers, batch norm layers are traditionally put between the conv layers and the ReLU, in order to avoid exploding gradients. So I wonder why you put batchnorm before the conv: do you have some theory to support this order, or is it a new trick/tip for CNNs? Of course I know there is no "correct" position for every layer, and your work performs quite well. Congratulations on the challenge!
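
For reference, here is a minimal sketch of the two orderings being compared. This is not the repository's actual code; it assumes tf.keras-style layers, and the filter counts and kernel sizes are illustrative only.

```python
# Illustrative sketch only (assumes tf.keras); not code from step2_train_mass_segmenter.py.
from tensorflow.keras import layers

def block_bn_first(x, filters):
    # Ordering described in the question:
    # input -> batchnorm -> conv -> relu -> conv -> relu -> pooling
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.Activation("relu")(x)
    return layers.MaxPooling2D()(x)

def block_conv_bn_relu(x, filters):
    # The more common ordering from the literature: conv -> batchnorm -> relu
    # (bias is usually disabled because batch norm cancels it; see the answer below)
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    return layers.MaxPooling2D()(x)
```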

juliandewit commented 7 years ago

Hello. First of all, it was a hasty job :) Indeed, I use batchnorm to stabilize the NN computation; especially with a U-Net it's sometimes hard to get training going. Basically, with (very) good initialization you should not need batch normalization. But as you can see in that example, I did not even subtract the mean / std from the input data. I just put a batchnorm after the input layer, which also works fine.
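
A minimal sketch of that idea, assuming tf.keras; the input shape and filter count are illustrative, not taken from the repository:

```python
# Sketch: a BatchNormalization layer right after the input can stand in for
# explicitly subtracting the mean / dividing by the std of the raw data.
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(320, 320, 1))        # illustrative input shape (grayscale slices)
x = layers.BatchNormalization()(inputs)           # normalizes the raw inputs per batch
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
# ... the rest of the U-Net would follow here
model = Model(inputs, x)
```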

As for the location: I have something in my head that I read somewhere and that seems to make sense.

  1. Batch norm after a conv/FC layer negates the bias term, so what is the point of a bias term then?
  2. After the batch norm you have zero mean and unit std, BUT when you put a ReLU behind it the mean will not be zero anymore. It will be positive, since negative values become zero after the ReLU (see the quick check after this list).
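
Both points can be checked with a few lines of NumPy; this is just an illustration of the argument, not code from the repository:

```python
# Quick numerical check of the two points above (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=100_000)   # pre-activations with arbitrary mean/scale

def batchnorm(v):
    # Plain batch normalization without learned scale/shift.
    return (v - v.mean()) / v.std()

# Point 1: a constant bias added before batch norm is subtracted out again,
# so the normalized output is identical with or without it.
print(np.allclose(batchnorm(x), batchnorm(x + 5.0)))   # True

# Point 2: after batch norm the mean is ~0, but a ReLU shifts it positive.
z = batchnorm(x)
print(round(z.mean(), 4))                    # ~0.0
print(round(np.maximum(z, 0.0).mean(), 4))   # ~0.4, i.e. no longer zero mean
```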

This makes sense in my head, but I guess in practice it's not a big deal as long as the network is stable and converges.

But don't take my word for anything. It's just something I did without investigating it fully.