liuzhuang13 / DenseNet

Densely Connected Convolutional Networks, In CVPR 2017 (Best Paper Award).
BSD 3-Clause "New" or "Revised" License

Why not share the first BN and ReLU? #52

Closed Sunnydreamrain closed 5 years ago

Sunnydreamrain commented 5 years ago

Hi,

The features go through BN-ReLU-Conv-BN-ReLU-Conv, and then the features from different layers are concatenated. Since BN is applied per channel and ReLU is applied element-wise, why not share the first BN-ReLU? That is, the features would go through Conv-BN-ReLU-Conv-BN-ReLU, and the ReLU outputs would be concatenated. Is there any difference?

Thanks.
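
For concreteness, here is a minimal sketch of the two orderings being compared. This is hypothetical PyTorch (the repo itself is Torch/Lua), with an assumed growth rate `growth` and the DenseNet-BC bottleneck width of `4 * growth`:

```python
import torch.nn as nn

def preact_layer(in_ch, growth, bottleneck=4):
    # Paper's ordering: BN-ReLU-Conv(1x1)-BN-ReLU-Conv(3x3);
    # the raw conv output is what gets concatenated downstream.
    return nn.Sequential(
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, bottleneck * growth, 1, bias=False),
        nn.BatchNorm2d(bottleneck * growth), nn.ReLU(inplace=True),
        nn.Conv2d(bottleneck * growth, growth, 3, padding=1, bias=False),
    )

def postact_layer(in_ch, growth, bottleneck=4):
    # The alternative asked about: Conv-BN-ReLU-Conv-BN-ReLU;
    # the ReLU output is what gets concatenated, so the first
    # BN-ReLU of the paper's ordering would effectively be shared.
    return nn.Sequential(
        nn.Conv2d(in_ch, bottleneck * growth, 1, bias=False),
        nn.BatchNorm2d(bottleneck * growth), nn.ReLU(inplace=True),
        nn.Conv2d(bottleneck * growth, growth, 3, padding=1, bias=False),
        nn.BatchNorm2d(growth), nn.ReLU(inplace=True),
    )
```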

robertomest commented 5 years ago

There was some discussion of this topic in this issue. With pre-activation, each layer has its own BatchNorm parameters, which seems to improve results on deeper networks.
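
To illustrate the point, a hedged PyTorch sketch of a dense block (assumed names, not the repo's Torch/Lua code): every layer applies its own BatchNorm to the concatenation of all earlier features, so each layer can re-normalize and re-scale the same feature maps independently instead of consuming one shared BN-ReLU output.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Pre-activation dense block: each layer begins with its own
    BatchNorm over the concatenated features from all previous layers."""
    def __init__(self, in_ch, growth, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            ch = in_ch + i * growth
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch),          # layer-specific BN parameters
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, 3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Every layer sees the same concatenated features, but
            # normalizes them with its own BN scale/shift parameters.
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```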

Sunnydreamrain commented 5 years ago

Okay. I see. Thanks a lot.