RDShi opened this issue 5 years ago
Hi. This follows the pre-activation design from the second ResNet paper: https://arxiv.org/abs/1603.05027
The essential difference is that each BN-ReLU-Conv3x3 block applies its own BN, with its own scale and shift parameters, to the concatenated features it receives. If BN instead came after the convolution (Conv3x3-BN-ReLU), the features would be normalized once before being concatenated, so every subsequent layer would be based on the same BN scaling parameters.
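To make this concrete, here is a minimal NumPy sketch (my own illustration, not the repository's code; the convolution is omitted and BN is reduced to its normalize-plus-affine form). It shows that with the pre-activation order, two layers reading the same shared features can rescale them differently, whereas a single post-activation BN would fix one scaling for everyone:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Per-feature normalization followed by a learned affine transform.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
# Concatenated features that several later layers all receive as input.
shared = rng.normal(size=(8, 4))

# Pre-activation (BN-ReLU-Conv): each receiving layer owns its BN parameters,
# so the SAME shared features get a different scale/shift per layer.
gamma_a, beta_a = np.full(4, 2.0), np.zeros(4)
gamma_b, beta_b = np.full(4, 0.5), np.ones(4)
out_a = np.maximum(batch_norm(shared, gamma_a, beta_a), 0.0)  # layer A's view
out_b = np.maximum(batch_norm(shared, gamma_b, beta_b), 0.0)  # layer B's view

# Post-activation (Conv-BN-ReLU): the features are normalized once, before
# concatenation, so every subsequent layer sees the identical scaling.
shared_once = np.maximum(batch_norm(shared, np.ones(4), np.zeros(4)), 0.0)

print(np.allclose(out_a, out_b))  # the two layers' views differ
```

The two per-layer views disagree wherever the learned affine parameters differ, which is exactly the flexibility the pre-activation ordering preserves for layers that share concatenated inputs.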
Hello,
In most other models the composite function is Conv3x3-BN-ReLU. Why does DenseNet use BN-ReLU-Conv3x3 instead?
Looking forward to your answer. Thanks