RDShi opened this issue 5 years ago
Hi. This follows the pre-activation design from the second ResNet paper: https://arxiv.org/abs/1603.05027
The essential difference is that each BN-ReLU-Conv3x3 block applies its own BN, with its own scale and shift parameters, to the concatenated features it receives. If BN instead came after the convolution (Conv3x3-BN-ReLU), the features would be normalized once before being concatenated, so every subsequent layer would be based on the same BN scaling parameters.
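To make this concrete, here is a minimal NumPy sketch (my own illustration, not the repository's code; the convolution is omitted and BN is reduced to its normalize-plus-affine form). It shows that with the pre-activation order, two layers reading the same shared features can rescale them differently, whereas a single post-activation BN would fix one scaling for everyone:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Per-feature normalization followed by a learned affine transform.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
# Concatenated features that several later layers all receive as input.
shared = rng.normal(size=(8, 4))

# Pre-activation (BN-ReLU-Conv): each receiving layer owns its BN parameters,
# so the SAME shared features get a different scale/shift per layer.
gamma_a, beta_a = np.full(4, 2.0), np.zeros(4)
gamma_b, beta_b = np.full(4, 0.5), np.ones(4)
out_a = np.maximum(batch_norm(shared, gamma_a, beta_a), 0.0)  # layer A's view
out_b = np.maximum(batch_norm(shared, gamma_b, beta_b), 0.0)  # layer B's view

# Post-activation (Conv-BN-ReLU): the features are normalized once, before
# concatenation, so every subsequent layer sees the identical scaling.
shared_once = np.maximum(batch_norm(shared, np.ones(4), np.zeros(4)), 0.0)

print(np.allclose(out_a, out_b))  # the two layers' views differ
```

The two per-layer views disagree wherever the learned affine parameters differ, which is exactly the flexibility the pre-activation ordering preserves for layers that share concatenated inputs.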
Hello,
In most other models the composite function is Conv3x3-BN-ReLU. Why does DenseNet use BN-ReLU-Conv3x3 instead?
Looking forward to your answer. Thanks