Closed · hawkjk closed this issue 6 years ago
I tried DenseNet-121 based on your code for a face recognition (FR) project, but it seems that the conv structure (BN+ReLU+Conv) does not work for me.
Do you mean that the network does not converge? I didn't train from scratch, but in transfer learning it works in my case.
I modified the conv structure to Conv+BN+ReLU; training is OK, but the accuracy is lower.
In the DenseNet architecture, pre-activation batch normalization (BN+ReLU+Conv) is important, because each layer can apply its own scale and bias to the shared features it receives from previous layers. For more detail, see "Memory-Efficient Implementation of DenseNets", Figure 2.
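To make the ordering concrete, here is a minimal sketch of a pre-activation dense layer in PyTorch (the original repo's framework is not stated in this thread, and the class and argument names below are illustrative, not from that code):

```python
import torch
import torch.nn as nn


class PreActDenseLayer(nn.Module):
    """One DenseNet layer in pre-activation order: BN -> ReLU -> Conv.

    Because BN comes before the conv, each layer normalizes and re-scales
    the shared, concatenated input features independently -- the property
    described above.
    """

    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv(self.relu(self.norm(x)))
        # Dense connectivity: concatenate the new features with the input.
        return torch.cat([x, out], dim=1)
```

Each layer's output grows the channel dimension by `growth_rate`, so the next layer sees (and independently re-normalizes) all earlier features.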
So I tried modifying the conv structure to BN+ReLU+Conv+BN+ReLU; training is OK, and the accuracy is better than with either of the two structures above.
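For comparison, the three orderings discussed in this thread can be sketched side by side as plain `nn.Sequential` blocks (again an illustrative PyTorch sketch, with made-up channel sizes, not code from the repo):

```python
import torch.nn as nn


def conv_block(order, in_channels, out_channels):
    """Build a conv block with layers in the given order (illustrative helper)."""
    layers, c = [], in_channels
    for name in order:
        if name == "bn":
            layers.append(nn.BatchNorm2d(c))
        elif name == "relu":
            layers.append(nn.ReLU(inplace=True))
        elif name == "conv":
            layers.append(nn.Conv2d(c, out_channels, 3, padding=1, bias=False))
            c = out_channels
    return nn.Sequential(*layers)


# The three structures compared in this thread:
pre_act = conv_block(["bn", "relu", "conv"], 16, 12)                # original DenseNet
post_act = conv_block(["conv", "bn", "relu"], 16, 12)               # reported lower accuracy
pre_post = conv_block(["bn", "relu", "conv", "bn", "relu"], 16, 12) # reported best here
```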
Do you repeat BN+ReLU+Conv+BN+ReLU for every layer? I think the trailing post-activation BN is redundant, and I also have no idea why the BN+ReLU+Conv+BN+ReLU structure would be better than BN+ReLU+Conv.
Hello pudae: