yaoing / DAN

Official implementation of DAN
MIT License

Why Batchnorm at the final network output? #14

Closed · amostayed closed this issue 2 years ago

amostayed commented 2 years ago

Hi, thanks for this excellent repository, it is very easy to follow. A question about the implementation, though: I don't often see a batchnorm layer before the softmax loss in classifier networks. Is there a specific reason you have it? What happens if you train without the last batchnorm? I did a quick check without the BN (on a very small private dataset with 4 classes), and the cross-entropy loss after the first training batch was ~313. When I did the same with the BN, the value was ~1.94, which is more typical of cross-entropy loss at the beginning of training.
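
For context, a minimal sketch of the effect described above (this is not the repository's code; the feature size, batch size, and feature scaling are made-up assumptions): with raw FC logits the first cross-entropy value can be far above ln(num_classes), while a BatchNorm1d on the logits keeps it close to that baseline.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes = 4                        # matches the 4-class example above
feats = torch.randn(32, 512) * 10.0    # hypothetical unnormalized backbone features
labels = torch.randint(0, num_classes, (32,))

fc = nn.Linear(512, num_classes)
bn = nn.BatchNorm1d(num_classes)
criterion = nn.CrossEntropyLoss()

loss_plain = criterion(fc(feats), labels)    # raw logits: loss can be far above ln(4) ~ 1.39
loss_bn = criterion(bn(fc(feats)), labels)   # standardized logits: loss stays near ln(4)
print(f"without BN: {loss_plain.item():.2f}  with BN: {loss_bn.item():.2f}")
```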

yaoing commented 2 years ago

Hi, thanks for your interest! You are right, putting BN after the FC layer is unusual. The reason for using it is that I found the model couldn't learn very well when the FC was used as the output layer. I also noticed several works that add normalization at the end of the network as a trick, so I tried it and it worked. This may be related to the structure of the whole AFN, and it is hard for me to explain.
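
For reference, a hedged sketch of the head layout being discussed (class name and dimensions are illustrative assumptions, not copied from the repository): the final Linear produces class logits and a BatchNorm1d normalizes them before the softmax / cross-entropy loss.

```python
import torch
import torch.nn as nn

class Head(nn.Module):
    def __init__(self, in_features: int = 512, num_classes: int = 7):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)
        self.bn = nn.BatchNorm1d(num_classes)   # the "normalization at the end" trick

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.bn(self.fc(x))              # logits fed to CrossEntropyLoss
```

Returning `self.fc(x)` directly, without `self.bn`, is the variant the question above asks about.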

amostayed commented 2 years ago

Hi, have you considered inserting a BN after the second Linear layer in the Channel Attention module, or putting a second FC before the output? code snippet

yaoing commented 2 years ago

Hi, I have tried adding a second FC layer at the end of the network, but it did not improve the overall result. And I have never changed the BN layer in Channel Attention; that structure is pretty stable.