Closed XuyangBai closed 4 years ago
The normalization is only done in order to avoid numerical issues (overflow) when computing the exponential - when using an architecture with BatchNorm, I don't think the normalization would be necessary. However, in this version of VGG, the activations can be quite large (I have seen up to a few thousands).
The normalization is not important when computing the channel selection score because it wouldn't make any difference: (A / MAX) / (B / MAX) = A / B.
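A minimal NumPy sketch of both points above. The function names (`soft_local_max`, `channel_selection`) and the simple sliding-window sum are my own illustration, not the repository's actual code; the sketch only mirrors the idea described here: divide by the global max before `exp` to avoid overflow, and note that the same division cancels out in the channel-selection ratio.

```python
import numpy as np

def soft_local_max(d, window=3):
    # d: (H, W) activation map for one channel; values can be large
    # (a few thousand), so exp(d) would overflow without normalization.
    d = d / d.max()  # global-max normalization for numerical stability
    e = np.exp(d)
    # Sum of exponentials over a local window (edge-padded sliding sum).
    pad = window // 2
    ep = np.pad(e, pad, mode="edge")
    s = np.zeros_like(e)
    for di in range(window):
        for dj in range(window):
            s += ep[di:di + e.shape[0], dj:dj + e.shape[1]]
    # Each pixel's exp divided by the window sum that includes it,
    # so scores lie in (0, 1].
    return e / s

def channel_selection(d):
    # d: (C, H, W). The per-pixel ratio d_k / max_k d_k is unchanged
    # by any global rescaling: (A / MAX) / (B / MAX) = A / B,
    # which is why no extra normalization is needed here.
    return d / d.max(axis=0, keepdims=True)
```

The key contrast: `soft_local_max` needs the normalization only to keep `exp` finite, while `channel_selection` is a pure ratio, so a global scale factor cancels.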
Thank you, that makes sense to me :)
Hi, thanks for sharing your work. In your paper, the definition of the soft local max score is
But in your implementation I found
which means that you first normalize the feature map by dividing by the maximum over the whole image. But I am not clear why this normalization is needed, and why you do not apply the same normalization when calculating the channel selection score?