Open zouxinh opened 6 years ago
There are some papers that normalize features, such as "NormFace" or "L2-constrained softmax loss".
Feature normalization makes training harder: you have to rescale the normalized features by a parameter $s$ in order to make the network converge, which introduces one more hyperparameter $s$. So we did not consider using it when developing SphereFace.
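The normalize-then-rescale step described above can be sketched as follows. This is a minimal NumPy illustration, not code from SphereFace; the scale value `s=30.0` is an illustrative assumption (the thread does not give a value), though papers like NormFace discuss choosing $s$ in roughly this range.

```python
import numpy as np

def l2_normalize_and_scale(features, s=30.0, eps=1e-12):
    """L2-normalize each row of `features`, then rescale by s.

    After this step every feature vector has norm s, so the logits
    fed to softmax are bounded by s; without the rescaling (s=1),
    the logits are too small for softmax to converge well.
    `s` is a hypothetical default here, not a value from the thread.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalized = features / np.maximum(norms, eps)  # unit-norm rows
    return s * normalized

# Toy example: two 3-D feature vectors with different norms.
feats = np.array([[3.0, 4.0, 0.0],
                  [0.0, 0.0, 2.0]])
scaled = l2_normalize_and_scale(feats, s=30.0)
print(np.linalg.norm(scaled, axis=1))  # -> [30. 30.]
```

Note that only the direction of each feature survives; the original magnitudes (5 and 2 here) are discarded, which is exactly why a shared scale $s$ is needed to restore a usable logit range.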
Hello, some papers normalize the features before the softmax layer, and I think this makes sense because (1) it makes the feature space more compact, which suits hyperplane classification, and (2) normalizing all features gives larger gradients to hard samples.
I also think these ideas do not conflict with SphereFace, which relies on an angular margin. Am I right about this? And have you run such an experiment?