Open zouxinh opened 6 years ago
There are some papers that normalize features, such as "NormFace" or "L2-constrained softmax loss".
Feature normalization makes training harder: you have to rescale the normalized features by a parameter $s$ in order to make the network converge, which introduces one more hyperparameter $s$. So we did not consider using it when developing SphereFace.
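The normalize-then-rescale step described above can be sketched as follows. This is a minimal NumPy illustration, not code from SphereFace; the scale value `s=30.0` is an illustrative assumption (the thread does not give a value), though papers like NormFace discuss choosing $s$ in roughly this range.

```python
import numpy as np

def l2_normalize_and_scale(features, s=30.0, eps=1e-12):
    """L2-normalize each row of `features`, then rescale by s.

    After this step every feature vector has norm s, so the logits
    fed to softmax are bounded by s; without the rescaling (s=1),
    the logits are too small for softmax to converge well.
    `s` is a hypothetical default here, not a value from the thread.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalized = features / np.maximum(norms, eps)  # unit-norm rows
    return s * normalized

# Toy example: two 3-D feature vectors with different norms.
feats = np.array([[3.0, 4.0, 0.0],
                  [0.0, 0.0, 2.0]])
scaled = l2_normalize_and_scale(feats, s=30.0)
print(np.linalg.norm(scaled, axis=1))  # -> [30. 30.]
```

Note that only the direction of each feature survives; the original magnitudes (5 and 2 here) are discarded, which is exactly why a shared scale $s$ is needed to restore a usable logit range.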
Hello, some papers normalize the features before the softmax layer, and I think this makes sense because (1) it makes the feature space more compact, which suits hyperplane classification, and (2) normalizing all features gives larger gradients to hard samples.
I also think these ideas do not conflict with SphereFace, which relies on an angular margin. Am I right about this? And have you run such an experiment?