SJHNJU opened 4 years ago
The purpose of ArcFace is to train the network to learn embedding features with better intra-class compactness and inter-class discrepancy; the output is an embedding vector. I think the margin should be removed when using the model to predict. m is an additive angular margin penalty. The ArcFace paper says: "we add an additive angular margin penalty m between xi and Wyi to simultaneously enhance the intra-class compactness and inter-class discrepancy". So m is a penalty that is applied only between the embedding feature and the center of its ground-truth class. Its role is to force theta to become smaller: because cos(theta + m) < cos(theta) (for theta + m ≤ π), the network has to shrink theta to reach the same ground-truth logit.

In my experience, when I use cos(theta + m), the validation accuracy is very low compared with softmax. When I remove m, the model converges quickly and the validation accuracy is close to softmax.
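To make the "margin only on the ground-truth class" point concrete, here is a minimal, single-sample sketch of ArcFace-style training logits in pure Python. It is not the paper's reference code: the function name `arcface_logits` is hypothetical, the defaults s=64 and m=0.5 are assumed values, and the sketch omits the fallback that real implementations use when theta + m exceeds π.

```python
import math

def _normalise(v):
    # Scale a vector to unit L2 norm so that dot products become cosines.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def arcface_logits(embedding, class_centers, label, s=64.0, m=0.5):
    """Training-time logits for one sample (hypothetical helper).

    Every class gets s * cos(theta_j); only the ground-truth class `label`
    gets s * cos(theta_y + m). No special handling for theta + m > pi.
    """
    e = _normalise(embedding)
    logits = []
    for j, w in enumerate(class_centers):
        cos_t = sum(a * b for a, b in zip(e, _normalise(w)))
        cos_t = max(-1.0, min(1.0, cos_t))  # keep acos inside its domain
        if j == label:
            theta = math.acos(cos_t)
            cos_t = math.cos(theta + m)  # margin penalty: lowers the target logit
        logits.append(s * cos_t)
    return logits
```

With `arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], label=0)`, the target class is perfectly aligned (theta = 0), yet its logit is only 64 * cos(0.5) instead of 64, so the loss keeps pushing theta down. The non-target logit is untouched.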
When using CosFace / ArcFace as a metric, is it right to train the model with the margin and remove the margin when using the model to predict?
For example, if I have a model trained with ArcFace, should I use the plain cos(theta) instead of cos(theta + m) when using the model to predict?
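One observation that bears on the question: the margin is attached to the ground-truth class, and at prediction time the label is unknown, so there is no class to attach it to. A common approach is to score with plain cos(theta) (multiplying by s does not change the argmax) or to compare embeddings directly by cosine similarity. The sketch below assumes that setup; `predict_class`, `same_identity`, and the 0.3 threshold are hypothetical and would need tuning on validation data.

```python
import math

def cosine(u, v):
    # Plain cosine similarity, i.e. cos(theta) with no margin.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def predict_class(embedding, class_centers):
    """Closed-set prediction: argmax over cos(theta_j); no label, no margin."""
    scores = [cosine(embedding, w) for w in class_centers]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

def same_identity(emb_a, emb_b, threshold=0.3):
    """Open-set verification by comparing two embeddings (threshold is assumed)."""
    return cosine(emb_a, emb_b) >= threshold
```

For instance, `predict_class([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]])` picks class 0 from the raw cosines, and `same_identity` never touches the class centers, which is the usual way ArcFace embeddings are used for face verification.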