xialuxi / arcface-caffe

insightface-caffe

Optimizer #46

Open melgor opened 4 years ago

melgor commented 4 years ago

Hi, what kind of optimiser (Adam, SGD) did you use for the tests on, e.g., CASIA-WebFace?

I'm asking because it looks like the gradients in the final layer are lower with AdaCos than with a fixed s=20 (98 classes), and AdaCos also gets lower scores. I'm wondering whether the gradients provided to the model are simply too small for it to learn well. I'm using Adam and the results are as follows (the dataset is CARS196):

  1. Adam, fixed s: 0.79
  2. Adam, AdaCos: 0.745
  3. Adam, AdaCos with 2x larger LR: 0.755

In general AdaCos works worse here for some reason, and I'm not sure why. Maybe it is because the average angle between non-similar classes is smaller than in the face case, or maybe this problem needs a more adaptive LR method.
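
For context, here is a minimal sketch (my own, not this repo's code) of the AdaCos dynamic-scale update as I understand it from the paper. All function and variable names are hypothetical. Note that with 98 classes the suggested initial scale sqrt(2)*log(C-1) is only about 6.5, well below a fixed s=20, which might be part of why the gradients look smaller:

```python
# Sketch of the AdaCos dynamic scale, following the formulas in the AdaCos
# paper (not the code in this repository). Names are illustrative only.
import numpy as np

def adacos_scale_update(cos_theta, labels, s_prev):
    """Update the AdaCos scale from one batch of cosine logits.

    cos_theta: (batch, num_classes) cosines between features and class weights
    labels:    (batch,) ground-truth class indices
    s_prev:    scale used for the previous batch's logits
    """
    batch, num_classes = cos_theta.shape
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

    # B_avg: average summed exp-logit over the *non-target* classes
    logits = np.exp(s_prev * cos_theta)
    logits[np.arange(batch), labels] = 0.0      # drop the target-class term
    b_avg = logits.sum(axis=1).mean()

    # median angle to the target class in this batch
    theta_med = np.median(theta[np.arange(batch), labels])

    # dynamic scale, as given in the paper
    return np.log(b_avg) / np.cos(min(np.pi / 4, theta_med))

# initial scale suggested by the paper: sqrt(2) * log(C - 1)
num_classes = 98
s_init = np.sqrt(2.0) * np.log(num_classes - 1)   # ~6.5 for 98 classes, vs fixed s=20
```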