Hi,
what kind of optimiser (Adam, SGD) did you use for the tests, e.g. on CASIA-WebFace?
I'm asking because the gradients in the final layer seem to be lower with AdaCos than with a fixed s=20 (98 classes), and AdaCos also gets lower scores. I'm wondering whether the gradients provided to the model are simply too low for it to learn well.
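For anyone who wants to reproduce the check, here is a minimal sketch of how the final-layer gradient norm could be logged after `loss.backward()`. It assumes a PyTorch model that exposes the margin layer as `model.head`; that attribute name is my own assumption, not from a specific repo:

```python
import torch

def head_grad_norm(model: torch.nn.Module) -> float:
    # `model.head` is an assumed name for the final cosine/AdaCos layer.
    # Collect all parameter gradients of that layer and return their L2 norm.
    grads = [p.grad.detach().flatten()
             for p in model.head.parameters() if p.grad is not None]
    return torch.cat(grads).norm().item() if grads else 0.0
```

Calling this once per iteration (after the backward pass, before `optimizer.step()`) makes the fixed-s vs. AdaCos gradient comparison easy to plot.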
I'm using Adam, and the results are as follows (dataset: CARS196):
- Adam, fixed s: 0.79
- Adam, AdaCos: 0.745
- Adam, AdaCos with 2× larger LR: 0.755
In general AdaCos works worse here for some reason, and I'm not sure why. Maybe it is because the average angle between dissimilar classes is smaller than in the face-recognition case.
Or maybe this problem needs a more adaptive LR method.
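For reference, here is a sketch of the dynamic scale update from the AdaCos paper (Zhang et al., 2019), which could be used to log the scale AdaCos actually settles at on CARS196. The function and variable names are my own assumptions:

```python
import math
import torch

def adacos_scale(cos_theta: torch.Tensor, labels: torch.Tensor, s_prev: float) -> float:
    """One AdaCos dynamic-scale update.

    cos_theta: (batch, n_classes) cosine similarities from the head.
    labels:    (batch,) ground-truth class indices.
    s_prev:    the scale used in the current forward pass.
    """
    # Clamp to keep acos numerically stable at the boundaries.
    theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
    one_hot = torch.zeros_like(cos_theta).scatter_(1, labels.view(-1, 1), 1.0)
    # B_avg: batch-average of the summed non-ground-truth logit mass.
    B_avg = torch.where(one_hot < 1,
                        torch.exp(s_prev * cos_theta),
                        torch.zeros_like(cos_theta)).sum(dim=1).mean()
    # Median ground-truth angle in the batch, capped at pi/4 as in the paper.
    theta_med = theta[one_hot == 1].median()
    return (torch.log(B_avg) / math.cos(min(math.pi / 4, theta_med.item()))).item()
```

If the logged scale stays well below the fixed s = 20 on this dataset, that would be consistent with the smaller final-layer gradients I'm seeing.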