Hi,
what kind of optimiser (Adam, SGD) did you use for the tests, e.g. on CASIA-WebFace?
I'm asking because the gradients in the final layer seem to be lower with AdaCos than with a fixed s=20 (98 classes), and AdaCos also gets lower scores. I'm wondering whether the gradients provided to the model are simply too low for it to learn well.
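For anyone who wants to reproduce the check, here is a minimal sketch of how the final-layer gradient norm could be logged after `loss.backward()`. It assumes a PyTorch model that exposes the margin layer as `model.head`; that attribute name is my own assumption, not from a specific repo:

```python
import torch

def head_grad_norm(model: torch.nn.Module) -> float:
    # `model.head` is an assumed name for the final cosine/AdaCos layer.
    # Collect all parameter gradients of that layer and return their L2 norm.
    grads = [p.grad.detach().flatten()
             for p in model.head.parameters() if p.grad is not None]
    return torch.cat(grads).norm().item() if grads else 0.0
```

Calling this once per iteration (after the backward pass, before `optimizer.step()`) makes the fixed-s vs. AdaCos gradient comparison easy to plot.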
I'm using Adam, and the results are as follows (dataset: CARS196):
- Adam, fixed s: 0.79
- Adam, AdaCos: 0.745
- Adam, AdaCos with 2× larger LR: 0.755
In general AdaCos works worse here for some reason, and I'm not sure why. Maybe it is because the average angle between dissimilar classes is smaller than in the face-recognition case.
Or maybe this problem needs a more adaptive LR method.
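For reference, here is a sketch of the dynamic scale update from the AdaCos paper (Zhang et al., 2019), which could be used to log the scale AdaCos actually settles at on CARS196. The function and variable names are my own assumptions:

```python
import math
import torch

def adacos_scale(cos_theta: torch.Tensor, labels: torch.Tensor, s_prev: float) -> float:
    """One AdaCos dynamic-scale update.

    cos_theta: (batch, n_classes) cosine similarities from the head.
    labels:    (batch,) ground-truth class indices.
    s_prev:    the scale used in the current forward pass.
    """
    # Clamp to keep acos numerically stable at the boundaries.
    theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
    one_hot = torch.zeros_like(cos_theta).scatter_(1, labels.view(-1, 1), 1.0)
    # B_avg: batch-average of the summed non-ground-truth logit mass.
    B_avg = torch.where(one_hot < 1,
                        torch.exp(s_prev * cos_theta),
                        torch.zeros_like(cos_theta)).sum(dim=1).mean()
    # Median ground-truth angle in the batch, capped at pi/4 as in the paper.
    theta_med = theta[one_hot == 1].median()
    return (torch.log(B_avg) / math.cos(min(math.pi / 4, theta_med.item()))).item()
```

If the logged scale stays well below the fixed s = 20 on this dataset, that would be consistent with the smaller final-layer gradients I'm seeing.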