happynear / AMSoftmax

A simple yet effective loss function for face verification.

How to set m when the feature is not normalized? #6

Closed: Allan9977 closed this issue 6 years ago

Allan9977 commented 6 years ago

Hi, your paper reports results for AM-Softmax w/o FN with m = 0.35 and 0.4.
(1) With FN: psi = s * (cos(theta) - m), with s = 30, m = 0.35

prototxt

layer {
  name: "fc6_l2"
  type: "InnerProduct"
  bottom: "norm1"
  top: "fc6"
  param { lr_mult: 1 }
  inner_product_param {
    num_output: 10516
    normalize: true
    weight_filler { type: "xavier" }
    bias_term: false
  }
}
layer {
  name: "label_specific_margin"
  type: "LabelSpecificAdd"
  bottom: "fc6"
  bottom: "label"
  top: "fc6_margin"
  label_specific_add_param { bias: -0.35 }
}
layer {
  name: "fc6_margin_scale"
  type: "Scale"
  bottom: "fc6_margin"
  top: "fc6_margin_scale"
  param { lr_mult: 0 decay_mult: 0 }
  scale_param {
    filler { type: "constant" value: 30 }
  }
}
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6_margin_scale"
  bottom: "label"
  top: "softmax_loss"
  loss_weight: 1
}

(2) W/o FN: s is not needed, psi = ||x|| * cos(theta) - m. Should m = 0.35 still be used? (A small numpy sketch of both formulations follows the prototxt below.)

prototxt

layer {
  name: "fc6_l2"
  type: "InnerProduct"
  bottom: "norm1"
  top: "fc6"
  param { lr_mult: 1 }
  inner_product_param {
    num_output: 10516
    normalize: false
    weight_filler { type: "xavier" }
    bias_term: false
  }
}
layer {
  name: "label_specific_margin"
  type: "LabelSpecificAdd"
  bottom: "fc6"
  bottom: "label"
  top: "fc6_margin"
  label_specific_add_param { bias: -0.35 }
}
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6_margin"
  bottom: "label"
  top: "softmax_loss"
  loss_weight: 1
}
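For concreteness, here is a minimal numpy sketch of the two target-logit formulations above. The function names, tensor shapes, and the in-place indexing are illustrative only, not the repo's Caffe implementation.

numpy sketch

import numpy as np

def am_logits_with_fn(x, W, y, s=30.0, m=0.35):
    # (1) With FN: psi = s * (cos(theta) - m) on the target class.
    x_n = x / np.linalg.norm(x, axis=1, keepdims=True)  # L2-normalize features
    W_n = W / np.linalg.norm(W, axis=0, keepdims=True)  # L2-normalize class weights
    cos = x_n @ W_n                                     # cos(theta), shape (N, C)
    cos[np.arange(len(y)), y] -= m                      # additive margin on the target class
    return s * cos                                      # fixed scale s replaces ||x||

def am_logits_wo_fn(x, W, y, m=0.35):
    # (2) W/o FN: psi = ||x|| * cos(theta) - m; no scale s.
    W_n = W / np.linalg.norm(W, axis=0, keepdims=True)  # weights are still normalized
    logits = x @ W_n                                    # equals ||x|| * cos(theta)
    logits[np.arange(len(y)), y] -= m                   # additive margin on the target class
    return logits

The only structural difference between the two is whether ||x|| is kept or replaced by the fixed scale s; everything else (normalized weights, additive margin on the target class, softmax cross-entropy on top) stays the same.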

Can you share your prototxt and training log? Thanks.

happynear commented 6 years ago

When we don't use feature normalization, it is still necessary to use an annealing strategy to set m, like the lambda annealing in SphereFace. So I need to change the code to add the annealing strategy. I will give an example tomorrow.
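For intuition, here is a hedged Python sketch of what such a SphereFace-style annealing can look like. The blending form and the default parameter values mirror SphereFace's public training recipe; they are assumptions for illustration, not the exact scheme implemented in label_specific_add_layer.

annealing sketch

def annealed_lambda(iteration, lambda_base=1000.0, gamma=0.12, power=1.0, lambda_min=5.0):
    # Decay lambda as training progresses (defaults follow SphereFace's released
    # recipe; they are illustrative assumptions, not values from this repo).
    return max(lambda_min, lambda_base / (1.0 + gamma * iteration) ** power)

def annealed_target_logit(plain_logit, margin_logit, iteration):
    # Blend the un-margined and the margined target logit: early in training the
    # plain logit dominates (large lambda); later the margin takes full effect.
    lam = annealed_lambda(iteration)
    return (lam * plain_logit + margin_logit) / (1.0 + lam)

In the w/o FN setting above, plain_logit would be ||x|| * cos(theta) and margin_logit would be ||x|| * cos(theta) - m.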

happynear commented 6 years ago

The example is uploaded. You also need to update the label_specific_add_layer to get the annealing code.

https://github.com/happynear/AMSoftmax/blob/master/prototxt/face_train_test_wo_fn.prototxt

Note that I am still running the experiment to reproduce the result, so the prototxt may change in the next few hours...

By the way, I can get similar results on LFW BLUFR with normalization by fine-tuning the network with scale 60 and margin 0.4 after iteration 16000. Maybe you can also try it this way.
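In terms of the sketch earlier in this thread, that fine-tuning hint amounts to a hyperparameter switch; the helper below is only an illustration, with the pre-switch values taken from the w/ FN prototxt above and the post-switch values from this comment.

FINETUNE_START_ITER = 16000  # iteration after which the hyperparameters are switched

def scale_and_margin(iteration):
    # Before the switch: s = 30, m = 0.35 (the w/ FN prototxt above).
    # After the switch: s = 60, m = 0.4 (the fine-tuning settings in this comment).
    return (60.0, 0.4) if iteration >= FINETUNE_START_ITER else (30.0, 0.35)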

Allan9977 commented 6 years ago

Thanks a lot! I will give it a try.