happynear / NormFace

NormFace: L2 HyperSphere Embedding for Face Verification, 99.21% on LFW

Question about weight normalization in Table 2 of paper #14

Closed kalyo-zjl closed 7 years ago

kalyo-zjl commented 7 years ago

Hi @happynear , I noticed that in Table 2, normalization is the key factor in improving performance. So I tried to replicate your experiments with feature-only normalization, weight-only normalization, and both. So far I can get 99% with feature-only normalization, but with weight-only normalization the result is only ~97%, which is even worse than no normalization. The details of my weight normalization are below.

layer { name: "id_weight_ip" type: "Parameter" top: "id_weight_ip" param { lr_mult: 1 decay_mult: 0 } parameter_param { shape { dim: 10572 dim: 512 } blob_filler { type: "gaussian_unitball" } } } layer { name: "id_weight_ip_normalize" type: "Normalize" bottom: "id_weight_ip" top: "id_weight_ip_normalize" } layer { name: "id_weight_ip_scale" type: "Scale" bottom: "id_weight_ip_normalize" top: "id_weight_ip_scale" top: "SCALE" param { lr_mult: 0 decay_mult: 0 } scale_param { num_axes: 0 filler { value: 5 } bias_term: false } }

############## softmax loss ###############
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "fc5"
  bottom: "id_weight_ip_scale"
  top: "fc6"
  inner_product_param {
    num_output: 10572
    weight_filler { type: "xavier" }
    bias_term: false
  }
}
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6"
  bottom: "label"
  top: "softmax_loss"
}
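In numpy terms, the weight branch above should compute roughly the following (just a sketch to check my understanding, assuming the Normalize layer L2-normalizes each of the 10572 weight rows and the fixed Scale layer multiplies by 5; the function name is only illustrative):

import numpy as np

def weight_normalized_logits(x, W, s=5.0):
    # x: (batch, 512) un-normalized features from fc5
    # W: (10572, 512) class weights from the Parameter layer
    # L2-normalize each class-weight row, then scale by the fixed value s
    W_norm = W / np.linalg.norm(W, axis=1, keepdims=True)
    return s * x.dot(W_norm.T)  # logits fed to SoftmaxWithLoss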

If the scale is learned, the loss always diverges because the scale grows to NaN. So I fixed the scale and varied it from 1 to 15, but none of these values works well. Could you please give me some advice about the settings for weight normalization? Thanks in advance.
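As a sanity check on the role of the fixed scale, here is a toy calculation (my own back-of-the-envelope reasoning, not from the prototxt above; it assumes the idealized case where the correct-class cosine is 1, all other cosines are 0, and the input to the inner product is unit-norm, so the logits are exactly the scale times the cosine):

import numpy as np

n_classes = 10572
for s in [1.0, 5.0, 15.0]:
    logits = np.zeros(n_classes)
    logits[0] = 1.0                       # perfect cosine for the correct class
    p = np.exp(s * logits)
    p /= p.sum()
    print(s, p[0])                        # s=1 -> ~2.6e-4, s=5 -> ~1.4e-2, s=15 -> ~0.997

So with a very small fixed scale the target probability stays tiny even in this ideal case, which may explain why the small values fail, though it does not explain why the larger ones fail too.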

happynear commented 7 years ago

Training with weight normalization only is tricky. I can train it if I fine-tune an existing model; when I train it from scratch, the weights soon become NaN.

BTW, you get 99% when normalizing the feature only, trained on CASIA-WebFace? I got worse results when I tried it.

kalyo-zjl commented 7 years ago

What do you mean by fine-tuning? Fine-tuning from vanilla softmax?

Yes, I get 99% after PCA, trained on CASIA-WebFace. I also add a scale layer after normalizing the feature. Did you learn the scale parameter? If you fix the scale parameter, you may get 99%.
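To be concrete, by adding a scale layer after normalizing the feature I mean something like the following on the feature side (a numpy sketch only; s stands for whatever fixed value the Scale layer holds, and the function name is illustrative):

import numpy as np

def scaled_normalized_feature(f, s):
    # f: (batch, 512) features; L2-normalize each row, then multiply by the fixed scale s
    return s * f / np.linalg.norm(f, axis=1, keepdims=True)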

kalyo-zjl commented 7 years ago

BTW, using your aligned LFW, I can get 99.2167% with your NormFace model, but I can only get less than 99.1% on my self-aligned LFW using the same model you provided. I used your detection and alignment code here https://github.com/happynear/FaceVerification/blob/master/dataset/general_align.m without modifying any parameter settings, only some dataset paths.

Is there any difference in your alignment steps?

happynear commented 7 years ago

I mean fine-tuning from the center face model provided by Yandong (https://github.com/ydwen/caffe-face).

I also used this code to get the aligned images, but maybe with some changes to the parameters; I can't remember exactly what I changed. Are there any images in which MTCNN can't find a face?

kalyo-zjl commented 7 years ago

OK, but the center face model can already get 99%, by which I mean the feature extraction parameters are already good enough. Maybe I will try to fine-tune from vanilla softmax, which got ~98% in my experiment. I will let you know whether it works.

Sorry to hear that. In my case, a face is detected in every LFW image. BTW, there are roughly 2000+ images in CASIA-WebFace in which no face is detected; I just exclude them from the training set.

kalyo-zjl commented 7 years ago

It seems that weight normalization alone doesn't work well. After fine-tuning from vanilla softmax, the best result I can get is only ~98.2%, not much of a gain over the original model.

happynear commented 7 years ago

Ah, my results are similar: no better than the original model, but also no worse.

happynear commented 7 years ago

But in my experiments, normalizing the feature only leads to worse results than the original model (trained with softmax + center loss).

kalyo-zjl commented 7 years ago

Oh, OK. Besides, I get 99% when using feature normalization only, and it is trained from scratch, not fine-tuned.

happynear commented 7 years ago

That is also an impressive result. I can also get 99% using the C-contrastive loss from scratch, but I failed to get good results with the normalized softmax loss.