huangyangyu / NoiseFace

Noise-Tolerant Paradigm for Training Face Recognition CNNs [Official, CVPR 2019]
https://arxiv.org/pdf/1903.10357.pdf
MIT License

train with arcface #4

Open guangdongliang opened 5 years ago

guangdongliang commented 5 years ago

Thank you for the paper and the source code! When trained with noise_tolerant + arcface-loss (margin = 0.5), the loss did not converge and the accuracy stayed near 0. However, when the margin was zero, the training process was fine. I wonder which margin you used for the experiment.
In my experiment, the pipeline is resnet101 --> fully-connected layer --> noise_tolerant --> arcface-loss.
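For context, the margin being discussed is the additive angular margin of ArcFace: the target-class logit cos(θ) is replaced by cos(θ + m) before scaling and softmax. A minimal numpy sketch of that modification (an illustration, not the repository's Caffe/mxnet code; `s=64.0` is a commonly used scale, assumed here):

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Sketch of the ArcFace logit modification (illustrative only)."""
    # L2-normalize features and class weights so dot products are cosines
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = e @ w.T                                  # (N, C) cosine similarities
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    rows = np.arange(len(labels))
    logits = cos.copy()
    # add the angular margin m only to each sample's target-class logit
    logits[rows, labels] = np.cos(theta[rows, labels] + m)
    return s * logits                              # scale before softmax
```

With m = 0.5 the target logit is pushed down sharply, which is why training from scratch (or from an unsuitable initialization) can fail to converge, while m = 0 reduces to plain cosine softmax.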

huangyangyu commented 5 years ago

Thank you for your attention.

Our method is an angular-based method. In your pipeline, please make sure the output of the fully-connected layer is the cosine value (in other words, the input and the weights of the fc layer are both normalized). You can use the debugging information to debug.
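The normalization requirement above is easy to sanity-check numerically: if both the features and the fc weights are L2-normalized, every output of the layer is a cosine and must lie in [-1, 1]. A quick numpy stand-in for such an fc layer (a sketch, not the repository's code):

```python
import numpy as np

def cosine_fc(features, weights, eps=1e-12):
    """Fully-connected layer whose outputs are cosines (illustrative)."""
    # normalize the rows of both operands; the matmul then gives cos(theta)
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    w = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + eps)
    return f @ w.T

rng = np.random.default_rng(0)
out = cosine_fc(rng.normal(size=(16, 512)), rng.normal(size=(10, 512)))
assert out.min() >= -1.0 and out.max() <= 1.0  # every entry is a valid cosine
```

If the values coming out of your fc layer fall outside [-1, 1], the noise_tolerant layer is not seeing cosines and its histograms will be meaningless.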

In our training process, we first trained an L2-softmax model, called M, then trained the arcface model using the corresponding M as the pretrained model.

The L2-softmax structure can be found in the L2-softmax deploy file; the Arcface structure can be found in the arcface deploy file.

I suggest you first run the experiment with our r20, then replace the r20 with r100.

Hope you succeed. If you reproduce the result, please let us know.

guangdongliang commented 5 years ago

> Thank you for attention.
>
> Our method is an angular-based method. In your pipeline, please make sure the output of fullconnected-layer is the cosine value(In other words, the input and weights of fc layer both are normalized). You can use the debugging information to debug.
>
> In our training process, we firstly trained the L2-softmax model called M, then trained the arcface model by using corresponding M as pretrained model.
>
> L2-softmax structure can be referred to: L2-softmax deploy file Arcface structure can be referred to: arcface deploy file
>
> I suggest you use our r20 to do experiment, then replace the r20 with r100.
>
> Hope you can get success. If you reproduce the result, please let us known.

Thank you for the reply. I did not notice that there was a "resnet20_arcface_train.prototxt" file in the project. My pipeline is not the same as the pipeline in that file, because my noise_tolerant layer is in front of the arcface-loss. I will try your pipeline and post the result here. By the way, I reproduced your noise_tolerant layer in mxnet and trained in mxnet.

guangdongliang commented 5 years ago

@huangyangyu After I finished training with the "resnet100 --> fully-connected layer --> arcface_layer --> noise_tolerant --> softmax" pipeline, the performance of this model is worse than training directly with arcface.

1. using the arcface model (r100, margin=0.5) as the pretrained model
2. starting lr is 0.1
3. 8 x Tesla V100, each with K=250 and a per-GPU batch size of 220 (the batch size for 8 GPUs is 8 x 220)
4. training with mxnet
5. MS1M-ArcFace (85K ids / 5.8M images)

The histogram of scores at the end of my training process is below. I am wondering why there is only one peak. Maybe it's my fault and I will try to work it out.

[histogram image]
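The "only one peak" question can be checked programmatically. A crude peak counter on a smoothed score histogram (a hypothetical sketch in numpy; the smoothing width and the `min_frac` cutoff are assumptions, not from the paper):

```python
import numpy as np

def count_peaks(hist, min_frac=0.01):
    """Count local maxima of a histogram after light smoothing (illustrative).

    min_frac discards bumps smaller than min_frac * tallest bin,
    so numerical noise in near-empty bins is not counted as a peak.
    """
    # 5-bin moving average to suppress bin-to-bin jitter
    kernel = np.ones(5) / 5.0
    smooth = np.convolve(hist, kernel, mode="same")
    thresh = min_frac * smooth.max()
    peaks = [i for i in range(1, len(smooth) - 1)
             if smooth[i] > smooth[i - 1]
             and smooth[i] >= smooth[i + 1]
             and smooth[i] > thresh]
    return len(peaks)
```

On a clean dataset a single high-cosine peak is the expected outcome; a second, lower-cosine peak only emerges when a noticeable fraction of the labels are wrong, so one peak on MS1M-ArcFace is not necessarily a bug.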

huangyangyu commented 5 years ago

@guangdongliang First of all, thank you for sharing your experiment.

Judging from your training settings:

  1. It is not suitable to use a model trained by the normal method as the pretrained model.

In our training process, we first trained the L2-softmax model, called M, through our method, then trained the arcface model using the corresponding M as the pretrained model.

  2. As far as I know, MS1M-ArcFace is a highly clean dataset, so the improvement is uncertain.

You may want to improve your result on a clean dataset, but our method brings more improvement on noisy datasets.

If you have time, could you please verify the effectiveness of the method in the following experiments?

  1. reproduce our experiment with the same network and the same noisy dataset as described in the paper
  2. your base network + L2-softmax (without pretrained model) on noisy-webface
  3. your base network + Arcface (with pretrained model) on noisy-webface
  4. your base network + Arcface (with pretrained model) on your dataset

Do experiment 2 first; if 2 is OK, then do 3 and 4, otherwise do 1.

maryhh commented 5 years ago

@guangdongliang Hello, would you like to share your code? I have also been reproducing the code in mxnet recently, but I'm not sure it works.

guangdongliang commented 5 years ago

Using r34 as the backbone, the weights when training with ms1m change like this, and when training with vgg change like this. There may be some problems.

guangdongliang commented 5 years ago

> @guangdongliang hello,would you like to share your code?cause,i also reproduce the code in mxnet recently,but i'm not sure it works.

I am very sorry! If I posted the code of the noise_tolerant layer here, there would be trouble from my company waiting for me.

As for the details: use mx.operator.CustomOp to define your noise_tolerant layer, and initialize a big ndarray to store the histograms of recent batches. In the forward function, do the calculations with functions from mxnet.ndarray on the GPU as much as possible.
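The histogram bookkeeping described above can be sketched without mxnet. A rolling store of per-batch score histograms in plain numpy (class name, bin count, and window size are assumptions for illustration, not the company's implementation; in a real CustomOp the `update` step would run on the GPU with mxnet.ndarray ops):

```python
import numpy as np

class RecentHistograms:
    """Rolling store of per-batch cosine-score histograms (illustrative)."""

    def __init__(self, bins=250, window=100):
        self.store = np.zeros((window, bins))        # one histogram per recent batch
        self.edges = np.linspace(-1.0, 1.0, bins + 1)
        self.window = window
        self.ptr = 0

    def update(self, cos_scores):
        # histogram this batch's scores and overwrite the oldest slot
        hist, _ = np.histogram(cos_scores, bins=self.edges)
        self.store[self.ptr % self.window] = hist
        self.ptr += 1

    def merged(self):
        # aggregate over the window, normalized to a probability distribution
        total = self.store.sum(axis=0)
        return total / max(total.sum(), 1)
```

Keeping the store as one preallocated array and overwriting slots in ring-buffer fashion avoids per-batch allocations, which matters when the layer runs every iteration.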

Hope you succeed.

jeroneandrews-sony commented 5 years ago

@huangyangyu when using the pre-trained model M (L2 softmax) to initialise the ArcFace model, what learning rate do you use for ArcFace training and do you use different learning rates for different layers?