Hi --
What are the results (e.g. LFW accuracy) that we should expect when running this code as described in the README? I don't have MATLAB, so I'll need to run w/ a different face detection/alignment pipeline, and I want to see how much error is introduced by my variant.
Thanks
This is a good point. We will provide the results and the pretrained models for your reference shortly.
Great thanks -- I'm running now and will post results from my config when done.
I ran the code as is, with a couple of small modifications: get_list.py (see https://gist.github.com/bkj/aff978999e974c6e7b71d972551e1f0c). So the only line of code I changed in the repo is the batch_size in the model.prototxt.
The loss during training looks like this:
Does that seem remotely correct? It doesn't really look like the shape I'd expect. Haven't had time to run the LFW benchmark yet.
Also, another odd behavior: about 2/3 of the time, when I start training the model the loss goes up to 87 and stays there (I think the weights are turning to nans). I have to restart the model repeatedly until it makes it through the first few minutes, then it will keep going.
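Side note: a softmax loss stuck around 87.33 is -log of the smallest positive single-precision float, which is what Caffe's softmax loss reports once the predicted probability underflows to zero, so it is consistent with the weights blowing up. A quick check in MATLAB:
-log(realmin('single'))   % ans = 87.3365, the value the loss plateaus at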
Hi, I ran the code with the default config and got an accuracy of 0.620667 on LFW. (The softmax loss of sphereface_model_iter_28000.caffemodel is about 2.3.) Is that what it should be? Or is there something I should pay attention to?
Must be an error. How did you preprocess/test/etc? Using all original MATLAB?
Yes, I just followed the README and ran the code using MATLAB. Now I am trying with more iterations. What's your LFW accuracy?
Haven't had a chance to run LFW since I don't have MATLAB. But the loss was about 8.0 at convergence and didn't really look right (see above) so I'm guessing it won't be great.
You ran detection/alignment with the provided code though?
Yes, and no error occurred. @bkj
Interesting. I ran training again -- this time it ran for 20K iterations, then the weights went to nan.
@wy1iu any thoughts on why training might not be working for us?
@wy1iu , I think there might be two small mistakes in the file evaluation.m.
The img at line 88 should be converted to single precision, like this:
img = imread(file);
img = single(img); % convert to single precision
And pairs(i).group = ceil(i / 300); at line 74 should be changed to pairs(i).group = ceil(i / 600);
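For context, one reason the cast matters: uint8 arithmetic in MATLAB saturates, so any offset/mean subtraction clips negative values to 0 (and matcaffe expects single-precision input anyway). A rough sketch of the intended order of operations; the offset/scale constants below are an assumption, not necessarily what evaluation.m uses:
img = imread(file);          % uint8
img = single(img);           % cast before any normalization
img = (img - 127.5) / 128;   % without the cast, uint8 arithmetic would clip values below the offset to 0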
Does fixing those mistakes get the model up to the >99% accuracy reported in the paper?
No, but it is over 93% now.
@wy1iu , there is another small mistake, in the file preprocess/code/face_detect_demo.m.
areas = prod(bboxes(:, 3:4), 2);
at line 70 should be changed to
areas = prod(bboxes(:, 3:4) - bboxes(:, 1:2), 2);
(presumably because the MTCNN detector returns boxes as [x1 y1 x2 y2], so columns 3:4 are the bottom-right corner rather than the width and height).
Now I can get an accuracy over 98%. However, if I replace the LFW data with the version preprocessed by my friend, which also uses MTCNN to detect faces, the accuracy increases to >99%. So I guess there might still be some minor problems that I cannot pin down. Could you please check it out for us?
Yes, I found some bugs in the pipeline demo and will fix them shortly. But the SphereFace recognition part should be fine.
@wy1iu Are you able to post a plot of the training loss when the model is running correctly?
@Jason-Zhou-JC are you able to post a plot of your loss as well? The best accuracy I'm getting is ~0.97 ATM
Fantastic -- thanks!
Hi @bkj @Jason-Zhou-JC, what's your final value of lambda? I use the default setting, and the final lambda is 5, but the softmax_loss stays around 5 and can't get any smaller. Have you changed the value of lambda? The network prototxt is sphere_model.prototxt. The accuracy I get is 95.1%, the validation rate is 70%, and the AUC is 0.6.
I didn't change the settings; you can refer to my log file: sphereface.txt
@Jason-Zhou-JC thanks for answering,the difference I can tell is I use casia 112x112(data is cleaned by casia_clean_list),not 96x112,and my batch size is smaller:128,maybe batch size account. And there is a small question: convolution layer in the original network has bias term,but it all does't learn at all since the learning rate is 0,and it initial to 0,why don't just set bias_term:false?
@zhly0 You can simply set bias_term to false if the learning rate of the bias term is 0. They are the same, I think. But you should notice that not all of them in the original network are 0.
I trained on CASIA-Webface and the average accuracy on LFW is 99.18%.
@goodluckcwl Using the default settings? Could you post your log file?
@bkj Actually, the LFW accuracy is only 98.8% (with PCA) if I use the default settings. I modified lambda, kept training with a small learning rate (0.01), and found the accuracy improved. Careful fine-tuning does improve the performance. The model I trained can be downloaded from here: https://github.com/goodluckcwl/Sphereface-model
@goodluckcwl How exactly did you do the PCA? I saw that mentioned in the center loss paper, but wasn't sure exactly how it was being done.
@bkj For 10-fold cross-validation, each round uses 9 folds to train the PCA and one fold for testing. Here is code you can refer to: https://github.com/happynear/FaceVerification
@goodluckcwl Thanks -- which file in particular is the relevant one? And could you give a little summary of what you mean by "train PCA"?
@bkj Hi, have you figured out why the loss did not go down? Since your LFW accuracy is 97%, is it solved?
@bkj 9 folds are used to train the PCA. Please refer to https://github.com/happynear/FaceVerification/blob/master/lfwPCA.m
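For reference, a rough sketch of what training the PCA on 9 folds looks like; this is not the exact code in lfwPCA.m, and feats, foldIdx, and dim are placeholder names:
% feats:   N x D matrix, one deep feature per LFW image
% foldIdx: N x 1 vector assigning each image to a fold 1..10
dim = 128;                    % number of PCA components to keep (assumed)
for k = 1:10
    train = feats(foldIdx ~= k, :);                         % fit PCA on the other 9 folds
    mu = mean(train, 1);
    coeff = pca(train);                                     % Statistics Toolbox
    W = coeff(:, 1:min(dim, size(coeff, 2)));
    test = bsxfun(@minus, feats(foldIdx == k, :), mu) * W;  % project the held-out fold
    % ...then compute cosine similarity for the test pairs and pick the
    % verification threshold / accuracy on this fold only
end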
@Jason-Zhou-JC Yes, not all of them are zeros, my mistake.
@zhly0 I think that was an issue w/ some grayscale images mixed in w/ the color images -- I re-detected the faces and haven't had that problem recently.
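If you'd rather keep those images than drop or re-detect them, a minimal hypothetical fix is to force 3 channels during preprocessing (file here is just a placeholder):
img = imread(file);
if size(img, 3) == 1
    img = repmat(img, [1 1 3]);   % replicate the gray channel so the network always sees 3-channel input
end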
@bkj thanks for answering. I changed my batch size to 256 and cropped the images to 96x112, and then it did not happen. But when I train with plain softmax, without A-Softmax (also with 96x112 images), the loss jumps to 87.3365 once it is down to about 3.7. Maybe it is related to the grayscale images in the training data; it just happens at some point.
@Jason-Zhou-JC Can you upload the LFW data preprocessed by your friend, with which the accuracy increases to >99%? Thanks.
@bkj how did you deal with the grayscale images mixed in with the color images? I hit the same loss issue.
Hi guys, all the bugs are fixed now. We also release the 20-layer CNN model (SphereFace-20) described in the paper. The expected results and pretrained model are also updated. You are welcome to try!
The pipeline should be okay to run without modifications now.
Is it possible to host the preprocessed CASIA/LFW faces somewhere? Understand if it's too much bandwidth, but figured I'd ask. LFW w/ faces extracted is probably small enough to share, I'd imagine.
I've found that the preprocessing methods for these things can have more impact than I would necessarily expect.
It is a good idea, but I am not sure whether we are allowed to share these images directly without official permission. @bkj
@wy1iu , first of all, you really did nice work, and using your code to train on my own dataset improved my results -- thanks for your contribution! But when testing on LFW with the model you updated last night, I got only 92.6% with RGB images; with the model @goodluckcwl uploaded I got about 95% with RGB images; while with my own network I got 96% accuracy with grayscale LFW images. My LFW preprocessing does nothing but crop the images to 96x112. So is the model easily influenced by the alignment method? Is the LFW test dataset (not the training dataset) available? Thanks.
I do not think the alignment method matters that much, but of course you could try others. If you use this repo's preprocessing to process your raw LFW and CASIA data (you could use your own training set as well), I believe you will get good results. @zhly0
@zhly0 From your results I am sure you did something inconsistent with our settings, but from the limited information I cannot find the exact issue. Are you exactly following the instructions for both the training and testing data? Which evaluation script did you use?
@bkj In your case, the loss is not decreasing because you changed the batch size. We do suggest you use 512. That's why @Jason-Zhou-JC gets a correct loss curve.
@Jason-Zhou-JC The reason for the low accuracy is that there were still bugs in the old evaluation script. They have now been fixed.
@ydwen thanks for the reply. I use FaceNet's LFW testing code, modified a little to work with Caffe, and the model is the one from this repo that you updated last night. I ask for your LFW images because I do not have MATLAB, and I want to verify whether this is due to the alignment or to something else, like my LFW test code settings. My training data is CASIA, aligned with MTCNN.
@ydwen The batch size in the currently provided configuration is 256 -- is that the correct batch size to use? Also, are you able to post your sphereface_train.log so we can see your configuration and training loss exactly?
EDIT: Seems like this code is intended to run on 2 GPUs, w/ a batch size of 256 on each, for an "effective batch size" of 512.