Hi --
What are the results (e.g. LFW accuracy) that we should expect when running this code as described in the README? I don't have MATLAB, so I'll need to run w/ a different face detection/alignment pipeline, and I want to see how much error is introduced by my variant.
Thanks
This is a good point. We will provide the results and the pretrained models for your reference shortly.
Great thanks -- I'm running now and will post results from my config when done.
I ran the code as is, with a couple of small modifications: get_list.py (see https://gist.github.com/bkj/aff978999e974c6e7b71d972551e1f0c). So the only line of code I changed in the repo is the batch_size in the model.prototxt.
The loss during training looks like this:
Does that seem remotely correct? It doesn't really look like the shape I'd expect. Haven't had time to run the LFW benchmark yet.
Also, another odd behavior: about 2/3 of the time, when I start training the model the loss goes up to 87 and stays there (I think the weights are turning to nans). I have to restart the model repeatedly until it makes it through the first few minutes, then it will keep going.
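Side note: a softmax loss stuck around 87.33 is -log of the smallest positive single-precision float, which is what Caffe's softmax loss reports once the predicted probability underflows to zero, so it is consistent with the weights blowing up. A quick check in MATLAB:
-log(realmin('single'))   % ans = 87.3365, the value the loss plateaus at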
Hi, I ran the code with the default config and got an accuracy of 0.620667 on LFW. (The softmax loss of sphereface_model_iter_28000.caffemodel is about 2.3.) Is that what it should be? Or is there something I should pay attention to?
Must be an error. How did you preprocess/test/etc? Using all original MATLAB?
Yes, I just followed the README and ran the code using MATLAB. Now I am trying with more iterations. What's your LFW accuracy?
Haven't had a chance to run LFW since I don't have MATLAB. But the loss was about 8.0 at convergence and didn't really look right (see above) so I'm guessing it won't be great.
You ran detection/alignment with the provided code though?
Yes, and no error occurred. @bkj
Interesting. I ran training again -- this time it ran for 20K iterations, then the weights went to nan.
@wy1iu any thoughts on why training might not be working for us?
@wy1iu , I think there might be two small mistakes in the file evaluation.m.
The img at line 88 should be converted to single precision, like this:
img = imread(file);
img = single(img); % convert to single precision
And pairs(i).group = ceil(i / 300); at line 74 should be changed to pairs(i).group = ceil(i / 600);
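For context, one reason the cast matters: uint8 arithmetic in MATLAB saturates, so any offset/mean subtraction clips negative values to 0 (and matcaffe expects single-precision input anyway). A rough sketch of the intended order of operations; the offset/scale constants below are an assumption, not necessarily what evaluation.m uses:
img = imread(file);          % uint8
img = single(img);           % cast before any normalization
img = (img - 127.5) / 128;   % without the cast, uint8 arithmetic would clip values below the offset to 0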
Does fixing those mistakes get the model up to the >99% accuracy reported in the paper?
No, but it is over 93% now.
@wy1iu , there is another small mistake, in the file preprocess/code/face_detect_demo.m.
areas = prod(bboxes(:, 3:4), 2);
at line 70 should be changed to
areas = prod(bboxes(:, 3:4) - bboxes(:, 1:2), 2);
(presumably because the MTCNN detector returns boxes as [x1 y1 x2 y2], so columns 3:4 are the bottom-right corner rather than the width and height).
Now I can get an accuracy over 98%. However, if I replace the LFW data with the version preprocessed by my friend, which also uses MTCNN to detect faces, the accuracy increases to >99%. So I guess there might still be some minor problems that I cannot pin down. Could you please check it out for us?
Yes, I found some bugs in the pipeline demo and will fix them shortly. But the SphereFace recognition part should be fine.
@wy1iu Are you able to post a plot of the training loss when the model is running correctly?
@Jason-Zhou-JC are you able to post a plot of your loss as well? The best accuracy I'm getting is ~0.97 ATM
Fantastic -- thanks!
Hi @bkj @Jason-Zhou-JC, what's your final value of lambda? I use the default setting, and the final lambda is 5, but the softmax_loss stays around 5 and can't get any smaller. Have you changed the value of lambda? The network prototxt is sphere_model.prototxt. The accuracy I get is 95.1%, the validation rate is 70%, and the AUC is 0.6.
I didn't change the settings; you can refer to my log file: sphereface.txt
@Jason-Zhou-JC thanks for answering,the difference I can tell is I use casia 112x112(data is cleaned by casia_clean_list),not 96x112,and my batch size is smaller:128,maybe batch size account. And there is a small question: convolution layer in the original network has bias term,but it all does't learn at all since the learning rate is 0,and it initial to 0,why don't just set bias_term:false?
@zhly0 You can simply set bias_term to false if the learning rate of the bias term is 0. They are the same, I think. But you should notice that not all of them in the original network are 0.
I trained on CASIA-Webface and the average accuracy on LFW is 99.18%.
@goodluckcwl Using the default settings? Could you post your log file?
@bkj Actually, the LFW accuracy is only 98.8% (with PCA) if I use the default settings. I modified lambda, kept training with a small learning rate (0.01), and found the accuracy improved. Careful fine-tuning does improve the performance. The model I trained can be downloaded from here: https://github.com/goodluckcwl/Sphereface-model
@goodluckcwl How exactly did you do the PCA? I saw that mentioned in the center loss paper, but wasn't sure exactly how it was being done.
@bkj For 10-fold cross-validation, each round uses 9 folds to train the PCA and one fold for testing. Here is code you can refer to: https://github.com/happynear/FaceVerification
@goodluckcwl Thanks -- which file in particular is the relevant one? And could you give a little summary of what you mean by "train PCA"?
@bkj Hi, have you figured out why the loss did not go down? Since your LFW accuracy is 97%, is it solved?
@bkj 9 folds are used to train the PCA. Please refer to https://github.com/happynear/FaceVerification/blob/master/lfwPCA.m
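For reference, a rough sketch of what training the PCA on 9 folds looks like; this is not the exact code in lfwPCA.m, and feats, foldIdx, and dim are placeholder names:
% feats:   N x D matrix, one deep feature per LFW image
% foldIdx: N x 1 vector assigning each image to a fold 1..10
dim = 128;                    % number of PCA components to keep (assumed)
for k = 1:10
    train = feats(foldIdx ~= k, :);                         % fit PCA on the other 9 folds
    mu = mean(train, 1);
    coeff = pca(train);                                     % Statistics Toolbox
    W = coeff(:, 1:min(dim, size(coeff, 2)));
    test = bsxfun(@minus, feats(foldIdx == k, :), mu) * W;  % project the held-out fold
    % ...then compute cosine similarity for the test pairs and pick the
    % verification threshold / accuracy on this fold only
end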
@Jason-Zhou-JC Yes, not all of them are zeros, my mistake.
@zhly0 I think that was an issue w/ some grayscale images mixed in w/ the color images -- I re-detected the faces and haven't had that problem recently.
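If you'd rather keep those images than drop or re-detect them, a minimal hypothetical fix is to force 3 channels during preprocessing (file here is just a placeholder):
img = imread(file);
if size(img, 3) == 1
    img = repmat(img, [1 1 3]);   % replicate the gray channel so the network always sees 3-channel input
end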
@bkj thanks for answering. I changed my batch size to 256 and cropped the images to 96x112, and then it did not happen. But when I train with plain softmax, without A-Softmax (also with 96x112 images), the loss jumps to 87.3365 once it is down to about 3.7. Maybe it is related to the grayscale images in the training data; it just happens at some point.
@Jason-Zhou-JC Can you upload the LFW data preprocessed by your friend, with which the accuracy increases to >99%? Thanks.
@bkj how did you deal with the grayscale images mixed in with the color images? I hit the same loss issue.
Hi guys, all the bugs are fixed now. We also release the 20-layer CNN model (SphereFace-20) described in the paper. The expected results and pretrained model are also updated. You are welcome to try!
The pipeline should be okay to run without modifications now.
Is it possible to host the preprocessed CASIA/LFW faces somewhere? Understand if it's too much bandwidth, but figured I'd ask. LFW w/ faces extracted is probably small enough to share, I'd imagine.
I've found that the preprocessing methods for these things can have more impact than I would necessarily expect.
It is a good idea, but I am not sure whether we are allowed to share these images directly without official permission. @bkj
@wy1iu , first of all, you really did nice work, and using your code to train on my own dataset improved my results -- thanks for your contribution! But when testing on LFW with the model you updated last night, I got only 92.6% with RGB images; with the model @goodluckcwl uploaded I got about 95% with RGB images; while with my own network I got 96% accuracy with grayscale LFW images. My LFW preprocessing does nothing but crop the images to 96x112. So is the model easily influenced by the alignment method? Is the LFW test dataset (not the training dataset) available? Thanks.
I do not think the alignment method matters that much, but of course you could try others. If you use this repo's preprocessing to process your raw LFW and CASIA data (you could use your own training set as well), I believe you will get good results. @zhly0
@zhly0 From your results I am sure you did something inconsistent with our settings, but from the limited information I cannot find the exact issue. Are you exactly following the instructions for both the training and testing data? Which evaluation script did you use?
@bkj In your case, the loss is not decreasing because you changed the batch size. We do suggest you use 512. That's why @Jason-Zhou-JC gets a correct loss curve.
@Jason-Zhou-JC The reason for the low accuracy is that there were still bugs in the old evaluation script. They have now been fixed.
@ydwen thanks for the reply. I use FaceNet's LFW testing code, modified a little to work with Caffe, and the model is the one from this repo that you updated last night. I ask for your LFW images because I do not have MATLAB, and I want to verify whether this is due to the alignment or to something else, like my LFW test code settings. My training data is CASIA, aligned with MTCNN.
@ydwen The batch size in the currently provided configuration is 256 -- is that the correct batch size to use? Also, are you able to post your sphereface_train.log so we can see your configuration and training loss exactly?
EDIT: Seems like this code is intended to run on 2 GPUs, w/ a batch size of 256 on each, for an "effective batch size" of 512.