Follow your step，but only get 73% verification correct rate

Wyyghst commented 8 years ago

First, thank you for your contribution.

I follow your step, used CASIA data and 'mnist_siamese_solver.prototxt', 'CASIA_train_test.prototxt' in 'caffe_proto' folder training CNN without mean file by caffe. After 500,000 iterations, validation set accuracy rate is about 72%. Cnn seems already fitting.

Then, I write matlab code to extracting features and L2 Classification for LFW dataset. But only get 73% verification correct rate.

The matlab code is:

dr_lfw_features = [];
for i= 1:length(fileLines)
    im = imread( fileLines{i} );    
    im = rgb2gray(im);
    im = imresize(im, [100 100]);
    im = single(im);
    im = im /128.0;
    im = im';
    drop5_data = net.forward({im});
    drop5_data = reshape(drop5_data{1}, 1, 320);
    fprintf('iamge is %s, i = %d\n', fileLines{i}, i);
    dr_lfw_features = [dr_lfw_features; drop5_data];
end

load('pairlist_lfw.mat');
test_Intra = pairlist_lfw.IntraPersonPair;
test_Extra = pairlist_lfw.ExtraPersonPair;
test_pair = [test_Intra; test_Extra];

F1_index = test_pair(:,1);
F2_index = test_pair(:,2);

AllFeature1 = dr_lfw_features(F1_index, :);
AllFeature2 = dr_lfw_features(F2_index, :);

num = length(AllFeature1);
F1 = AllFeature1;
F1 = bsxfun(@rdivide, F1, sqrt(sum(F1.^2,2)));
F2 = AllFeature2;
F2 = bsxfun(@rdivide, F2, sqrt(sum(F2.^2,2)));
thresh2 = zeros(num,1);
for i = 1:num
    thresh2(i) = pdist2( F1(i,:),F2(i,:) );
end;
figure;
hist(thresh2(1:3000), 500);
figure;
hist(thresh2(3001:end), 500);

accuracies = zeros(10,1);
for i=1:10
    test_idx = [(i-1) * 300 + 1 : i*300, (i-1) * 300 + 3001 : i*300 + 3000];
    train_idx = 1:6000;
    train_idx(test_idx) = [];
    bestc=256;
    same_label = ones(6000,1);
    same_label(3001:6000) = 0;
    cmd = [' -t 0 -h 0'];
    model = svmtrain(same_label(train_idx),thresh2(train_idx),cmd);
    [class, accuracy, deci] = svmpredict(same_label(test_idx),thresh2(test_idx),model);
    accuracies(i) = accuracy(1);
end;
mean(accuracies)
cmd = [' -t 0 -h 0'];
model = svmtrain(same_label,thresh2,cmd);
[class, accuracy, deci] = svmpredict(same_label,thresh2,model);

By hist histogram, you can see the same data threshold and different data threshold obviously inseparable.

I think there are errors in this matlab code, do you have any good suggestions?

happynear commented 8 years ago

The last one, is using all the thresholds as both training set and testing set. What is the accuracy of it?

Wyyghst commented 8 years ago

thank you for your reply. I think there is a problem before the last sentence code.

accuracies = zeros(1,1);
for i=1:10
    test_idx = [(i-1) * 300 + 1 : i*300, (i-1) * 300 + 3001 : i*300 + 3000];
    train_idx = 1:6000;
    train_idx(test_idx) = [];
    bestc=256;
    same_label = ones(6000,1);
    same_label(3001:6000) = 0;
    cmd = [' -t 0 -h 0'];
    model = svmtrain(same_label(train_idx),thresh2(train_idx),cmd);
    [class, accuracy, deci] = svmpredict(same_label(test_idx),thresh2(test_idx),model);
    accuracies(i) = accuracy(1);
end;
mean(accuracies)

The 'mean(accuracies)' is only 73%. And this code is refer your code( https://github.com/happynear/FaceVerification/blob/master/lfwL2.m )

Before svm classification codes, I display L2 threshold histogram.

figure;
hist(thresh2(1:3000),500);
figure;
hist(thresh2(3001:end),500);

The picture on the left is the same person histogram threshold value, the right is different. you can see that obviously inseparable. (Horizontal axis is threshold, Vertical axis is frequency)

Since my CNN model can reach 72% test classification accuracy, I think the caffemodel is right. I used LFW data for test verification that all images aligned with deep funneling(111MB), downloaded from LFW official website without other preprocessing.

I think the problem is in processing LFW data or MATLAB feature extraction code section. Do you have any good suggestions? Thanks.

happynear commented 8 years ago

The histogram seems good. What is the accuracy of using all thresholds to train and test?

Wyyghst commented 8 years ago

I use CASIA data to train the CNN. After 500,000 iterations, validation set accuracy rate is about 72%, training set accuracy is about 90%. After, I use this model to extract features. Do you think of my caffemodel is already well for face verification?

happynear commented 8 years ago

My model's accuracy is similar with yours. I think the problem lies in the lfw test.

Wyyghst commented 8 years ago

I did not do any preprocessing for CASIA and LFW data before. I found that the human face is already the main part of the picture in CASIA data. But LFW data is not. You can see the people's shoulders in the picture. So, I do face detection for LFW data. And face verify the correct rate now is 90%.

Did you do some preprocessing for CASIA and LFW data? Just like face detection and face alignment. Thanks.

happynear commented 8 years ago

CASIA-webface has also have a version of aligned faces. Use that and you will get accuracy over 96%.

Wyyghst commented 8 years ago

I just using CASIA-webface that version of aligned faces for training CNN. But without preprocessing for LFW, I only get 72% face verification correct rate. After using Viola-Jones Objects detection for LFW data, face verification correct rate is 90%. So, I want to know do you do preprocessing for LFW data? Thanks.

Wyyghst commented 8 years ago

I found casia also provide casia-webface and lfw face aligned version. Prior to this different preprocessing for training and test data, which leads to the low correct rate. Now my experiment can reach your results. Thanks for your reply and help.

zirohut commented 8 years ago

Hi, @Wyyghst By aligned version do you mean the data in Normalized_Faces folder? I'm getting an accuracy around .73 on the CNN model and it seems to be fitting. Can this figure go higher?

Wyyghst commented 8 years ago

@zirohut Aligned version data is the data in Normalized_Faces folder. Before, I used aligned version CASIA data training CNN, but LFW data is downloaded from the official website without alignment. Due to different preprocessing for training and test data, I get 73% face verification correct. Later, I used data that all from Normalized_Faces folder, and can get 96% face verification correct. Do you encountered the same problem with me?

zirohut commented 8 years ago

@Wyyghst Oh, then I don't think it's the same problem. My problem is just on the training of the CNN model (the classification part). I used the pure softmax prototxt and CASIA (Normalized_Faces/webface folder) to train the CNN, but the accuracy for the training is now around .75 and seems difficult to increase. I haven't come to the face pairs or the verification part yet.

By the way, what's your accuracy for the classification problem when training the CNN? Cuz I don't think my .75 accuracy can extract a good feature for the verification part.

Thank you for your reply.

Wyyghst commented 8 years ago

Hi, @zirohut I also used the pure softmax prototxt and CASIA (Normalized_Faces/webface folder) to train the CNN. I used 'mnist_siamese_solver.prototxt', 'CASIA_train_test.prototxt' in 'caffe_proto' folder training CNN without mean file by caffe. one tenth of the data as a validation set, and the rest as the training set. before 36000 iterations, learning rate is 0.01, after is 0.001. After 500,000 iterations, validation set accuracy rate is about 72%, training set accuracy is about 90%. Both are referring to the classification correct rate of softmax in CNN.

You mean training set accuracy is 75%? Do you use the same solver and network definition in 'caffe_proto'?

zirohut commented 8 years ago

@Wyyghst I didn't decrease the learning rate to 0.001 after 360000 iterations. I suppose this is the problem. It seems the accuracy is now increasing. Thank you for your help!

zirohut commented 8 years ago

@Wyyghst Hi, what's your drop5_data looks like? I extracted the feature after drop5, and reshaped to (1,320). But most of each feature vector's nodes are zero. It seems only a few nodes are working and more than half the nodes are set to zero.

My features extracted from drop5 looks like this: 0 0 0 0 4.54 0 0 0 9.23 0 ... 0 0 0 0 3.68 0 0 0 7.90 0 ... .....

Each row is the 1*320 feature vector from an image and it seems many columns(or nodes) are zero. I don't think this is a good feature and the result on lfw is very very low.

BTW, how did you get the threshold for verification? Did you use half of lfw pairs to train the svm and the other half to test? What about L2 and cosine?

I'm kinda confused right now, any pointer will be helpful. Thanks!

happynear commented 8 years ago

@zirohut Yes, the feature is highly sparse, just as features extracted from a ImageNet trained model.

The threshold is determined by 10-fold validation. However, this is only one number. Thresholds got from 10-fold validation or trained from all training set are similar. In my experience, it is around 1.

You can use ReadFeatureLFW.m for feature extraction and lfwL2.m for computing the distances and evaluation.

Wyyghst commented 8 years ago

Hi, @zirohut Due to the reason of dropout, the feature is highly sparse. I used two methods to calculate the threshold. 1.The threshold is determined by 10-fold validation. Using 5400 pair samples for training and 600 pair test, then recycled. L2 and cosine are used to determine the threshold. 2.I used all training data(944612 images) to train a joint bayesian classifier, then using all 6000 pair samples for testing.

zirohut commented 8 years ago

Hi @Wyyghst @happynear Thank you for replying!

I have another question about the features. My features looks like this:

What worries me is that although the feature is sparse, they seems to be sparse on some fixed column. In the above image one row represents the 1*320 feature vector for one input image. It seems column 4,5,6,etc are always zero. And at most cases those non-zero nodes are the same, i.e. column 7,12,16,19 are almost always non-zero. Is that normal? Or there's something wrong?

And my L2 threshold histogram looks like this: I don't think this is right, since the threshold for positive and negative samples are similar.

I think my problems should lies in the feature extraction part. I have two concerns:

My L2 threshold seems to be around 0.4 instead of 1. Could this be caused by some preprocessing I messed up? Like the Mean file or not doing operations like 'J(:,:,1,j) = I/128;' in ReadFeatureLFW.m. I still haven't figure out how to make caffe to read in mean file in matlab, there seems to be some version issue. But I guess not reading in the mean wouldn't hurt much?
When I set the caffeNet, although I used gray image to train, the caffemodel seems to only accept input images with 3 channels (one of the input dimension in deploy.prototxt has to be 3, not 1). So I had to triple the gray channel to make a rgb image. But this shouldn't happen, right?

Again, thank you for helping me!

happynear commented 8 years ago

@zirohut You must make sure that the test samples have been applied the same preprocess techniques as the training samples being done.

Wyyghst commented 8 years ago

Hi, @zirohut First, the question of feature sparse on some fixed column, possible the reasons is that these features you listed are the same person, it should be similar. you can check features for diiferent person.

The other two question: 1.You should do same preprocess for CASIA and LFW. The paper author also provide casia-webface and LFW aligned version. Mean and normalization should be consistent.

If you used gray images to train CNN, you can directly modify the channel from 3 to 1 in deploy.prototxt.

zirohut commented 8 years ago

@Wyyghst

The features are from different person. In fact I scrolled down the list and the distributions are all similar.
I used the aligned version for training and testing, and I also managed to read in the mean file. So I think the preprocess should be the same.
I tried to modify the channel from 3 to 1 in the deploy.prototxt, but it gets the following error: It's like the model was trained by 3-channel images from the very beginning, but only it wasn't. The images I used to train the model are all gray.

I think the problem should be in how I extracted those features. I used model in https://github.com/AlfredXiangWu/face_verification_experiment and also got some messed up features, while using the feature extracted by the author works fine. There must be some moves or details I missed, I'll keep looking.

happynear commented 8 years ago

@zirohut The normed faces are gray. When you are creating the image database using convert_imageset.exe, there is a flag looks like --gray=true. Only with this flag, the opencv will produce 1-channel images, otherwise, 3-channel images will be created.

Another thing is the difference of the image storage in Matlab and OpenCV. When passing an image from Matlab to OpenCV(Caffe), we must

switch the RGB channel to BGR channel oder
permute the image (transposition for gray image) to row major.

You can find these convert codes in caffe/matlab/demo/classification_demo.m (color) or my_faceverification_repo/ReadFeatureLFW.m (gray).

Wyyghst commented 8 years ago

Hi, @zirohut I think you can use matlab interface to read the CASIA training data, then output the predicted label, so you can judge whether the matlab code is right by output label. And, what is your LFW validation set correct rate?

zirohut commented 8 years ago

@Wyyghst @happynear

My model works fine now. I retrain the CNN and the accuracy on LFW is now around 96.5%. I think my struggle before might be resulted from two problems. 1) When I convert the imageset I didn't use the gray option, and perhaps I adopted incorrect way to convert them to RGB in the validation process. 2) I didn't read in the mean file correctly and didn't scale down the image to [-1,1] in either training or validation.

Anyway it works now. Thank you for your reply and help!

zirohut commented 8 years ago

@Wyyghst @happynear

By the way, if I want to use this model on a different facial database, do I have to normalize and align the images exactly the way CASIA and LFW in the Nomalize_Face file does? If I don't do the exact normalization and alignment, will there be a slight decay on the accuracy or the results are just messed up? Do I have to fine-tuning the model for a new facial dataset?

Thanks!

happynear commented 8 years ago

Of course the faces should be aligned as CASIA norm faces being done. I have provided the codes for alignment, MatAlignment.cpp. You can refer to it and exploit your own codes.

Be attention, in order to do the "CASIA alignment", another fix should be added to dlib.

In dlib/image_transformations/interpolation.h, line 1902, add

if (i!=27&&i!=51) continue;

Wyyghst commented 8 years ago

Hi, happynear Does 'MatAlignment.cpp' only do a face extraction work, but didn't do facial points detection and alignment?

happynear commented 8 years ago

@Wyyghst

It use dlib to do the detection and alignment work.

Wyyghst commented 8 years ago

Hi, @happynear I tried detection and alignment on LFW images form 'Deep Convolutional Network Cascade for Facial Point Detection'. The method in this paper detection facial five points, and is different to CASIA method. This led to LFW validation correct rate dropped by about 5%.

So, I want to know whether the alignment work in 'dlib' is same as method in 'CASIA'? Thanks.

happynear commented 8 years ago

@Wyyghst I have described how to get a "CASIA alignment" above.

zirohut commented 8 years ago

Hi, @happynear . I've been trying to implement face verification in video surveillance recently. Here I encountered a problem. I'm using L2 score to see whether two faces are similar or not. But when it comes to two people both wearing glasses, especially those thick black framed glasses, the model seems to confuse and identify them as the same person, which I guess is because those glasses are significant features on one's face.

Have you encountered similar problems before? If so, have you find any good solutions?

Thank you.

happynear commented 8 years ago

@zirohut Movie stars usually don't wear glasses, so the dataset has some kind of bias. The only solution I can raise is to collect more data, especially the normal people.

PS: giving a label of whether the person in the image are wearing glasses may also help.

zirohut commented 8 years ago

@happynear Yeah, I thought so too. Thanks for the quick reply by the way.

Unfortunately most of my test subjects wear similar black framed glasses, so for now it's kinda hard to separate their identities. I'll try collect more data of the test subjects to fine-tune the model.

Also, do you suppose the Siamese-like networks and triplet loss could help?

happynear commented 8 years ago

I guess the triplet may help since Google, Baidu, VGG all use it. But I am working on another project so I haven't tested it yet.

zirohut commented 8 years ago

I'll give it a try sometime. Thanks!

MatrixPlayer commented 8 years ago

Hi， @Wyyghst @happynear I only find the unalignment face data, didn't find the aligned face data at http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html. can you give me the website of aligned face data

tiankong1993 commented 7 years ago

do you have find the aligned face data? @MatrixPlayer

lixiaohui2020 commented 7 years ago

@Wyyghst @zirohut You refer to two methods to design threshold "I used two methods to calculate the threshold. 1.The threshold is determined by 10-fold validation. Using 5400 pair samples for training and 600 pair test, then recycled. L2 and cosine are used to determine the threshold. 2.I used all training data(944612 images) to train a joint Bayesian classifier, then using all 6000 pair samples for testing." In the second method, using all training data(944612 images) to train a joint Bayesian classifier， then you can get four trained joint Bayesian parameters, I want to know how to set threshold, such as " thresh1 = min(min(Dis_train_Intra),max(Dis_train_Extra)); thresh2 = max(min(Dis_train_Intra),max(Dis_train_Extra));

CrossData = [Dis_train_Intra(Dis_train_Intra>=thresh1&Dis_train_Intra<=thresh2);Dis_train_Extra(Dis_train_Extra>=thresh1&Dis_train_Extra<=thresh2)]; thresh = mean(CrossData); " I want to know how to set the threshold, thanks!

happynear / FaceVerification

Follow your step，but only get 73% verification correct rate #11