pcentieiro / midian

Require some code explanation #2

Open gyphorz opened 12 years ago

gyphorz commented 12 years ago

Hi Pedro,

I'm sorry if this is a trivial question, but I would really appreciate some help. Looking at the midian project, which part of the code specifies the 'test' image that will be compared against the images within the database? Furthermore, could you provide me with a brief explanation of the 'predicted' and 'actual' class? (Or possibly links to other sources that I could read to be more informed.) I would really appreciate your help on this,

Alex

bytefish commented 12 years ago

Without knowing every detail of Pedro's implementation: midian used an early version of libfacerec, the library which has by now been merged into OpenCV 2.4.2 (the FaceRecognizer is probably also in the iOS build). You can read the OpenCV documentation on the FaceRecognizer module for the details.

The interface changed a little, but it shouldn't be too hard to adapt to it.

In midian the prediction happens in the predict method.
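From memory, the interface looked something like this (a sketch, not the exact source):

    // abstract base class that every algorithm implements (early libfacerec)
    class FaceRecognizer {
    public:
        // learns a model from the given images and their labels
        virtual void train(const vector<Mat>& src, const vector<int>& labels) = 0;
        // predicts the label (person) of a query image
        virtual int predict(const Mat& src) = 0;
    };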

And then there is the implementation for each algorithm, like for the Eigenfaces.
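For Eigenfaces, predicting is essentially projecting the query image into the PCA subspace and then doing a nearest-neighbor search over the projected training samples. Roughly (again a sketch from memory; the underscored names are member variables of the model):

    int Eigenfaces::predict(const Mat& src) {
        // project the query image into the PCA subspace
        Mat q = subspaceProject(_eigenvectors, _mean, src.reshape(1, 1));
        // find the nearest projected training sample (Euclidean distance)
        double minDist = DBL_MAX;
        int minClass = -1;
        for (size_t i = 0; i < _projections.size(); i++) {
            double dist = norm(_projections[i], q, NORM_L2);
            if (dist < minDist) {
                minDist = dist;
                minClass = _labels[i]; // label of the closest training image
            }
        }
        return minClass;
    }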

Thanks again to Pedro for helping me a lot back then, to make the library more robust. :)

gyphorz commented 12 years ago

Hi bytefish,

Many thanks for your response. Regarding Pedro's code, when running the application it prints:

    Predicted Class: 3 Actual Class: 3

I am wondering why, specifically in his code, there are only two functions implemented (posted code below); from what I gather, the 'Mat src' loads in the test subjects. But what I cannot figure out is the part that specifies which image (jpg) will be compared against the database subjects to search for a match. Maybe you could have a glance, please?

bytefish commented 12 years ago

First of all, the image data is loaded.
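From memory, that part looks roughly like this (a sketch, not the exact code; the file names are assumed to be <person>_<photo>.jpg, as in Pedro's sample data):

    // sketch: fill the training data
    vector<Mat> images; // the face images
    vector<int> labels; // one label (= person id) per image
    for (int person = 0; person < 4; person++) {
        for (int photo = 1; photo <= 3; photo++) {
            NSString *name = [NSString stringWithFormat:@"%d_%d.jpg", person + 1, photo];
            images.push_back([self CreateIplImageFromUIImage:[UIImage imageNamed:name]]);
            labels.push_back(person);
        }
    }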

This adds the images into a vector, and the labels accordingly. From this data you take a sample you want to predict; the test image is then removed from the training data, so they do not overlap.
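Which boils down to something like this (sketch):

    // take the last image as the test sample and drop it from the training set
    Mat testSample = images[images.size() - 1];
    int testLabel  = labels[labels.size() - 1];
    images.pop_back();
    labels.pop_back();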

Then the face recognition model is learnt.
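midian uses the Fisherfaces algorithm for this; in essence (sketch):

    // learn the Fisherfaces model from the remaining training images
    Fisherfaces model(images, labels);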

Once this model is learnt, you take the test sample from above and predict its label. Think of the label as the subject (the person) this image belongs to.
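Something like (sketch):

    // predict which person the held-out test sample belongs to
    int predicted = model.predict(testSample);
    cout << "Predicted Class: " << predicted << " Actual Class: " << testLabel << endl;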

pcentieiro commented 12 years ago

Hello Philipp and gyphorz :)

You're right; the code that specifies which image is compared against the database subjects to search for a match is the (IBAction)faceRecognition method:

https://github.com/pcentieiro/midian/blob/master/Face%20Recognition%20Library/Face%20Recognition%20Library/ViewController.mm#L45

Regarding the meaning of "Predicted Class: 3" and "Actual Class: 3", here is the explanation:

There are 4 different persons (4 classes) in the database, with 3 images per person.

So, what the algorithm does is the following: it loads all the images into an array (the "images" vector) and the class numbers into the "labels" vector. Thus, we get 12 different images in the "images" vector, and 12 entries in "labels" with the following numbers: 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3 (which are the class numbers, aka persons). Then we delete the last entry of the "images" vector and of the "labels" vector, and we use that image to compare against the database (the "images" vector). Since the image is no longer in the vector, we try to guess which person the image belongs to (class 0, 1, 2 or 3).

Finally, as you may have already guessed, "Predicted Class: 3" means that the algorithm predicted the image as being person number 3, and "Actual Class: 3" specifies which person the image belongs to in reality!

If you still have any doubts, just ask; I'll be here :)

Thanks for the feedback!

PS: Yes, the code is not very clear, I will fix it in a later update :)

gyphorz commented 12 years ago

Hi Pedro and bytefish,

Thank you both for the replies. Apologies for my lack of knowledge, but I have just started to look into OpenCV and face recognition, and the amount of information seems a bit overwhelming. That being said, I have started reading and am beginning to understand parts of it. I do have a few more queries:

  1. Is it possible to have only 1 image per class? How would this affect the outcome?
  2. Why do you delete the last entry? Assuming that the 'images' and 'labels' vectors both contain '1, 2, 3', you delete the last entry, making them '1, 2'. You then use '3' as the input image and search the database '1, 2' using '3'. Shouldn't that fail to return a match? (I'm obviously wrong.)
  3. Could you please show me a code snippet of how it would look if I want to specify a particular 'input' image (.jpg) to be compared against the database? Would this be the part of the code that needs to be modified? (Also, why images.size() - 1?)

    // get test instances
    Mat testSample = images[images.size() - 1];
    int testLabel = labels[labels.size() - 1];

Many thanks again for your help!

pcentieiro commented 12 years ago
  1. Is it possible to have only 1 image per class? How would this affect the outcome?

Yes! That would mean you only have one photo of that person. That results in fewer photos to match against the database, and thus lower accuracy, so the algorithm's suggestions would not be the best. More photos, more accuracy (since it has more photos to compare with)!

  2. Why do you delete the last entry?

I deleted the last entry just for testing purposes. I could remove ANY image to query the database.

Assuming that the 'images' and 'labels' vectors both contain '1, 2, 3', you delete the last entry, making them '1, 2'.

Now they both contain 11 entries: 11 images, and 11 numbers respectively, labels = {0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3} (notice that now there are only 2 images belonging to person number 3, instead of the usual 3 images per person).

You then use '3' as the input image and search the database '1, 2' using '3'. Shouldn't that fail to return a match? (I'm obviously wrong.)

Yes, you are wrong :) Notice that there are still images belonging to person number 3 in the "images" and "labels" vectors. If you want to see what is happening, add the following after line 161 in facerec.cpp:

    NSLog(@"Class Number: %d | Distance: %f", sampleIdx, dist);

This will show the Euclidean distance from the INPUT image to EVERY image in the database. The image in the DB with the minimum distance to the input image will be selected as the "nearest" photo, and therefore that is the person we're trying to recognize :)

  3. Could you please show me a code snippet of how it would look if I want to specify a particular 'input' image (.jpg) to be compared against the database? Would this part of the code need to be modified? (Also, why images.size() - 1?)

Comment out everything from line 64 to line 79 (in the faceRecognition method) and write the following:

    Mat inputImage = [self CreateIplImageFromUIImage:[UIImage imageNamed:@"YOURIMAGE.jpg"]];
    Mat grayInputImage;
    cv::cvtColor(inputImage, grayInputImage, CV_BGR2GRAY);

    Fisherfaces model(images, labels);

    int predicted = model.predict(grayInputImage);
    cout << "predicted class = " << predicted << endl;

This will print the predicted person (class), based on your input image.

I haven't tested it, but it should work ;)

But... if I remember correctly, you need to have the SAME width/height on all the images (DB and input image).
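If they don't match, something like this for the input image should do the trick (again, untested):

    // resize the query image to the size of the first training image
    cv::resize(grayInputImage, grayInputImage, images[0].size());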

(Also, why images.size() - 1?)

Just to get the last image and its class from the vectors.

Pedro

gyphorz commented 12 years ago

Hi Pedro,

Many thanks for your help, I really do appreciate it, and I'm starting to understand this a lot better! Yes, the code does work :). I do have one last question and wanted your advice on approaching this problem. Assuming that you take a photo with your iPhone and want to cross-check it against a remote database, what would be the best approach in terms of memory and speed?

  1. Would you use a BLOB to store the images in the remote database, or a file path system?
  2. When performing the check, would you download the images from the database onto the iPhone, cache them, and then perform the check? (I would imagine that this could be done in batches, e.g. download 20 users, check, and if there is no match, the next 20 users.)

What approach would you take? Thank you

pcentieiro commented 12 years ago

Np :)

Assuming that you take a photo with your iPhone and want to cross-check it against a remote database, what would be the best approach in terms of memory and speed?

Well, if you can send the image to the remote database, do the following:

1) Take the picture.
2) Run a face detection algorithm (iOS 5 already has native APIs).
3) Crop the faces from the picture (so you don't need to send the WHOLE picture).
4) Resize the faces to a smaller size (optional, just to save data if you need to).
5) Send them to the remote database.
6) Receive the response.
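For step 2, something along these lines with the native CIDetector should do (an untested sketch; "photo" stands for your UIImage):

    // #import <CoreImage/CoreImage.h>
    CIImage *ciImage = [CIImage imageWithCGImage:photo.CGImage];
    NSDictionary *opts = [NSDictionary dictionaryWithObject:CIDetectorAccuracyHigh
                                                     forKey:CIDetectorAccuracy];
    CIDetector *detector = [CIDetector detectorOfType:CIDetectorTypeFace
                                              context:nil
                                              options:opts];
    for (CIFaceFeature *face in [detector featuresInImage:ciImage]) {
        CGRect faceRect = face.bounds; // crop the photo to this rect before sending it
    }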

Otherwise, yeah, you need to download all the images in the database. You should have a cache, so you don't download everything every time you need to do a cross-check.

  1. Would you use a BLOB to store the images in the remote database, or a file path system?

Can't help you with that :) Everything I did so far was done locally (every photo taken was saved on the iPhone, and the algorithm used those images to do face recognition). I used a file path system, btw, but I don't know if a BLOB would be better in your scenario.

  2. When performing the check, would you download the images from the database onto the iPhone, cache them, and then perform the check? (I would imagine that this could be done in batches, e.g. download 20 users, check, and if there is no match, the next 20 users.)

You will always have a match, because every image checked gives you a distance (which I showed in the previous post). To find which image in the DB looks most like the input face image, you need to check the whole DB for the image with the minimum distance. Of course, you can set a threshold, and once that threshold (a minimum distance) is reached, you don't need to check any more images!
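For example (an untested sketch; "projections", "query" and "classes" are placeholders for the data inside the distance loop I mentioned before):

    // stop searching as soon as an image is closer than the threshold
    double threshold = 100.0; // you have to tune this value for your data
    int predictedClass = -1;
    for (size_t i = 0; i < projections.size(); i++) {
        double dist = norm(projections[i], query, NORM_L2);
        if (dist < threshold) {
            predictedClass = classes[i]; // good enough match, stop early
            break;
        }
    }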

Pedro

gyphorz commented 12 years ago

Hi Pedro and Bitefish,

Thank you for the detailed answer. So I understand that there is no way to perform the face recognition process against a remote database WITHOUT actually downloading all the images from the database onto the iPhone? Assuming that you have a database, would it somehow be possible to preprocess each image and store a feature vector for it? Then, when you have the unknown image, you only process that one and compare its vector against the vectors from the database, thus avoiding downloading all the images (or is the vector solely based on the input image?). I'm trying to find ways to avoid having to download all the images onto the iPhone. What are your thoughts on this?

bytefish commented 12 years ago

As described in the documentation I've linked above... you can save and load face recognition models, simply by using the load and save methods.
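In OpenCV 2.4.2 that boils down to something like this (a sketch):

    // train once (e.g. externally on a desktop machine) and persist the model state...
    Ptr<FaceRecognizer> model = createEigenFaceRecognizer();
    model->train(images, labels);
    model->save("eigenfaces.yml");

    // ...then on the device, load the stored state and predict; no training needed
    model->load("eigenfaces.yml");
    int predicted = model->predict(testSample);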

This way you would only store the model state on the device, and you don't need to train on the device itself. Regarding the training: the subspace methods (PCA, LDA) need retraining each time you add an image; algorithmically there's no way around it. So if you want to use these methods, you either need to compute the model externally and transfer the final model, or store the images on the device. Local Binary Patterns Histograms don't need to be retrained; you only need to add a method for adding a single image.

pcentieiro commented 12 years ago

As Philipp mentioned, you need to transfer the final model or store the images on the device. My suggestion (if possible) would be to send the image to a server, let it produce the outcome, and report back to you. There are some apps that do that, like http://itunes.apple.com/us/app/id484990787/id484990787?mt=8&affId=2050126&ign-mpt=uo%3D4 .

Pedro

gyphorz commented 12 years ago

Hi,

Thank you both for your input, it is much appreciated. In regards to the LBPH method, I just want to know if I am heading in the right direction. As previously mentioned, I'm looking to avoid downloading the images to the mobile device.

Assuming that I am using the LBPH method and that I have a remote database which contains 'id, imagePath and LocalBinaryPoint' as fields, where LocalBinaryPoint would be the already pre-processed LBP data for the corresponding image. That being said, would I be able to do the following?

  1. Preprocess the unknown (input) image on the phone and gather the LBP.
  2. Send the input image to a web server.
  3. The web server will be able to compare the input image's LBP with the LBPs of the rest of the images from the database and perform a prediction.

This should work, right? Or am I missing something? In code, I imagine the server-side comparison would look something like the sketch below (all names are made up).
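    // hypothetical sketch of the server-side matching of precomputed LBP histograms;
    // chi-square is the same distance measure LBPH uses internally
    double bestDist = DBL_MAX;
    int bestId = -1;
    for (size_t i = 0; i < dbHistograms.size(); i++) {
        double d = cv::compareHist(queryHistogram, dbHistograms[i], CV_COMP_CHISQR);
        if (d < bestDist) {
            bestDist = d;
            bestId = dbIds[i]; // 'id' field of the closest LBP in the database
        }
    }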

pcentieiro commented 12 years ago

I haven't tested it, but I think so.

Sent from my iPhone

bytefish commented 12 years ago

This isn't an issue, nor a request for code explanation. Algorithmic problems should be asked in appropriate places, such as http://dsp.stackexchange.com/, and not in a bug tracker.

seriyvolk83 commented 9 years ago

I don't know how the algorithm works, but using one of the training images as a test image is not good; such a simple test case can even be passed by a simple per-pixel comparison. I added a getDistance method which returns the minimal distance from the test image to one of the training images, and the algorithm does not work at all. Moreover, there is a bug: the training image set must have at least three images. So, using the same set of images from the sample code, I checked the following test cases.

Training image set: 2_1, 2_1, 2_3, 2_3 (each image is duplicated to deal with the bug described above).

Test image / distance:

    1_1 / 1319.04
    1_2 / 1059.16
    1_3 / 852.088
    2_2 / 905.349

As you see, the 2_2 image must have the best (minimal) distance, because it is the same person as in 2_1 and 2_3, but it doesn't. The minimal distance is for the 1_3 image. I don't know what to say. Waste of time.