Use case and benchmarks

Any-Winter-4079 commented 7 months ago

This is not an issue per se, but I would like to show my results, in case my experiments are helpful to others. 👋 I am building a robot for a school project (so it's not like I gain anything from sharing) and decided to use DeepFace.

Please, take the results with a grain of salt.

Hardware and libraries

tensorflow-macos
M1 Max 64GB RAM
deepface==0.0.84 via pip

Speed to build embeddings

Taking 1024 images from the LFW Dataset, these are my results.

Number of images that 6 face recognition models (VGG-Face, Facenet, Facenet512, OpenFace, DeepID, and ArcFace) can convert into embeddings, per second, in DeepFace.
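For anyone wanting to take a similar measurement, here is a minimal sketch of how an images-per-second figure could be computed. It is not the code used for the chart above. `images_per_second` and `fake_embed` are hypothetical names, and `fake_embed` is only a stand-in so the snippet runs without DeepFace installed.

```python
import time
from typing import Callable, Iterable


def images_per_second(embed: Callable[[str], object], image_paths: Iterable[str]) -> float:
    """Return how many images per second `embed` gets through."""
    paths = list(image_paths)
    start = time.perf_counter()
    for path in paths:
        embed(path)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return len(paths) / elapsed if elapsed > 0 else float("inf")


def fake_embed(path: str) -> list:
    """Hypothetical stand-in for a real embedding call."""
    return [float(len(path))] * 4


rate = images_per_second(fake_embed, [f"img_{i}.jpg" for i in range(1024)])
print(f"{rate:.1f} images/s")
```

With DeepFace installed, `embed` could presumably be something like `lambda p: DeepFace.represent(img_path=p, model_name="Facenet")`, run once per model to compare them.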

Accuracy and speed to find person in 432-people embedding

Of the 1054 images (of 432 people) that start with 'A' in the dataset, I take the last image of people with ≥4 images and move it to the testing folder (up to 30 images). That is, 1024 images for building the embedding and 30 images for testing. I use cosine similarity and the default thresholds (shown below).
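The split described above can be sketched roughly as follows. This is an illustration under assumptions, not the original script: `split_last_image` is a hypothetical helper, the file names are made up, and "last image" is taken to mean the last file name in sorted order.

```python
from typing import Dict, List, Tuple


def split_last_image(
    images_by_person: Dict[str, List[str]],
    min_images: int = 4,
    max_test: int = 30,
) -> Tuple[Dict[str, List[str]], List[Tuple[str, str]]]:
    """Move the last image of each person with >= min_images into a test set."""
    database: Dict[str, List[str]] = {}
    test: List[Tuple[str, str]] = []
    for person in sorted(images_by_person):
        files = sorted(images_by_person[person])
        # Only people with enough images donate one, and only until the cap is hit.
        if len(files) >= min_images and len(test) < max_test:
            test.append((person, files[-1]))
            files = files[:-1]
        database[person] = files
    return database, test


# Made-up example data, not taken from LFW.
example = {
    "Alice_Example": [f"Alice_Example_000{i}.jpg" for i in range(1, 5)],
    "Adam_Example": ["Adam_Example_0001.jpg"],
}
database, test = split_last_image(example)
print(test)
```

Every person stays in the database with at least three images, so each test image has a chance of being matched.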

| Model | Cosine | Euclidean | Euclidean L2 |
|-------------|--------|-----------|--------------|
| VGG-Face | 0.68 | 1.17 | 1.17 |
| Facenet | 0.40 | 10 | 0.80 |
| Facenet512 | 0.30 | 23.56 | 1.04 |
| ArcFace | 0.68 | 4.15 | 1.13 |
| Dlib | 0.07 | 0.6 | 0.4 |
| SFace | 0.593 | 10.734 | 1.055 |
| OpenFace | 0.10 | 0.55 | 0.55 |
| DeepFace | 0.23 | 64 | 0.64 |
| DeepID | 0.015 | 45 | 0.17 |
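To make the role of these thresholds concrete, here is a small sketch of a cosine check. It assumes the thresholds apply to cosine distance (1 minus cosine similarity) and that a pair counts as a match when the distance is at or below the threshold. `cosine_distance` and `is_match` are hypothetical helpers, and the two vectors are toy values, not real embeddings.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity of two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_match(a: Sequence[float], b: Sequence[float], threshold: float) -> bool:
    """Assumed decision rule: accept when the distance does not exceed the threshold."""
    return cosine_distance(a, b) <= threshold


# Toy 2-D vectors 45 degrees apart; real embeddings have far more dimensions.
a, b = [1.0, 0.0], [1.0, 1.0]
distance = cosine_distance(a, b)
print(round(distance, 4))
print(is_match(a, b, 0.68))  # VGG-Face cosine threshold from the table
print(is_match(a, b, 0.10))  # OpenFace cosine threshold from the table
```

The same pair can pass under one model's threshold and fail under another's, which is why thresholds are listed per model and per metric.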

Results are the following:

Number of correct predictions (out of 30) and average time per prediction for 6 models and 8 different backends in DeepFace, with cosine similarity and default thresholds.

Notably, OpenFace and DeepID don't seem to work, which may be a hardware/library issue on M1.

Accuracy and speed to find (or not find) known/unknown/no person in 435-people embedding

Sometimes you want to recognize a known person without mistaking someone unknown for someone in your embeddings. Other times no face may be shown at all, and you don't want to label that image as someone known either. For this experiment, I add Tom_Cruise, Salma_Hayek and Valentino_Rossi from the LFW dataset to our database, and use 48 test images (taken online using screenshots, but I can share the exact images if anyone is interested in reproducing this to a T):

16 images of known faces (Arnold_Schwarzenegger, Tom_Cruise, Salma_Hayek and Valentino_Rossi, 4 images each)
16 images of unknown faces (Steph_Curry, Shaq, Emma_Roberts, Vanessa_Hudgens, 4 images each)
16 images of no people.

Images are resized to (250,250) to be the same size as those in the LFW Dataset. Images of people in the test folder are taken in 4 poses:

front-close
front-far
side-close
side-far

In the database folder, apart from adding Tom_Cruise, Salma_Hayek and Valentino_Rossi to the previous 432 people, I add 3 more images for each of the 4 people to be recognized, in poses front-far, side-close and side-far (since LFW is mostly front-close).

Results are as follows:

Number of identifications (out of 16) and non-identifications (out of 32) correct and average time per prediction for 4 models and 8 different backends in DeepFace, with cosine similarity and default thresholds.

Number of correct identifications (out of 4) for frontal/close, frontal/far, lateral/close, and lateral/far shots, for 4 models and 8 different backends in DeepFace, with cosine similarity and default thresholds.
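The two counts in this experiment (correct identifications of known people, and correct non-identifications of unknown people or empty images) could be tallied along these lines. This is a sketch only: `tally` is a hypothetical helper, and `None` is used here to mean "nobody from the database".

```python
from typing import Dict, Iterable, Optional, Tuple


def tally(results: Iterable[Tuple[Optional[str], Optional[str]]]) -> Dict[str, int]:
    """Count outcomes from (expected, predicted) pairs; None means 'no known person'."""
    counts = {"identified": 0, "known_total": 0, "rejected": 0, "unknown_total": 0}
    for expected, predicted in results:
        if expected is None:
            # Unknown face or no face: correct only if nothing was matched.
            counts["unknown_total"] += 1
            counts["rejected"] += int(predicted is None)
        else:
            # Known face: correct only if the right identity was returned.
            counts["known_total"] += 1
            counts["identified"] += int(predicted == expected)
    return counts


# Illustrative outcomes, not the results from this issue.
sample = [
    ("Tom_Cruise", "Tom_Cruise"),  # known, identified
    ("Tom_Cruise", None),          # known, missed
    (None, None),                  # unknown or no face, correctly rejected
    (None, "Salma_Hayek"),         # unknown, wrongly matched
]
counts = tally(sample)
print(counts)
```

Keeping the two counts separate matters because a looser threshold tends to raise the first and lower the second.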

Threshold tweaking

Playing with the threshold, you can improve results for your use case. I went from 0.68 to 0.625 for VGG-Face with cosine and results really improved; even better, that translated into real-life performance too, which was nice to see. I suspect the best value depends on a lot of factors, such as resolution, quality, number of images, resemblance and so on, so play around!
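One way to play around more systematically is to score a few candidate thresholds against labelled pairs. The sketch below assumes you have (distance, same person?) pairs from your own images. The distances are made up for illustration and are not measurements from this issue, and `accuracy_at` and `best_threshold` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, bool]  # (cosine distance, whether both images show the same person)


def accuracy_at(threshold: float, pairs: List[Pair]) -> float:
    """Fraction of pairs classified correctly when matching at distance <= threshold."""
    correct = sum((dist <= threshold) == same for dist, same in pairs)
    return correct / len(pairs)


def best_threshold(pairs: List[Pair], candidates: Sequence[float]) -> float:
    """Return the candidate with the highest accuracy (first one wins ties)."""
    return max(candidates, key=lambda t: accuracy_at(t, pairs))


# Made-up distances, purely illustrative.
pairs = [(0.30, True), (0.55, True), (0.60, True),
         (0.65, False), (0.70, False), (0.90, False)]

for t in (0.68, 0.625):
    print(t, round(accuracy_at(t, pairs), 3))
print(best_threshold(pairs, [0.68, 0.625]))  # 0.625 for this toy data
```

On real data it is worth holding out some images when picking the threshold, so the chosen value is not tuned to the same pictures it is evaluated on.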

Speed of first iteration

Also worth noting: even though build_model speeds up the first iteration, the first run is still a bit slower. After the model is built, speed seems to pick up and remain pretty constant.
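To keep that first-run cost from skewing an average, the first call can be timed separately from the rest. Here is a minimal sketch, where `first_vs_rest` is a hypothetical helper and `LazyModel` is a stand-in that simulates a one-time setup cost with a short sleep (it is not DeepFace).

```python
import time
from statistics import mean
from typing import Callable, Sequence, Tuple


def first_vs_rest(fn: Callable[[object], object], inputs: Sequence[object]) -> Tuple[float, float]:
    """Return (duration of first call, mean duration of remaining calls) in seconds."""
    if len(inputs) < 2:
        raise ValueError("need at least two inputs to compare")
    durations = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        durations.append(time.perf_counter() - start)
    return durations[0], mean(durations[1:])


class LazyModel:
    """Stand-in that pays a one-time setup cost on its first call."""

    def __init__(self) -> None:
        self.loaded = False

    def __call__(self, item: object) -> object:
        if not self.loaded:
            time.sleep(0.05)  # simulated warm-up, not a real model load
            self.loaded = True
        return item


first, rest = first_vs_rest(LazyModel(), list(range(20)))
print(f"first: {first * 1000:.1f} ms, rest: {rest * 1000:.3f} ms")
```

Reporting the two numbers separately, or discarding the first call as warm-up, gives a steadier per-prediction figure.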

Final notes

Overall, I am happy with the result. A lengthier explanation together with the code used to run these experiments is included in the face_recognition.md I added.

While exploring DeepFace, I have seen a few Issues on GH and comments on YouTube from people talking about benchmarks (or not seeing the same results as those in a tutorial) so I hope this is useful and more people can come together in the future and benchmark for their use cases.

If this is not appropriate, feel free to move/remove.

serengil commented 7 months ago

Thank you for your contribution.

Closing this because this is not an issue. Still, I set its label to documentation.

serengil commented 7 months ago

BTW, I am aware of the underperformance of DeepID and OpenFace. It is not a hardware issue. The pre-trained models are problematic.

serengil / deepface