Clarification on Align, Embed and Verify

domitix commented 1 year ago

Hi, thanks for this amazing repo! I am using it for face verification, but I don't really understand the differences between the prediction types: verify, embed and align. Shouldn't alignment be performed before prediction? Why is it a separate task whose output is not used by the others? My other question is about embed and verify: what is the difference between them? If verification is just a cosine similarity between two image embeddings, why are they two different tasks that give two different results? The demo also shows "Cosine similarity of Face Representation Embeddings" and "Cosine similarity of Face Verification Embeddings" with different results, but I cannot work out what is happening in the code. Thanks in advance!

tomas-gajarsky commented 1 year ago

Hi, thank you for your interest in the repo and for your questions! I'll be happy to explain the differences between the various tasks and their purposes in this library.

Align predictor: The align predictor applies a Face Alignment model to predict an embedding, which is then processed by the align utilizer to produce the final 3D facial landmarks and head pose angles. This information is useful for various tasks, such as facial animation, AR effects, or measuring facial expressions.

Verify and Embed tasks: You're right that the verify task and the embed task both involve extracting face embeddings. However, their purposes are different:

a. Verify task: The goal of the verify task is to obtain a face embedding that can be used specifically for face verification/recognition. The extracted embeddings are optimized for distinguishing between different individuals and measuring the similarity between their faces.

b. Embed task: The embed task, on the other hand, is designed to extract a more universal facial representation. These embeddings can be used for a wider range of applications, such as clustering, classification, or generating new faces.

Regarding the demo results, the Cosine similarity of Face Representation Embeddings and Cosine similarity of Face Verification Embeddings refer to the cosine similarities between the embeddings generated by the embed and verify tasks, respectively. These similarities may be different because the embeddings are optimized for different purposes, as explained above.
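To make the comparison concrete, here is a minimal sketch of cosine similarity in plain Python. The vectors are made-up toy values, not real facetorch outputs, and the names `embed_a`, `verify_a`, etc. are hypothetical; the point is only that the same pair of faces can score differently in two different embedding spaces.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy stand-ins for the embeddings of the same two faces (assumed values).
embed_a, embed_b = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]    # "embed" space
verify_a, verify_b = [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]  # "verify" space

print(round(cosine_similarity(embed_a, embed_b), 4))    # 0.7071
print(round(cosine_similarity(verify_a, verify_b), 4))  # 0.0
```

Because the score depends only on the angle between the vectors, it is only meaningful when both vectors come from the same model; comparing an embed vector with a verify vector would not make sense.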

I have a question for you: Did you understand align as the process which centers faces for face recognition?

domitix commented 1 year ago

Thank you so much! It's much clearer to me now. Yes, I understood align as face alignment, i.e. the process of aligning faces before face verification in order to get better results. For this reason, I expected alignment to be performed before verify rather than as an independent task.

tomas-gajarsky commented 1 year ago

I see. That kind of alignment is not part of the code at the moment, but it is on my list of things that could be added in the future. I'll explore the feasibility of utilising the 5 facial landmarks from the RetinaFace detector to align faces for the face verification task.
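As a rough illustration of what landmark-based alignment could look like (this is not code from the repo, and not necessarily how it would be implemented there), the sketch below fits a 2D similarity transform (rotation, uniform scale, translation) that maps 5 detected landmarks onto a fixed reference template by least squares. The `TEMPLATE` coordinates are an assumed 112x112 layout similar to ones often used with ArcFace-style models, and all function names are hypothetical.

```python
import cmath
import math

# Assumed reference positions (x, y) for a 112x112 crop:
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
TEMPLATE = [
    (38.2946, 51.6963),
    (73.5318, 51.5014),
    (56.0252, 71.7366),
    (41.5493, 92.3655),
    (70.7299, 92.2041),
]


def estimate_similarity(src, dst):
    """Least-squares fit of dst ~ a * src + b, with points as complex numbers.

    The complex factor `a` encodes rotation and uniform scale,
    and `b` encodes the translation.
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    if den == 0:
        raise ValueError("source landmarks must not all coincide")
    num = sum((wi - w_mean) * (zi - z_mean).conjugate() for zi, wi in zip(z, w))
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, points):
    """Apply the transform to a list of (x, y) points."""
    mapped = [a * complex(x, y) + b for x, y in points]
    return [(p.real, p.imag) for p in mapped]


def to_affine_matrix(a, b):
    """Same transform written as a 2x3 affine matrix (row-major lists)."""
    return [[a.real, -a.imag, b.real], [a.imag, a.real, b.imag]]


# Simulate detected landmarks: the template rotated by 10 degrees,
# scaled by 1.5 and shifted, as if the face were tilted in the image.
detected = apply_similarity(cmath.rect(1.5, math.radians(10)), complex(20, -5), TEMPLATE)

a, b = estimate_similarity(detected, TEMPLATE)
aligned = apply_similarity(a, b, detected)  # lands back on TEMPLATE
matrix = to_affine_matrix(a, b)
```

The 2x3 matrix form is the shape an image-warping routine such as OpenCV's `warpAffine` expects, so the face crop itself could be warped with it before being passed to the verify model.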

tomas-gajarsky / facetorch

Clarification on Align, Embed and Verify #39