serengil / deepface

A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
https://www.youtube.com/watch?v=WnUVYQP4h44&list=PLsS_1RYmYQQFdWqxQggXHynP1rqaYXv_E&index=1
MIT License

[suggestion] Check face preprocessing step for all the keras face embedding models #131

Closed. iamrishab closed this issue 3 years ago.

iamrishab commented 3 years ago

In this repo, the face patch preprocessing step is the same for all models.

img = cv2.resize(img, target_size)
img_pixels = image.img_to_array(img)
img_pixels = np.expand_dims(img_pixels, axis = 0)
img_pixels /= 255 #normalize input in [0, 1]

Please check whether this is valid for every model, as the inference preprocessing step must match the model's training pipeline for accurate results. For example, the original Facenet repo does it differently.

serengil commented 3 years ago

I do not understand what you mean.

iamrishab commented 3 years ago

I do not understand what you mean.

I have updated the questions. Kindly let me know if it requires any further clarifications.

serengil commented 3 years ago

How does FaceNet do it? I could not find it in that bulk file.

iamrishab commented 3 years ago

How does FaceNet do it? I could not find it in that bulk file.

I had referred to it in my question as well. Please find it here.

serengil commented 3 years ago

Here, the author of facenet states that prewhitening makes training easier, but we are not interested in training in deepface. I do not believe adding this will increase the model's accuracy.

iamrishab commented 3 years ago

@serengil Thank you for your interest in researching it. I was trying to point out that if we do not follow the same preprocessing steps used in training the model, we will not get the expected results at inference time. Suppose we train the model by simply dividing the pixel values by 255, but skip that operation while testing (or inferring); then we cannot expect results comparable to those on our validation set. The pre- and post-processing steps during model training and inference must be the same. Still open to anyone's feedback on this from the community. Thank you.

trevorgribble commented 3 years ago

@iamrishab @serengil I am very interested in this topic and would like to discuss refining the preprocessing steps as well as the threshold for deciding between True and False verification.

Specifically, I've been working on creating a face detection/recognition pipeline using MTCNN for detection and both VGG-Face and Facenet for calculating embeddings.

@serengil, please correct me if I'm wrong, but I believe that the True/False thresholds you have hard-coded in distance.py (and discussed at length in this post: https://sefiks.com/2020/05/22/fine-tuning-the-threshold-in-face-recognition/ ) were all derived after normalizing the RGB pixel values by dividing by 255 (as @iamrishab mentions).

I ran some tests against 20 different people (each with between 6 and 30 images) using the default "/255" method as RGB input, with both the Euclidean distance and cosine options.

I found a lot of errors, both false positives and false negatives.

I tried to refine this and found that the Facenet model was originally trained on a dataset where every image was normalized by computing the image's RGB mean and RGB std and then applying img = (img - mean) / std.

When performing this preprocessing calculation (rather than simply dividing by 255 before calculating embeddings), my TRUE/FALSE verification accuracy got much better, using Euclidean distance and the given threshold of 10 - awesome!

Then, moving on to VGG-Face, I found that simply dividing by 255 and using the given thresholds for cosine/euclidean distance (.4 (cosine) / .55 (euclidean)) gave me tons of false positives, and all true positives seemed to be way below the given thresholds, typically scoring at .2 or lower.

Again, I researched VGG-Face as much as I could to figure out how the model was trained, and found some posts suggesting that individual R, G, and B means were subtracted (R - 131.0912, G - 103.8827, B - 91.4953) with no division, and those values were fed into the model: https://github.com/ox-vgg/vgg_face2/issues/17. I also saw ambiguous information that the VGG-Face model might have taken images in as BGR rather than RGB: https://github.com/rcmalli/keras-vggface/issues/62

So I attempted 6 different permutations of preprocessing the RGB values before computing embeddings, to see if there was a clearer delineation for threshold calculations. Sadly, whether I used cosine or Euclidean comparisons, I couldn't dial in the VGG-Face threshold the way I seemed to with Facenet.

@serengil, I do believe deepface shines brightest if I stick with "/255" (or any other normalization technique, for that matter), create embeddings for hundreds of thousands of people, eventually fill up a database with millions of embeddings, and then run a face recognition search of an unknown face against those millions of embeddings: the lowest cosine or Euclidean distance will indeed have a high likelihood of matching my unknown image.
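For reference, a minimal sketch of the two comparison metrics being discussed (illustrative only; deepface ships its own implementations of these in its commons module):

import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity; lower means the embeddings are more alike
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    # straight-line distance between the two embedding vectors
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.linalg.norm(a - b))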

However I'd like to explore the True/False ("head to head") verification some more to try and dial in the "best embeddings" for each model.

I also noticed that in your earlier blogs you were using a Keras preprocess_input function to normalize VGG faces before passing them in: https://sefiks.com/2018/08/06/deep-face-recognition-with-keras/ - referencing the preprocess_input function in this repo file: https://github.com/keras-team/keras-applications/blob/master/keras_applications/imagenet_utils.py. But currently the deepface repo doesn't seem to use that tool anymore: functions.py includes "from tensorflow.keras.applications.imagenet_utils import preprocess_input", but it doesn't appear to be called anywhere.

I'm sure there are still complex elements that I'm failing to fully understand (Was this VGGFace model trained on ResNet50, "vgg16", "senet50", or something else? Where did the vgg_face_weights.h5 file come from?), but yes, let's open the conversation so we can fully understand it, because there seems to be confusion in many threads around the internet.

Thanks much, sorry for the long post.

serengil commented 3 years ago

@trevorgribble thank you for this informative message. I reopened this issue.

It seems that, especially for Facenet, preprocessing increases the model accuracy dramatically based on your experiments. Right?

Do you have the code snippet for preprocessing?

iamrishab commented 3 years ago

For reference, please follow this link. Ref

The following is valid for models trained on the VGGFace2 dataset.

The preprocessing script below was used during the training of the facenet 2017 model, which gives 128-d embeddings. That model was trained with the criterion of minimizing the Euclidean distance between two vectors.

import numpy as np

def prewhiten(x: np.ndarray):
    """Normalize the face patch (per-image whitening)."""
    mean = np.mean(x)
    std = np.std(x)
    std_adj = np.maximum(std, 1.0 / np.sqrt(x.size))
    y = np.multiply(np.subtract(x, mean), 1 / std_adj)
    return y

For the facenet 2018 model, which gives 512-d embeddings, the criterion is minimizing the cosine distance between two vectors.

def standardization(x: np.ndarray):
    """Standardize the face patch to roughly [-1, 1]."""
    return (np.float32(x) - 127.5) / 128.0
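For illustration, a hedged usage sketch of the two helpers above (face_patch is a hypothetical placeholder crop; 160x160 matches FaceNet's expected input size):

import numpy as np

face_patch = np.random.randint(0, 256, (160, 160, 3), dtype=np.uint8)  # stand-in for a detected and aligned face crop

inp_2017 = np.expand_dims(prewhiten(face_patch.astype(np.float32)), axis=0)  # input for the 128-d (2017) model
inp_2018 = np.expand_dims(standardization(face_patch), axis=0)               # input for the 512-d (2018) model
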
trevorgribble commented 3 years ago

@serengil

I made a handful of code changes to suit my personal use cases, and basically removed this chunk of code:

img_pixels = image.img_to_array(img)    
img_pixels = np.expand_dims(img_pixels, axis = 0)   
img_pixels /= 255 #normalize input in [0, 1]

out of preprocess_face and moved it into the represent() function in DeepFace.py, after the call to preprocess_face.

I added a new variable called "normalization" to represent() to try different techniques for VGG-Face and look at the output variations in cosine/Euclidean distance to see if there's a difference. (I haven't taken this step as far as you did in the "refining the threshold" post, mapping all the graphs, but I imagine that would be the next step.)

Note that I also removed the line "img_pixels = image.img_to_array(img)" altogether throughout the code base because I couldn't figure out why we needed that extra array dimension.

def represent(img_path, model_name = 'VGG-Face', model = None, normalization = "v1", enforce_detection = False, detector_backend = 'mtcnn', align = True):
    ...
    ...
    ...

    img = img.astype('float64')

    # Facenet case works well (as @iamrishab points out, this aligns with the prewhiten function for the 128-d embeddings)
    if model_name == 'Facenet':
        img = np.expand_dims(img, axis = 0)
        mean, std = img.mean(), img.std()
        img = (img - mean) / std

    # Still trying to figure out the optimal VGG-Face preprocessing steps (I added a normalization variable
    # that I pass in to compare different styles of preprocessing based on my post above, but still haven't
    # come across the optimal case. Perhaps we can collaborate in detail.)
    elif model_name == 'VGG-Face':
        if normalization == "v1":
            # BGR mean subtraction / 255 normalization
            img[..., 0] -= 131.0912
            img[..., 1] -= 103.8827
            img[..., 2] -= 91.4953
            img = img[..., ::-1]
            img /= 255
        elif normalization == "v2":
            # RGB mean subtraction / 255 normalization
            img[..., 0] -= 131.0912
            img[..., 1] -= 103.8827
            img[..., 2] -= 91.4953
            img /= 255
        elif normalization == "v3":
            # BGR mean subtraction only
            img[..., 0] -= 131.0912
            img[..., 1] -= 103.8827
            img[..., 2] -= 91.4953
            img = img[..., ::-1]
        elif normalization == "v4":
            # RGB mean subtraction only
            img[..., 0] -= 131.0912
            img[..., 1] -= 103.8827
            img[..., 2] -= 91.4953
        elif normalization == "v5":
            # simply / 255 (as was the case in the original @serengil code)
            img /= 255
        elif normalization == "v6":
            # simply / 127.5 - 1 (similar to the facenet 2018 model preprocessing step @iamrishab posted)
            img /= 127.5
            img -= 1

    else:
        # for all other models besides Facenet and VGG-Face, I'm keeping the original code as I haven't gotten into it yet
        img = np.expand_dims(img, axis = 0)
        img /= 255  # normalize input in [0, 1]
serengil commented 3 years ago

Thank you a lot. I will add your logic in the next release.

trevorgribble commented 3 years ago

@iamrishab between my experiments and confirming with your link above, it appears we have a very good lead on the preprocessing/normalization steps before passing faces into the Facenet 2017 model included in deepface (the one with 128-d embeddings), which, as you linked above, was trained on the VGGFace2 dataset.

trevorgribble commented 3 years ago

@serengil You're very welcome. Thanks for all your work on this repo and all the associated blog posts and videos. Image preprocessing consistency across model training and model inference has always been the most confusing and challenging stage of every deep learning tool I have investigated, and I've seen accuracy improve tremendously when it all lines up; thus my dedication to making sure we get this as accurate as possible and eliminate any ambiguities.

As I dive deeper into researching how VGGFace was preprocessed before training, I note there are 2 different models: VGGFace and VGGFace2. To be sure, is your VGGFace.py file creating the VGGFace or the more recent VGGFace2 model in Keras? https://github.com/serengil/deepface/blob/024fe6843fd12eea63f06ac315ac9c4cd98c0094/deepface/basemodels/VGGFace.py Info on both models is available here: https://machinelearningmastery.com/how-to-perform-face-recognition-with-vggface2-convolutional-neural-network-in-keras/

My thinking is that the original VGGFace was trained on 2622 people and VGGFace2 was trained on 9131 people, so the model we are using here aligns with VGGFace (original).

serengil commented 3 years ago

The output layer of the VGG-Face model has 2622 nodes, so it was trained on the VGGFace dataset instead of VGGFace2.
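For a quick sanity check, a hedged sketch (assuming the wrapped Keras model can still be loaded via deepface's basemodels loadModel helper, as in this era of the library):

from deepface.basemodels import VGGFace

model = VGGFace.loadModel()  # builds the wrapped VGG-Face descriptor model
print(model.output_shape)    # expect a 2622-dimensional output, matching the 2622 identities of the original VGGFace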

davedgd commented 3 years ago

The output layer of the VGG-Face model has 2622 nodes, so it was trained on the VGGFace dataset instead of VGGFace2.

This is something I was also curious about: would it be possible to implement a VGGFace2 model as well? In addition, it might be helpful in the documentation to clarify and/or cite papers for each implementation to better understand exactly what deepface is providing.

Independent of this, thank you @serengil for providing this extremely convenient interface to these models!

DashHax commented 3 years ago

The output layer of the VGG-Face model has 2622 nodes, so it was trained on the VGGFace dataset instead of VGGFace2.

This is something I was also curious about: would it be possible to implement a VGGFace2 model as well? In addition, it might be helpful in the documentation to clarify and/or cite papers for each implementation to better understand exactly what deepface is providing.

Independent of this, thank you @serengil for providing this extremely convenient interface to these models!

It would be possible by retraining the same model on the VGGFace2 dataset, but right now the original dataset has been taken down by the author.

Luckily, a pretrained Keras model trained on that dataset was made available by the author here: https://drive.google.com/file/d/1AHVpuB24lKAqNyRRjhX7ABlEor6ByZlS/view

alikhan555 commented 3 years ago

Does Facenet not require RGB? You convert the image BGR2RGB before passing it to MTCNN and RetinaFace, but why not for Facenet? In my case, I observe that Facenet gives better results when I pass an RGB image instead of BGR.

davidwdw commented 3 years ago

@trevorgribble I actually have a question on the normalization part for VGGFace. I know that this is what was recommended by the original authors (you or @serengil were simply following the recommendations in their work), but I don't see the logic behind subtracting the training dataset's RGB means from the data at the inference stage. I might be missing something, but the normalization step seems to assume that the dataset the model was trained on is representative of the population distribution, so that for inference on unseen images we need to normalize them by the original dataset means. This assumption, IMO, does not make sense.

Still trying to figure out optimal VGG-Face preprocess steps (I added a normalization variable that I passed in to try and compare different styles of preprocessing based on my above post, but still haven't come across optimal case. Perhaps we can collaborate in detail)

[quoting @trevorgribble's v1-v6 normalization code from the comment above]

My thoughts on this would be to remove the normalization part where RGB/BGR means are subtracted if anyone were to re-train the VGGFace model; we should just stick to normalizing the pixels to the 0-1 range. I know that normalizing to the -1 to 1 range has something to do with model training convergence, so another way would be to rely on the default preprocessing step from Keras: https://www.tensorflow.org/api_docs/python/tf/keras/applications/vgg16/preprocess_input. In this case, they simply divide the image pixels by 127.5 and then shift to the -1 to 1 range (from the source code found here: https://github.com/tensorflow/tensorflow/blob/a4dfb8d1a71385bd6d122e4f27f86dcebb96712d/tensorflow/python/keras/applications/imagenet_utils.py#L166):

if mode == 'tf':
    x /= 127.5
    x -= 1.
    return x

I'm not an expert but I'm very interested in people's thoughts on the RGB normalization for vggface. Thanks!

sadimoodi commented 3 years ago

@serengil would it be possible to use the VGGFace2 model with the 512 dimensions? The pre-trained model is already available here: https://github.com/davidsandberg/facenet. I tried to replace your facenet_weights.h5 file with the new pretrained model, but I ran into several errors (I changed the file names in facenet.py); it seems I am not able to read the file. Any idea how to load this pre-trained model with 512 dimensions?

serengil commented 3 years ago

@sadimoodi sure, but could you open a separate issue about this? We can follow it there.

sadimoodi commented 3 years ago

@serengil was the preprocessing improved according to this post in the new release?

serengil commented 3 years ago

@sadimoodi no, I'm going to post an update here and close this issue when it is fixed.

Akila-Ayanthi commented 3 years ago

@trevorgribble thank you for this informative message. I reopened this issue.

It seems that, especially for Facenet, preprocessing increases the model accuracy dramatically based on your experiments. Right?

Do you have the code snippet for preprocessing?

Is it included in the current release?

serengil commented 3 years ago

not yet @Akila-Ayanthi

Akila-Ayanthi commented 3 years ago

Just out of curiosity, will it take a long time? I want to use this for my research.

serengil commented 3 years ago

I focused on some other issues and that's why this issue is still alive. I plan to focus on this in the next couple of days.

davedgd commented 3 years ago

I focused on some other issues and that's why this issue is still alive. I plan to focus on this in the next couple of days.

Thanks @serengil for all your amazing work!

Akila-Ayanthi commented 3 years ago

Okay, looking forward to these changes. Thank you @serengil.

nbhupendra commented 3 years ago

Thank you a lot. I will add your logic in the next release.

Is this done?

serengil commented 3 years ago

@nbhupendra I'm going to close this issue when it is done.

serengil commented 3 years ago

deepface 0.0.66 is live!

I added a normalization argument to the verify, find and represent functions. Its default value is base, and in that case it works as before.

If you set normalization to Facenet, v1, v2, ..., it will apply the logic @trevorgribble shared.

@trevorgribble buddy, I mentioned you in the source code as well. Thank you!
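For example, a usage sketch (hedged; assuming the 0.0.66 represent signature, with "img1.jpg" as a placeholder path):

from deepface import DeepFace

# default behaviour (unchanged): pixels scaled into [0, 1]
embedding_base = DeepFace.represent("img1.jpg", model_name = "Facenet")

# apply the Facenet-style per-image standardization instead
embedding_facenet = DeepFace.represent("img1.jpg", model_name = "Facenet", normalization = "Facenet")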

davedgd commented 3 years ago

The output layer of the VGG-Face model has 2622 nodes, so it was trained on the VGGFace dataset instead of VGGFace2.

@serengil and @trevorgribble: If the deepface VGG-Face model is version 1, then the values being subtracted for the various normalization options (i.e., v1, v2, v3, and v4) added to version 0.0.66 are incorrect since those are from VGGFace2. The correct values for both versions can be found here in the keras-vggface repository:

https://github.com/rcmalli/keras-vggface/blob/master/keras_vggface/utils.py#L31

Note that the logic itself is slightly different from what @trevorgribble proposed. I'm currently working on testing how using the VGGFace1 values may improve the VGG-Face model results. It's worth noting that the "Facenet" normalization approach did improve the results in my testing using the LFW pairs pulled from sklearn (i.e., the "test" subset).
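For reference, a rough sketch of how such a pair benchmark can be pulled with scikit-learn (assumed installed); each pair is labeled 1 for same person and 0 for different people:

from sklearn.datasets import fetch_lfw_pairs

lfw = fetch_lfw_pairs(subset='test', color=True, resize=1.0)
pairs, labels = lfw.pairs, lfw.target  # pairs: (n_pairs, 2, H, W, 3), labels: (n_pairs,)
print(pairs.shape, labels.shape)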

davedgd commented 3 years ago

I have been able to replicate the keras_vggface preprocessing using a slightly modified version of deepface. The revised version is here, and the only change is to the new normalize_input function from 0.0.66:

def normalize_input(img, normalization = 'base'):

    # issue 131 declares that some normalization techniques improve the accuracy

    if normalization == 'base':
        return img
    else: # @trevorgribble recommended the following idea (further edits by @davedgd)

        img *= 255 # restore input to the [0, 255] scale because it was normalized to [0, 1] in preprocess_face

        if normalization == 'Facenet':
            mean, std = img.mean(), img.std()
            img = (img - mean) / std

        elif normalization == 'VGGFace-v1':
            # mean subtraction based on VGGFace1 training data
            img[..., 0] -= 93.5940
            img[..., 1] -= 104.7624
            img[..., 2] -= 129.1863

        elif normalization == 'VGGFace-v2':
            # mean subtraction based on VGGFace2 training data
            img[..., 0] -= 91.4953
            img[..., 1] -= 103.8827
            img[..., 2] -= 131.0912

        elif normalization == 'tf': # normalize between -1 and 1
            # simply / 127.5 - 1 (similar to the facenet 2018 model preprocessing step @iamrishab posted)
            img /= 127.5
            img -= 1

    #-----------------------------

    return img

Additionally, I wrote some code to check this against keras_vggface. You can run it on Colab (just use one of the images from deepface to test):

!pip install keras_vggface keras_applications https://github.com/davedgd/deepface/archive/refs/heads/master.zip --no-cache-dir --quiet

from keras_vggface import utils
from deepface.commons import functions

import numpy as np
from matplotlib import pylab as P

theImage = './img1.jpg'

# keras_vggface
im = functions.preprocess_face(theImage, align = False, enforce_detection = False)[0]
keras_vgg = im[..., ::-1] * 255
keras_vgg = utils.preprocess_input(keras_vgg, version=1)

# deepface (base)
im = functions.preprocess_face(theImage, align = False, enforce_detection = False)[0]
df_base = functions.normalize_input(im, normalization = 'base')

# deepface (VGGFace-v1)
im = functions.preprocess_face(theImage, align = False, enforce_detection = False)[0]
df_vgg = functions.normalize_input(im, normalization = 'VGGFace-v1')

# plots
f, ax = P.subplots(1, 4)
ax[0].imshow(functions.preprocess_face(theImage, align = False, enforce_detection = False)[0])
ax[1].imshow(keras_vgg)
ax[2].imshow(df_base)
ax[3].imshow(df_vgg)

print('keras_vggface and deepface with VGGFace-v1 preprocessing identical:', np.array_equal(keras_vgg, df_vgg))

If you run the code, you will see both packages return the same result now (i.e., identical matrices and images post-preprocessing).

In terms of performance metrics, it's a bit difficult to comment since these changes drastically alter some of the distance_metric results (although this is primarily true for euclidean, not cosine and euclidean_l2). It will be necessary to recalculate the thresholds to see whether this approach improves the result overall, but in my initial testing -- at least for cosine and euclidean_l2 -- it does not seem to harm it (e.g., VGGFace-v1 relative to base normalization)...
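As a rough illustration of what re-tuning could look like, a hypothetical helper (not deepface code; proper tuning should follow the approach from the fine-tuning-the-threshold post):

import numpy as np

def tune_threshold(pos_distances, neg_distances):
    # pick the cut-off that maximizes balanced accuracy over labeled same/different pairs
    candidates = np.sort(np.concatenate([pos_distances, neg_distances]))
    accs = [((pos_distances <= t).mean() + (neg_distances > t).mean()) / 2 for t in candidates]
    return candidates[int(np.argmax(accs))]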

nbhupendra commented 3 years ago

deepface 0.0.66 is live!

I added a normalization argument to the verify, find and represent functions. Its default value is base, and in that case it works as before.

If you set normalization to Facenet, v1, v2, ..., it will apply the logic @trevorgribble shared.

@trevorgribble buddy, I mentioned you in the source code as well. Thank you!

@serengil for Facenet512, which normalization should be applied? v6 or some other?

serengil commented 3 years ago

@davedgd the wrapped VGG-Face model was not trained on the VGGFace2 dataset, so you should use v1 normalization.

@nbhupendra, I think that it should be Facenet normalization as well.

serengil commented 3 years ago

@davidwdw, I will add vgg-face-v1 and v2 normalizations tonight in the source code and pip

davedgd commented 3 years ago

@davidwdw, I will add vgg-face-v1 and v2 normalizations tonight in the source code and pip

@serengil: thanks for answering my question about v1 and v2 as well. Hopefully my code/links were helpful.

I did a bit of additional research on ArcFace and found one more preprocessing possibility described in the paper under the section named "Input Setting":

Following [46, 23], we use five facial landmarks (eye centres, nose tip and mouth corners) [49] for similarity transformation to normalise the face images. The faces are cropped and resized to 112×112, and each pixel (ranged between [0,255]) in RGB images is normalised by subtracting 127.5 then divided by 128.

According to various GitHub posts, however, it's unclear whether this is necessary or whether the raw/unstandardized values should be used instead, e.g., Question about preprocessing steps, Preprocessing the face image thumbnail, Dose it need to substract 127.5 and divde 128.

In light of that, I'd suggest two more variations to the normalization function: ArcFace and raw. I've added them below as a suggestion in case it's helpful:

def normalize_input(img, normalization = 'base'):

    # issue 131 declares that some normalization techniques improve the accuracy

    if normalization == 'base':
        return img
    else: # @trevorgribble recommended the following idea (further edits by @davedgd)

        img *= 255 # restore input to the [0, 255] scale because it was normalized to [0, 1] in preprocess_face

        if normalization == 'Facenet':
            mean, std = img.mean(), img.std()
            img = (img - mean) / std

        elif normalization == 'VGGFace-v1':
            # mean subtraction based on VGGFace1 training data
            img[..., 0] -= 93.5940
            img[..., 1] -= 104.7624
            img[..., 2] -= 129.1863

        elif normalization == 'VGGFace-v2':
            # mean subtraction based on VGGFace2 training data
            img[..., 0] -= 91.4953
            img[..., 1] -= 103.8827
            img[..., 2] -= 131.0912

        elif normalization == 'ArcFace':
            # subtract 127.5 then divide by 128, per the ArcFace paper's input setting
            img -= 127.5
            img /= 128

        elif normalization == 'tf': # normalize between -1 and 1
            # simply / 127.5 - 1 (similar to the facenet 2018 model preprocessing step @iamrishab posted)
            img /= 127.5
            img -= 1

        elif normalization == 'raw':
            # return the pixel values restored to [0, 255] without further scaling
            pass

    #-----------------------------

    return img

PS. In some other implementations, the scaling value is shown as multiplication by 0.0078125, but this is simply 1/128 (i.e., the equivalent of dividing by 128), e.g., on the deepinsight repo.
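A quick sanity check of that equivalence:

print(0.0078125 == 1 / 128)  # True: multiplying by 0.0078125 is the same as dividing by 128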

serengil commented 3 years ago

@trevorgribble and @davedgd thank you again

deepface 0.0.67 is live and it is now simplified.

def normalize_input(img, normalization = 'base'):

    # issue 131 declares that some normalization techniques improve the accuracy

    if normalization == 'base':
        return img
    else:
        # @trevorgribble and @davedgd contributed this feature

        img *= 255 # restore input to the [0, 255] scale because it was normalized to [0, 1] in preprocess_face

        if normalization == 'raw':
            pass # return just the restored pixels

        elif normalization == 'Facenet':
            mean, std = img.mean(), img.std()
            img = (img - mean) / std

        elif normalization == 'Facenet2018':
            # simply / 127.5 - 1 (similar to the facenet 2018 model preprocessing step @iamrishab posted)
            img /= 127.5
            img -= 1

        elif normalization == 'VGGFace':
            # mean subtraction based on VGGFace1 training data
            img[..., 0] -= 93.5940
            img[..., 1] -= 104.7624
            img[..., 2] -= 129.1863

        elif normalization == 'VGGFace2':
            # mean subtraction based on VGGFace2 training data
            img[..., 0] -= 91.4953
            img[..., 1] -= 103.8827
            img[..., 2] -= 131.0912

        elif normalization == 'ArcFace':
            # reference study: the faces are cropped and resized to 112x112,
            # and each pixel (ranging over [0, 255]) in RGB images is normalized
            # by subtracting 127.5 then dividing by 128
            img -= 127.5
            img /= 128

    #-----------------------------

    return img

As mentioned, you should look at the distance value instead of the verified key, because thresholds should be tuned for each normalization form. On the other hand, I have already tuned them for base normalization, so you can trust the verified key in the response if you are using base normalization.
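A usage sketch (hedged; assuming the 0.0.67 verify signature and response keys, with placeholder image paths):

from deepface import DeepFace

result = DeepFace.verify("img1.jpg", "img2.jpg",
                         model_name = "Facenet",
                         distance_metric = "euclidean_l2",
                         normalization = "Facenet")

print(result["distance"])  # compare this value across experiments when using a non-base normalization
print(result["verified"])  # only rely on this flag with normalization = "base", since thresholds were tuned for it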

Akila-Ayanthi commented 3 years ago

Thank you.

nbhupendra commented 3 years ago

deepface 0.0.67 is live and it is now simplified. [...]

Hi @serengil, thanks for the quick update. Can you please let me know when to choose Facenet2018 in the case of Facenet and Facenet512?

nbhupendra commented 3 years ago

Hi @serengil, thanks for the quick update. Can you please let me know when to choose Facenet2018 in the case of Facenet and Facenet512?

@serengil any suggestion on this?

serengil commented 3 years ago

@nbhupendra both work, but Facenet normalization outperforms Facenet2018.

jheaff1 commented 2 months ago

I have found that enabling "Facenet" normalization, when using the "Facenet512" model and the "euclidean_l2" distance metric, actually decreased accuracy. That is, the resulting Euclidean distance between the two embeddings of Angelina Jolie was larger. Is this expected?