serengil / deepface

A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
https://www.youtube.com/watch?v=WnUVYQP4h44&list=PLsS_1RYmYQQFdWqxQggXHynP1rqaYXv_E&index=1
MIT License
11.83k stars 2.02k forks

[Question] FastMtCnn detector : image_size #1113

AndreaLanfranchi closed this issue 6 months ago

AndreaLanfranchi commented 6 months ago

Consider this code from the model's initialization:

        face_detector = fast_mtcnn(
            image_size=160,
            thresholds=[0.6, 0.7, 0.7],  # MTCNN thresholds
            post_process=True,
            device="cpu",
            select_largest=False,  # return result in descending order
        )

The argument image_size is set to a constant value of 160. However, the detect_faces method does not appear to take this value into account, hence the question: do we have to resize the image to image_size before processing, or is it scaled automatically?

        detections = self.model.detect(
            img_rgb, landmarks=True
        )  # returns boundingbox, prob, landmark

Thank you

serengil commented 6 months ago

Good question. This detector was added by a PR, so I need to investigate. I will let you know here.

AndreaLanfranchi commented 6 months ago

I may have found the answer myself. According to the MTCNN documentation:

image_size {int} -- Output image size in pixels. The image will be square. (default: {160})

This value is used only when MTCNN is invoked through the forward() method, which performs both detection and extraction. When the detect() method is used instead, image_size is irrelevant.
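
To make the distinction concrete, here is a toy sketch in plain Python. `ToyDetector` and `resize_nearest` are hypothetical stand-ins, not the facenet_pytorch API: `detect()` returns boxes in the input image's own pixel coordinates and never reads `image_size`, while `forward()` crops each box and resizes it to `image_size` x `image_size`.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in input-image pixels


def resize_nearest(img: Image, size: int) -> Image:
    """Nearest-neighbour resize to a size x size square."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


class ToyDetector:
    """Hypothetical MTCNN-like detector used only for illustration."""

    def __init__(self, image_size: int = 160):
        self.image_size = image_size  # consulted by forward() only

    def detect(self, img: Image) -> List[Box]:
        # Toy rule: the bounding box of all non-zero pixels is "the face".
        # The box is expressed in the input image's coordinates, whatever its size.
        coords = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
        if not coords:
            return []
        xs, ys = zip(*coords)
        return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]

    def forward(self, img: Image) -> List[Image]:
        # Detection plus extraction: every crop is resized to image_size x image_size.
        faces = []
        for x1, y1, x2, y2 in self.detect(img):
            crop = [row[x1:x2] for row in img[y1:y2]]
            faces.append(resize_nearest(crop, self.image_size))
        return faces


# A 10x8 image containing a bright 4x4 "face".
img = [[255 if 3 <= x < 7 and 2 <= y < 6 else 0 for x in range(10)] for y in range(8)]
small, large = ToyDetector(image_size=32), ToyDetector(image_size=160)
print(small.detect(img) == large.detect(img))  # True: image_size plays no role here
print(len(small.forward(img)[0]), len(large.forward(img)[0]))  # 32 160
```

In this toy model no manual resizing is needed before detect(): the returned boxes already refer to the original image.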

Besides, I would point out that this comment:

            select_largest=False,  # return result in descending order

is misleading.

In fact, according to the documentation:

        select_largest {bool} -- If True, if multiple faces are detected, the largest is returned.
            If False, the face with the highest detection probability is returned.
            (default: {True})

This forces only a single face to be returned unless keep_all is also set to True. As a result, the enumeration over the results is redundant, since under this configuration we expect only one element or none.

Reference:

        select_largest {bool} -- If True, if multiple faces are detected, the largest is returned.
            If False, the face with the highest detection probability is returned.
            (default: {True})
        selection_method {string} -- Which heuristic to use for selection. Default None. If
            specified, will override select_largest:
                    "probability": highest probability selected
                    "largest": largest box selected
                    "largest_over_threshold": largest box over a certain probability selected
                    "center_weighted_size": box size minus weighted squared offset from image center
                (default: {None})
        keep_all {bool} -- If True, all detected faces are returned, in the order dictated by the
            select_largest parameter. If a save_path is specified, the first face is saved to that
            path and the remaining faces are saved to <save_path>1, <save_path>2 etc.
            (default: {False})
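
As a rough illustration of the selection rules quoted above, the sketch below scores candidate boxes with each documented heuristic and keeps the best index, or all of them when keep_all is set. `select_faces` is hypothetical; the 0.9 threshold, the center weight of 2.0, returning nothing when no box clears the threshold, and ordering keep_all results by the chosen heuristic are assumptions, not confirmed facenet_pytorch behavior.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
METHODS = {"probability", "largest", "largest_over_threshold", "center_weighted_size"}


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def select_faces(
    boxes: Sequence[Box],
    probs: Sequence[float],
    image_wh: Tuple[float, float],
    method: str = "probability",
    keep_all: bool = False,
    threshold: float = 0.9,      # assumed value for "largest_over_threshold"
    center_weight: float = 2.0,  # assumed weight for "center_weighted_size"
) -> List[int]:
    """Hypothetical helper: indices of the selected boxes, best first."""
    if method not in METHODS:
        raise ValueError(f"unknown selection method: {method}")
    cx, cy = image_wh[0] / 2, image_wh[1] / 2
    candidates = list(range(len(boxes)))
    if method == "largest_over_threshold":
        # Assumption: boxes below the probability threshold are never selected.
        candidates = [i for i in candidates if probs[i] >= threshold]

    def score(i: int) -> float:
        b = boxes[i]
        if method == "probability":
            return probs[i]
        if method in ("largest", "largest_over_threshold"):
            return area(b)
        # center_weighted_size: box area minus weighted squared offset from the image center.
        bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        return area(b) - center_weight * ((bx - cx) ** 2 + (by - cy) ** 2)

    ranked = sorted(candidates, key=score, reverse=True)
    return ranked if keep_all else ranked[:1]


boxes = [(0, 0, 10, 10), (40, 40, 60, 60), (0, 0, 30, 30)]
probs = [0.99, 0.95, 0.50]
for m in sorted(METHODS):
    print(m, select_faces(boxes, probs, (100, 100), method=m))
# center_weighted_size [1], largest [2], largest_over_threshold [1], probability [0]
print(select_faces(boxes, probs, (100, 100), keep_all=True))  # [0, 1, 2]
```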

AndreaLanfranchi commented 6 months ago

As a result, I would remove the explicit assignments from the MTCNN instantiation, since they are exactly the default values assumed by the model.

serengil commented 6 months ago

Very helpful, ty.

I will sort out the source code tomorrow, most probably.

AndreaLanfranchi commented 6 months ago

Also consider that, to be consistent with the other detectors (which return all faces detected in an image), this should also set keep_all to True; otherwise only one face is returned. In that case, the enumeration has to be kept.

serengil commented 6 months ago

Yeah, that is a bug: detectors should return all faces, not just one.

I do not think the keep_all argument is necessary, because the return type is a list and multiple faces should be returned.

serengil commented 6 months ago

I just tested this with the following snippet. It seems to work fine as is.

from deepface import DeepFace
import matplotlib.pyplot as plt
import cv2

img_path = "dataset/couple.jpg"
img = cv2.imread(img_path)

objs = DeepFace.extract_faces(img_path=img_path, detector_backend="fastmtcnn")

for obj in objs:
    # plt.imshow(obj["face"])
    x = obj["facial_area"]["x"]
    y = obj["facial_area"]["y"]
    w = obj["facial_area"]["w"]
    h = obj["facial_area"]["h"]
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 255), 1)

fig = plt.figure(figsize=(10, 10))
plt.imshow(img[:,:,::-1])
plt.show()

So, setting select_largest to False does not cause only one face to be returned. The source documentation may be outdated or incorrect.

I plan to close this, since it is not a bug, unless you have anything else to raise.

[Screenshot: Screen Shot 2024-03-16 at 11 59 54]

AndreaLanfranchi commented 6 months ago

OK then, better this way. Apparently keep_all only applies to extraction, exactly like image_size.
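
Our reading of facenet_pytorch's `detect()` (worth verifying against the installed version) is that it returns every box it finds and that select_largest only changes their order, largest area first, which agrees with the test above. A minimal sketch of that behavior; `order_detections` is a hypothetical helper, not part of either library:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def order_detections(
    boxes: Sequence[Box], probs: Sequence[float], select_largest: bool
) -> Tuple[List[Box], List[float]]:
    """Sketch of detect()-time handling: no box is dropped, select_largest only reorders."""
    pairs = list(zip(boxes, probs))
    if select_largest:
        # Sort by box area, largest first; probabilities travel with their boxes.
        pairs.sort(key=lambda bp: (bp[0][2] - bp[0][0]) * (bp[0][3] - bp[0][1]), reverse=True)
    return [b for b, _ in pairs], [p for _, p in pairs]


boxes = [(0, 0, 10, 10), (0, 0, 30, 30), (0, 0, 20, 20)]
probs = [0.90, 0.80, 0.99]
print(order_detections(boxes, probs, select_largest=True)[0])
# [(0, 0, 30, 30), (0, 0, 20, 20), (0, 0, 10, 10)]
print(len(order_detections(boxes, probs, select_largest=False)[0]))  # 3
```

Under this model, picking a single face (and honoring keep_all or image_size) happens only in the extraction path.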

serengil commented 6 months ago

May I close this?

AndreaLanfranchi commented 6 months ago

Yes, that resolves my doubt. Nevertheless, I would change the instantiation line to

self._detector = fast_mtcnn(device="cpu")

where device is the only non-default value set. This also removes the ambiguity around image_size, which is not relevant in this scope.

serengil commented 6 months ago

Thank you for your contribution again

AndreaLanfranchi commented 6 months ago

One more thing: I would add a safety check here. Reading the detect implementation, I see that the img argument can be a 4-dimensional array (i.e., a batch of images). In that case, each element of the returned tuple becomes an array of per-image arrays, which breaks this loop:

                for current_detection in zip(*detections):
                    x, y, w, h = self._xyxy_to_xywh(current_detection[0])
                    confidence = current_detection[1]
                    left_eye = current_detection[2][0]
                    right_eye = current_detection[2][1]

I believe each detector is supposed to work on a single image only.
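
A minimal sketch of the suggested safety check, in plain Python. `ensure_single_image`, `xyxy_to_xywh` and `parse_detections` are hypothetical names; the image is represented only by its shape, and the `(boxes, probs, points)` tuple mimics what `detect(img, landmarks=True)` returns for one image, with `boxes` set to None when nothing is found (an assumption). It does not claim to reproduce what the linked PR actually changed.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Detections = Tuple[Optional[Sequence[Sequence[float]]], Sequence[Any], Optional[Sequence[Any]]]


def ensure_single_image(shape: Tuple[int, ...]) -> None:
    """Reject anything that is not a single H x W x C image (e.g. an N x H x W x C batch)."""
    if len(shape) != 3:
        raise ValueError(f"expected a single H x W x C image, got shape {shape}")


def xyxy_to_xywh(box: Sequence[float]) -> Tuple[int, int, int, int]:
    """Convert (x1, y1, x2, y2) corners to (x, y, width, height)."""
    x1, y1, x2, y2 = box
    return int(x1), int(y1), int(x2 - x1), int(y2 - y1)


def parse_detections(shape: Tuple[int, ...], detections: Detections) -> List[Dict[str, Any]]:
    ensure_single_image(shape)
    boxes, probs, points = detections
    if boxes is None or points is None:  # assumed "no face found" marker
        return []
    results = []
    # For a single image, zip walks face by face; for a batch it would walk image by image.
    for box, confidence, landmarks in zip(boxes, probs, points):
        x, y, w, h = xyxy_to_xywh(box)
        results.append({
            "x": x, "y": y, "w": w, "h": h,
            "confidence": confidence,
            "left_eye": tuple(landmarks[0]),
            "right_eye": tuple(landmarks[1]),
        })
    return results


dets = ([(10.0, 20.0, 50.0, 80.0)], [0.99], [[(20.0, 40.0), (40.0, 40.0)]])
print(parse_detections((100, 100, 3), dets)[0]["w"])  # 40
```

Calling `parse_detections((2, 100, 100, 3), dets)` raises a ValueError instead of silently iterating over images.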

serengil commented 6 months ago

Handled in PR https://github.com/serengil/deepface/pull/1115