Issue with mediapipe when loading frames from a video

kcmcveigh commented 1 week ago

Before You Report a Bug, Please Confirm You Have Done The Following...

[X] I have updated to the latest version of the packages.
[X] I have searched for both existing issues and closed issues and found none that matched my issue.

DeepFace's version

0.0.92

Python version

3.10.14

Operating System

No response

Dependencies

absl-py==2.1.0 aiohttp==3.9.5 aiosignal==1.3.1 alembic==1.13.1 antlr4-python3-runtime==4.9.3 anyio==4.3.0 appnope==0.1.4 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 arrow==1.3.0 asteroid-filterbanks==0.4.0 asttokens==2.4.1 astunparse==1.6.3 async-lru==2.0.4 async-timeout==4.0.3 attrs==23.2.0 audioread==3.0.1 av==11.0.0 Babel==2.15.0 beautifulsoup4==4.12.3 bleach==6.1.0 blinker==1.8.2 boto3==1.26.106 botocore==1.29.165 cachetools==5.3.3 certifi==2024.2.2 cffi @ file:///private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_7a9c7wyorr/croot/cffi_1714483157752/work charset-normalizer==3.3.2 click==8.1.7 cmake==3.29.3 coloredlogs==15.0.1 colorlog==6.8.2 comm==0.2.2 contourpy==1.2.1 ctranslate2==4.3.0 cycler==0.12.1 Cython==3.0.10 debugpy==1.8.1 decorator==5.1.1 deepface==0.0.92 defusedxml==0.7.1 disvoice==0.1.8 dlib @ file:///Users/runner/miniforge3/conda-bld/dlib-split_1716026372338/work docopt==0.6.2 einops==0.8.0 exceptiongroup==1.2.1 executing==2.0.1 faster-whisper==1.0.0 fastjsonschema==2.19.1 filelock==3.14.0 fire==0.6.0 Flask==3.0.3 flatbuffers==24.3.25 fonttools==4.51.0 forest @ git+https://github.com/onnela-lab/forest@5b9ba664cc09ac49f58e42cce3b271aa03a41967 fqdn==1.5.1 frozenlist==1.4.1 fsspec==2024.5.0 gast==0.5.4 gdown==5.2.0 google-auth==2.29.0 google-auth-oauthlib==1.2.0 google-pasta==0.2.0 grpcio==1.64.0 gunicorn==22.0.0 h11==0.14.0 h3==3.7.7 h5py==3.11.0 holidays==0.49 httpcore==1.0.5 httpx==0.27.0 huggingface-hub==0.19.3 humanfriendly==10.0 HyperPyYAML==1.2.2 idna==3.7 ipykernel==6.29.4 ipython==8.24.0 ipywidgets==8.1.2 isoduration==20.11.0 itsdangerous==2.2.0 jedi==0.19.1 Jinja2==3.1.4 jmespath==1.0.1 joblib==1.4.2 json5==0.9.25 jsonpointer==2.4 jsonschema==4.22.0 jsonschema-specifications==2023.12.1 julius==0.2.7 jupyter==1.0.0 jupyter-console==6.6.3 jupyter-events==0.10.0 jupyter-lsp==2.2.5 jupyter_client==8.6.1 jupyter_core==5.7.2 jupyter_server==2.14.0 jupyter_server_terminals==0.5.3 jupyterlab==4.2.0 jupyterlab_pygments==0.3.0 
jupyterlab_server==2.27.1 jupyterlab_widgets==3.0.10 kaldi-io==0.9.8 keras==2.15.0 kiwisolver==1.4.5 lazy_loader==0.4 lexicalrichness==0.5.0 libclang==18.1.1 librosa==0.10.1 lightning==2.2.4 lightning-utilities==0.11.2 llvmlite==0.42.0 Mako==1.3.5 Markdown==3.6 markdown-it-py==3.0.0 MarkupSafe==2.1.5 matplotlib==3.9.0 matplotlib-inline==0.1.7 mdurl==0.1.2 mediapipe==0.10.1 mistune==3.0.2 ml-dtypes==0.2.0 mpmath==1.3.0 msgpack==1.0.8 mtcnn==0.1.1 multidict==6.0.5 nbclient==0.10.0 nbconvert==7.16.4 nbformat==5.10.4 nest-asyncio==1.6.0 networkx==3.3 nltk==3.8.1 notebook==7.2.0 notebook_shim==0.2.4 numba==0.59.1 numpy @ file:///private/var/folders/k1/30mswbxs7r1g6zwn8y4fyt500000gp/T/abs_a51i_mbs7m/croot/numpy_and_numpy_base_1708638620867/work/dist/numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl#sha256=c4b11b3c4d4fdb810039503fe01f311ade06cd1d675fcd6d208800a393f19b69 oauthlib==3.2.2 omegaconf==2.3.0 onnxruntime==1.18.0 opencv-contrib-python==4.9.0.80 opencv-python==4.6.0.66 openrouteservice==2.3.3 openwillis==2.1.6 opt-einsum==3.3.0 optuna==3.6.1 overrides==7.7.0 packaging==24.0 pandas==1.4.2 pandocfilters==1.5.1 parso==0.8.4 pexpect==4.9.0 phonet==0.3.7 pillow==10.3.0 platformdirs==4.2.2 pooch==1.8.1 praat-parselmouth==0.4.3 primePy==1.3 prometheus_client==0.20.0 prompt-toolkit==3.0.43 protobuf==3.20.3 protobuf3-to-dict==0.1.5 psutil==5.9.8 ptyprocess==0.7.0 pure-eval==0.2.2 pyannote.audio==3.0.0 pyannote.core==5.0.0 pyannote.database==5.1.0 pyannote.metrics==3.2.1 pyannote.pipeline==3.0.1 pyasn1==0.6.0 pyasn1_modules==0.4.0 pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1711811537435/work pydub==0.25.1 Pygments==2.18.0 pyparsing==3.1.2 pyproj==3.6.1 PySocks==1.7.1 pysptk==0.2.2 python-dateutil==2.9.0.post0 python-json-logger==2.0.7 python_speech_features==0.6 pytorch-lightning==2.2.4 pytorch-metric-learning==2.5.0 pytz==2024.1 PyYAML==6.0.1 pyzmq==26.0.3 qtconsole==5.5.2 QtPy==2.4.1 ratelimit==2.2.1 referencing==0.35.1 regex==2024.5.15 
requests==2.32.1 requests-oauthlib==2.0.0 retina-face==0.0.17 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rich==13.7.1 rpds-py==0.18.1 rsa==4.9 ruamel.yaml==0.18.6 ruamel.yaml.clib==0.2.8 s3transfer==0.6.2 safetensors==0.4.3 scikit-learn==1.4.2 scipy==1.10.1 semver==3.0.2 Send2Trash==1.8.3 sentence-transformers==2.2.2 sentencepiece==0.2.0 shapely==2.0.4 shellingham==1.5.4 six==1.16.0 sniffio==1.3.1 sortedcontainers==2.4.0 sounddevice==0.4.6 soundfile @ file:///home/conda/feedstock_root/build_artifacts/pysoundfile_1676571469739/work soupsieve==2.5 soxr==0.3.7 speechbrain==1.0.0 SQLAlchemy==2.0.30 srt==3.5.3 ssqueezepy==0.6.5 stack-data==0.6.3 sympy==1.12 tabulate==0.9.0 tensorboard==2.15.2 tensorboard-data-server==0.7.2 tensorboardX==2.6.2.2 tensorflow==2.15.0 tensorflow-estimator==2.15.0 tensorflow-io-gcs-filesystem==0.37.0 tensorflow-macos==2.15.0 termcolor==2.4.0 terminado==0.18.1 textblob==0.18.0.post0 threadpoolctl==3.5.0 timezonefinder==6.5.0 tinycss2==1.3.0 tokenizers==0.15.2 tomli==2.0.1 torch==2.0.0 torch-audiomentations==0.11.1 torch-pitch-shift==1.2.4 torchaudio==2.0.1 torchmetrics==1.4.0.post0 torchvision==0.15.1 tornado==6.4 tqdm==4.66.4 traitlets==5.14.3 transformers==4.36.0 typer==0.12.3 types-python-dateutil==2.9.0.20240316 typing_extensions==4.11.0 tzdata==2024.1 uri-template==1.3.0 urllib3==1.26.18 vaderSentiment==3.3.2 vosk==0.3.44 wcwidth==0.2.13 webcolors==1.13 webencodings==0.5.1 websocket-client==1.8.0 websockets==12.0 Werkzeug==3.0.3 whisperx @ git+https://github.com/m-bain/whisperx.git@f2da2f858e99e4211fe4f64b5f2938b007827e17 widgetsnbextension==4.0.10 wrapt==1.14.1 yarl==1.9.4

Reproducible example

import cv2
from deepface import DeepFace

# Path to the video file
video_path = '../example_data/PANSS_DEMO_12min.mov'
# Process the video
cap = cv2.VideoCapture(video_path)

# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'mp4v')

frame_idx = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Convert the BGR image to RGB
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    break

# Test each combination of channel order (BGR/RGB) and alignment (off/on)
bgr_wo_align = DeepFace.extract_faces(frame, detector_backend='mediapipe', enforce_detection=False, align=False)
bgr_aligned = DeepFace.extract_faces(frame, detector_backend='mediapipe', enforce_detection=False, align=True)

rgb_wo_align = DeepFace.extract_faces(rgb_frame, detector_backend='mediapipe', enforce_detection=False, align=False)
rgb_aligned = DeepFace.extract_faces(rgb_frame, detector_backend='mediapipe', enforce_detection=False, align=True)

print('bgr frame wo alignment does not detect', bgr_wo_align[0]['confidence'])
print('bgr frame w alignment does not detect face', bgr_aligned[0]['confidence'])
print('rgb frame wo alignment detects face', rgb_wo_align[0]['confidence'])
print('rgb frame w alignment does not detect face', rgb_aligned[0]['confidence'])

Relevant Log Output

bgr frame wo alignment does not detect 0
bgr frame w alignment does not detect face 0
rgb frame wo alignment detects face 0.74
rgb frame w alignment does not detect face 0

Expected Result

I would expect extract_faces to work with a BGR frame, as the docstring suggests (although mediapipe seems to expect an RGB frame). I also wouldn't expect alignment to hurt performance to this extent.

What happened instead?

extract_faces only seems to work when the frame is rgb and alignment is turned off

Additional Info

No response

serengil commented 1 week ago

When you feed a numpy array, it does not matter whether it is RGB or BGR: we pass it to the backend detector as is.

https://github.com/serengil/deepface/blob/master/deepface/commons/image_utils.py#L76

So if you have any trouble, you should raise this issue in mediapipe's repo instead of deepface's.

kcmcveigh commented 1 week ago

I'm still a bit confused, as mediapipe seems to expect RGB frames:

https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker/python#video_2

While the last line of the linked utility function returns bgr frames?

return img_obj_bgr, img

Shouldn't we pass rgb frames to mediapipe? Thanks for the help!

serengil commented 1 week ago

You never reach that line if the input is a numpy array.
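
The control flow being described can be pictured with a simplified stand-in. This is a sketch under assumptions, not DeepFace's actual code: `load_image_sketch` and the `decode_to_bgr` callable are hypothetical names, and the label returned for arrays is illustrative. The only details taken from this thread are that the function ends with `return img_obj_bgr, img` and that numpy input never gets that far.

```python
import numpy as np


def load_image_sketch(img, decode_to_bgr):
    """Hypothetical stand-in for an image loader with an early return for arrays."""
    # Arrays return early and untouched, so their channel order is
    # whatever the caller supplied (BGR or RGB).
    if isinstance(img, np.ndarray):
        return img, "numpy array"  # label is an illustrative assumption

    # Only string inputs (for example a file path) reach the decoding step,
    # which is where a BGR image would be produced.
    img_obj_bgr = decode_to_bgr(img)
    return img_obj_bgr, img
```

Under this reading, the BGR guarantee of the final `return` applies only to inputs that had to be decoded, which is why an array coming from `cap.read()` arrives at the detector unchanged.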

kcmcveigh commented 1 week ago

Ahh I see now thank you! So is it fair to say the extract_faces function assumes you pass numpy arrays in the format (RGB, BGR) the detector expects, and that this should be done before passing a numpy array to the extract_faces function?

serengil commented 1 week ago

if numpy array is passed, yes.

I cannot tell whether a given image is BGR or RGB.
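
Since the library cannot infer channel order from an array, the conversion has to happen on the caller's side. A minimal sketch of that step, assuming frames come from OpenCV in BGR order: `EXPECTED_ORDER` and `to_detector_order` are hypothetical names, and only the mediapipe entry reflects what was observed in this thread.

```python
import numpy as np

# Illustrative assumption: channel order each detector backend expects.
# Only the mediapipe entry comes from this thread; check any other backend yourself.
EXPECTED_ORDER = {"mediapipe": "RGB"}


def to_detector_order(frame_bgr, detector_backend):
    """Return the frame in the channel order the backend is assumed to expect."""
    if EXPECTED_ORDER.get(detector_backend, "BGR") == "RGB":
        # Reverse the last axis (B, G, R -> R, G, B) and copy so the
        # result is a contiguous array rather than a negative-stride view.
        return np.ascontiguousarray(frame_bgr[..., ::-1])
    # Backends not listed are assumed to take BGR, so the frame passes through.
    return frame_bgr
```

For a 3-channel frame the reversal gives the same result as `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`. The converted frame would then be passed to `DeepFace.extract_faces(..., detector_backend='mediapipe')` in place of the raw one.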

kcmcveigh commented 1 week ago

So in my example I load frames from a video with cv2. By default they're loaded as BGR. In the sample code, when these frames are passed to the extract_faces function it fails because mediapipe expects RGB. When I convert the frames to RGB in the sample code above, extract_faces works, as this is what mediapipe expects. Interestingly, if align is true the function fails again.

If this is the intended functionality then I think this docstring in DeepFace.py on the extract_faces function should be updated?

Args: img_path (str or np.ndarray): Path to the first image. Accepts exact image path as a string, numpy array (BGR), or base64 encoded images.
