MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild

This repository provides an official implementation for the paper MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild.

Installation

Please create an environment with Python 3.10 and use the requirements file to install the remaining libraries:

pip install -r requirements.txt
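
If you use conda, a minimal sketch for setting up such an environment could look like the following (the environment name mma-dfer is only an example, not something the repository requires):

# Create and activate a Python 3.10 environment, then install the dependencies
conda create -n mma-dfer python=3.10 -y
conda activate mma-dfer
pip install -r requirements.txt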

Data preparation

We provide code for the DFEW and MAFW datasets, which you will need to download yourself. The annotations are provided in the annotations/ directory. You will need to update the paths to your own; preprocessing scripts in the same directory preprocess the data and rename the paths.
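
As an illustration only (the annotation file extension and path prefixes below are placeholders, and the provided preprocessing scripts may already handle this), rewriting a path prefix in the annotation files could be done with a one-liner such as:

# Replace the original data prefix with your local one in every annotation file
# (adjust the prefix strings and the file pattern to match your setup)
sed -i 's#/old/path/to/DFEW#/your/path/to/DFEW#g' annotations/*.csv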

For the MAFW dataset, you will need to extract faces from the videos. Please refer to data_utils, which contains an example face detection pipeline. To extract audio from the video files (for both datasets), use the provided audio extraction script (after modifying the paths to your own).
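
If you prefer to extract the audio manually, a minimal sketch with ffmpeg could look like this (the 16 kHz mono WAV format is an assumption here; check the audio loading code for the exact format expected):

# Extract the audio track as 16 kHz mono WAV (output format is an assumption)
ffmpeg -i input_video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 output_audio.wav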

You will also need to download pre-trained checkpoints for the vision encoder from https://github.com/FuxiVirtualHuman/MAE-Face and for the audio encoder from https://github.com/facebookresearch/AudioMAE. Please extract them and rename the audio checkpoint to 'audiomae_pretrained.pth'. Both checkpoints are expected to be in the root folder.
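
For example, placing the checkpoints could look like the following (the downloaded filenames are placeholders; use whatever names the upstream repositories provide, and keep the vision checkpoint name consistent with what the training code loads):

# Move the downloaded checkpoints into the repository root; source filenames are placeholders
mv /path/to/downloaded_mae_face_checkpoint.pth ./
mv /path/to/downloaded_audiomae_checkpoint.pth ./audiomae_pretrained.pth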

Running the code

The main script is main.py. You can invoke it by running:

./train_DFEW.sh
./train_MAFW.sh
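
Depending on how you obtained the repository, the scripts may need to be made executable first:

chmod +x train_DFEW.sh train_MAFW.sh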

Evaluation

You can download models pre-trained on DFEW from here and on MAFW from here. Please respect the dataset licenses when downloading the models! Evaluation can be done as follows:

python evaluate.py --fold $FOLD --checkpoint $CHECKPOINT_PATH --img-size $IMG_SIZE --dataset [MAFW|DFEW]
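
For example, evaluating a DFEW checkpoint on fold 1 might look like this (the checkpoint path and image size are placeholders; use the values matching the model you downloaded):

python evaluate.py --fold 1 --checkpoint ./checkpoints/dfew_fold1.pth --img-size 224 --dataset DFEW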

References

This repository is based on DFER-CLIP (https://github.com/zengqunzhao/DFER-CLIP). We also thank the authors of MAE-Face (https://github.com/FuxiVirtualHuman/MAE-Face) and AudioMAE (https://github.com/facebookresearch/AudioMAE).

Citation

If you use our work, please cite as:

@InProceedings{Chumachenko_2024_CVPR,
    author    = {Chumachenko, Kateryna and Iosifidis, Alexandros and Gabbouj, Moncef},
    title     = {MMA-DFER: MultiModal Adaptation of Unimodal Models for Dynamic Facial Expression Recognition In-the-wild},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {4673-4682}
}