wuhaozhe / audio2face_mm2023


audio2face MM 2023

This is the official code for the MM 2023 paper: Speech-Driven 3D Face Animation with Composite and Regional Facial Movements.

Given a template 3D face, a driven 3D face sequence, and driven speech audio, we synthesize a 3D face sequence that is synchronized with the speech audio and modulated by the speech-independent factors of the driven 3D face sequence.


We train and test with Python 3.8 and PyTorch. To install the dependencies, run:

pip install -r requirements.txt

Additionally, you need to install pytorch3d following these instructions.

### Dataset preparation

We provide the processed VOCA data in voca.zip. Download the zip file from the following link and unzip it in the root folder:


Extraction code: fgqi
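
If you prefer to unpack the archive from Python rather than from the shell, a minimal sketch is shown below. The helper name `extract_dataset` is hypothetical and not part of this repository; it only assumes that voca.zip has already been downloaded.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir="."):
    """Extract the archive at zip_path into dest_dir and return the member names."""
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"{zip_path} not found; download it first")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Calling `extract_dataset("voca.zip")` from the repository root would unpack the files next to the training scripts.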


### Training

bash train_meshtalk.sh

We train the backbone in two stages. In the first stage, we freeze the HuBERT model and train the ResNet1D. In the second stage, we fine-tune all of the models jointly.
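
The freeze-then-fine-tune schedule can be sketched in PyTorch as follows. This is an illustration only, not the repository's training code: the two `nn.Linear` layers are hypothetical stand-ins for HuBERT and the ResNet1D, and the helper names and learning rates are assumptions.

```python
import torch
from torch import nn


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Enable or disable gradient updates for every parameter in `module`."""
    for param in module.parameters():
        param.requires_grad = trainable


def trainable_parameters(*modules: nn.Module):
    """Collect only the parameters that currently require gradients."""
    return [p for m in modules for p in m.parameters() if p.requires_grad]


# Hypothetical stand-ins; the real models are defined in the repository.
audio_encoder = nn.Linear(16, 8)   # plays the role of HuBERT
motion_encoder = nn.Linear(8, 4)   # plays the role of the ResNet1D

# Stage 1: freeze the audio encoder and optimize only the motion encoder.
set_trainable(audio_encoder, False)
audio_encoder.eval()
stage1_optimizer = torch.optim.Adam(
    trainable_parameters(audio_encoder, motion_encoder), lr=1e-4  # assumed value
)

# Stage 2: unfreeze everything and fine-tune all parameters together.
set_trainable(audio_encoder, True)
audio_encoder.train()
stage2_optimizer = torch.optim.Adam(
    trainable_parameters(audio_encoder, motion_encoder), lr=1e-5  # assumed value
)
```

Building a fresh optimizer for the second stage keeps the frozen parameters out of the first stage's optimizer state.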

*_weight_mask.npy is a 0/1 weight mask of the facial regions.
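
One common use of a 0/1 region mask is to restrict a vertex loss to the selected region. The sketch below assumes the mask holds one entry per vertex (shape `(vertices,)`) and that meshes are stored as `(frames, vertices, 3)` arrays. Both shapes, and the function name `masked_vertex_loss`, are assumptions made for illustration and may differ from the actual files and code.

```python
import numpy as np


def masked_vertex_loss(pred, target, weight_mask):
    """Mean squared error restricted to vertices where weight_mask == 1.

    pred, target: float arrays of shape (frames, vertices, 3)
    weight_mask:  array of shape (vertices,) containing 0s and 1s
    """
    # Reshape to (1, vertices, 1) so the mask broadcasts over frames and xyz.
    mask = weight_mask.astype(pred.dtype)[None, :, None]
    squared_error = (pred - target) ** 2 * mask
    # Number of coordinates that the mask keeps across all frames.
    selected = mask.sum() * pred.shape[0] * pred.shape[2]
    return squared_error.sum() / max(selected, 1.0)
```

Errors on vertices outside the mask contribute nothing, so the loss only reflects motion in the chosen facial region.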

face_axis_mean.npy (size 3) and face_axis_std.npy (size 1) are the overall mean and standard deviation of the whole dataset, which are used for normalization.
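
With a size-3 mean and a size-1 standard deviation, normalization reduces to a per-axis shift followed by a single global scale. The sketch below shows this under the assumption that vertices are stored as `(frames, vertices, 3)` arrays. The function names and the numeric values are illustrative, not taken from the repository.

```python
import numpy as np


def normalize_vertices(vertices, mean, std):
    """Shift by the per-axis mean (shape (3,)) and divide by the scalar std (shape (1,))."""
    return (vertices - mean) / std


def denormalize_vertices(normalized, mean, std):
    """Invert normalize_vertices to recover coordinates in the original units."""
    return normalized * std + mean


# Illustrative values only; the real statistics come from the .npy files.
mean = np.array([0.0, -0.1, 0.05])
std = np.array([0.02])
vertices = np.random.default_rng(0).normal(size=(10, 5, 3))
normalized = normalize_vertices(vertices, mean, std)
restored = denormalize_vertices(normalized, mean, std)
```

In practice the statistics would be loaded with `np.load("face_axis_mean.npy")` and `np.load("face_axis_std.npy")`, assuming the files sit in the working directory. Network outputs would then be passed through the inverse transform before being exported as meshes.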

### Inference

bash test_meshtalk.sh

The pretrained models for the VOCASET, meshtalk, and BIWI datasets can be found at this link:


Extraction code: tmi7

### Citation

@inproceedings{wu2023audio2face,
  title={Speech-Driven 3D Face Animation with Composite and Regional Facial Movements},
  author={Wu, Haozhe and Zhou, Songtao and Jia, Jia and Xing, Junliang and Wen, Qi and Wen, Xiang},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}