Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style
Shuai Tan, Bin Ji, and Ye Pan.
In AAAI, 2024.
Our approach takes an identity image and an audio clip as inputs and generates a talking head with emotion style and art style, which are controlled respectively by an emotion source text and an art source picture. The pipeline of our $\text{Style}^2\text{Talker}$ is as follows:
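As a quick reference for the interface described above, the four inputs can be summarized in a small container; the class and field names below are illustrative only and are not the repo's actual API:

```python
from dataclasses import dataclass

@dataclass
class Style2TalkerInputs:
    """Illustrative grouping of the inputs described above (hypothetical names)."""
    identity_image: str       # path to the identity image
    audio_clip: str           # path to the driving audio
    emotion_source_text: str  # text controlling the emotion style
    art_source_picture: str   # picture controlling the art style

inputs = Style2TalkerInputs("id.jpg", "speech.wav", "happy and excited", "art.png")
```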
We train and test with Python 3.7 and PyTorch. To install the dependencies, run:
conda create -n style2talker python=3.7
conda activate style2talker
pip install -r requirements.txt
python inference.py --img_path path/to/image --wav_path path/to/audio --source_3DMM path/to/source_3DMM --style_e_source "a textual description for emotion style" --art_style_id num/for/art_style --save_path path/to/save
The result will be saved to the path specified by --save_path.
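For scripted use, the inference command above can be assembled programmatically; the helper below is a hypothetical wrapper around the documented CLI flags, with placeholder paths:

```python
def build_inference_cmd(img_path, wav_path, source_3dmm,
                        emotion_text, art_style_id, save_path):
    """Assemble the documented inference command as an argument list."""
    return [
        "python", "inference.py",
        "--img_path", img_path,
        "--wav_path", wav_path,
        "--source_3DMM", source_3dmm,
        "--style_e_source", emotion_text,
        "--art_style_id", str(art_style_id),
        "--save_path", save_path,
    ]

# Placeholder inputs; subprocess.run(cmd, check=True) would execute the command.
cmd = build_inference_cmd("id.jpg", "speech.wav", "id.mat",
                          "a textual description for emotion style", 0, "results/")
```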
To preprocess the training data, run the following scripts in order:
python data_preprocess/crop_video.py
python data_preprocess/extract_3DMM.py
python data_preprocess/extract_lmdk.py
python data_preprocess/get_mel.py
python data_preprocess/prepare_lmdb.py
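The five preprocessing scripts must run in sequence (crop, 3DMM extraction, landmarks, mel spectrograms, lmdb packing). A minimal driver, assuming each script takes no extra arguments, could look like:

```python
import subprocess
import sys

# The five preprocessing scripts, in the order listed above.
PREPROCESS_STEPS = [
    "data_preprocess/crop_video.py",
    "data_preprocess/extract_3DMM.py",
    "data_preprocess/extract_lmdk.py",
    "data_preprocess/get_mel.py",
    "data_preprocess/prepare_lmdb.py",
]

def run_preprocessing(dry_run=True):
    """Run each script in sequence; check=True stops on the first failure."""
    for script in PREPROCESS_STEPS:
        if dry_run:
            print("would run:", script)
        else:
            subprocess.run([sys.executable, script], check=True)

run_preprocessing()  # dry run: only prints the planned commands
```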
# Train Style-A:
python -m torch.distributed.launch --nproc_per_node=4 --master_port 12344 train_style_a.py
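The launcher above spawns 4 worker processes and passes each one a --local_rank argument (newer PyTorch launchers such as torchrun set the LOCAL_RANK environment variable instead). A sketch of how a training script typically consumes it, assuming train_style_a.py follows the standard pattern, which may differ from the actual code:

```python
import argparse
import os

def get_local_rank():
    """Read the per-process rank from --local_rank (torch.distributed.launch)
    or from the LOCAL_RANK environment variable (torchrun)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int,
                        default=int(os.environ.get("LOCAL_RANK", 0)))
    args, _ = parser.parse_known_args()
    return args.local_rank

local_rank = get_local_rank()
# The training script would then bind this process to its GPU, e.g.:
# torch.cuda.set_device(local_rank)
```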
Some code is borrowed from the following projects:
Thanks for their contributions!
If you find this codebase useful for your research, please cite it using the following BibTeX entry.
@inproceedings{tan2024style2talker,
title={Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style},
author={Tan, Shuai and Ji, Bin and Pan, Ye},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={5},
pages={5079--5087},
year={2024}
}