megvii-research / megactor

Apache License 2.0

MegActor: Harness the Power of Raw Video for Vivid Portrait Animation

Shurong Yang*, Huadong Li*, Juhao Wu*, Minhao Jing*†, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan

MEGVII Technology

*Equal contribution, †Lead this project, Corresponding author



News & TODO List

MegActor Features:

Usability: animates a portrait with a driving video while ensuring consistent motion.

Reproducibility: fully open-source and trained on publicly available datasets.

Efficiency: ⚡200 V100 hours of training to achieve pleasant motions on portraits.

Overview

Model

MegActor is an intermediate-representation-free portrait animator that uses the original video, rather than intermediate features, as the driving factor to generate realistic and vivid talking head videos. Specifically, we utilize two UNets: one extracts the identity and background features from the source image, while the other accurately generates and integrates motion features directly derived from the original videos. MegActor can be trained on low-quality, publicly available datasets and excels in facial expressiveness, pose diversity, subtle controllability, and visual quality.

Pre-generated results

https://github.com/megvii-research/MegFaceAnimate/assets/29685592/1b9dc77c-50da-48bd-bb16-8b2dd56d703f

https://github.com/megvii-research/MegFaceAnimate/assets/29685592/ce4e5c19-cdc7-435e-83f3-8bce39f0c04e

https://github.com/megvii-research/MegFaceAnimate/assets/29685592/c7d71435-c98a-42b6-9f59-c72cb49851a1

Preparation

Training

We currently support two-stage training on single-node machines.

Stage 1 (image training):

bash train.sh train.py ./configs/train/train_stage1.yaml {number of gpus on this node}

Stage 2 (video training):

bash train.sh train.py ./configs/train/train_stage2.yaml {number of gpus on this node}
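If you prefer to launch both stages from one script, the two commands above can be wrapped as follows. This is a minimal sketch, not part of the repository: `build_train_command` and `run_two_stage_training` are hypothetical helper names, the GPU count is supplied by the caller, and the script assumes stage 2 should start only after stage 1 exits successfully.

```python
import subprocess

# Config paths taken from the training commands above.
STAGE_CONFIGS = [
    "./configs/train/train_stage1.yaml",  # Stage 1: image training
    "./configs/train/train_stage2.yaml",  # Stage 2: video training
]


def build_train_command(config_path, num_gpus):
    """Return the argv list for one training stage (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return ["bash", "train.sh", "train.py", config_path, str(num_gpus)]


def run_two_stage_training(num_gpus, dry_run=True):
    """Run stage 1, then stage 2, stopping if a stage fails.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = [build_train_command(cfg, num_gpus) for cfg in STAGE_CONFIGS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so stage 2 never starts after a failed stage 1.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_two_stage_training(8)` prints both commands for an 8-GPU node without running them; pass `dry_run=False` from the repository root to actually start training.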

Inference

Currently, only single-GPU inference is supported. We highly recommend using the --contour-preserve argument to better preserve the shape of the source face.

CUDA_VISIBLE_DEVICES=0 python eval.py --config configs/inference/inference.yaml --source {source image path} --driver {driving video path} --contour-preserve
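To animate several source/driver pairs, the command above can be assembled programmatically. The sketch below is illustrative only: `build_inference_command` and `single_gpu_env` are hypothetical helpers, the file names are placeholders, and the only flags used are the ones shown in the command above.

```python
import os
import shlex


def build_inference_command(source_image, driving_video,
                            config="configs/inference/inference.yaml",
                            contour_preserve=True):
    """Return the argv list for one eval.py run (hypothetical helper)."""
    cmd = ["python", "eval.py", "--config", config,
           "--source", source_image, "--driver", driving_video]
    if contour_preserve:
        # Recommended above to better preserve the source face shape.
        cmd.append("--contour-preserve")
    return cmd


def single_gpu_env(gpu_id=0):
    """Copy the current environment and pin the process to one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    pairs = [("source.png", "driver.mp4")]
    for source, driver in pairs:
        print(shlex.join(build_inference_command(source, driver)))
```

To execute a command rather than print it, pass it to `subprocess.run(cmd, env=single_gpu_env(0), check=True)` from the repository root.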

Demo

To launch the Gradio interface, run:

python demo/run_gradio.py

BibTeX

@misc{yang2024megactor,
      title={MegActor: Harness the Power of Raw Video for Vivid Portrait Animation}, 
      author={Shurong Yang and Huadong Li and Juhao Wu and Minhao Jing and Linze Li and Renhe Ji and Jiajun Liang and Haoqiang Fan},
      year={2024},
      eprint={2405.20851},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

Many thanks to the authors of mmengine, MagicAnimate, Controlnet_aux, and Detectron2.

Contact

If you have any questions, feel free to open an issue or contact us at yangshurong6894@gmail.com, lihuadong@megvii.com or wujuhao@megvii.com.
