zhanglonghao1992 / One-Shot_Free-View_Neural_Talking_Head_Synthesis

Pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing"
Other
797 stars 143 forks source link

One-Shot Free-View Neural Talking Head Synthesis

Unofficial pytorch implementation of paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing".

Python 3.6 and Pytorch 1.7 are used.

Updates:


2021.11.05 :

2021.11.17 :

Driving | Beta Version | FOMM | New Version:

https://user-images.githubusercontent.com/17874285/142828000-db7b324e-c2fd-4fdc-a272-04fb8adbc88a.mp4


Driving | FOMM | Ours:
show

Free-View:
show

Train:

python run.py --config config/vox-256.yaml --device_ids 0,1,2,3,4,5,6,7

Demo:

python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame

free-view (e.g. yaw=20, pitch=roll=0):

python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame --free_view --yaw 20 --pitch 0 --roll 0

Note: run crop-video.py --inp driving_video.mp4 first to get the cropping suggestion and crop the raw video.

Pretrained Model:

Model Train Set Baidu Netdisk Media Fire
Vox-256-Beta VoxCeleb-v1 Baidu (PW: c0tc) MF
Vox-256-New VoxCeleb-v1 - MF
Vox-512 VoxCeleb-v2 soon soon

Note:

  1. For now, the Beta Version is not well tuned.
  2. For free-view synthesis, it is recommended that Yaw, Pitch and Roll are within ±45°, ±20° and ±20° respectively.
  3. Face Restoration algorithms (GPEN) can be used for post-processing to significantly improve the resolution. show

Acknowlegement:

Thanks to NV, AliaksandrSiarohin and DeepHeadPose.