Upper body image synthesis from skeleton(Keypoints). Pose2Img module in the ICCV-2021 paper "Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates". [arxiv / github]
This is a modified implementation of Synthesizing Images of Humans in Unseen Poses.
To install dependencies, run
pip install -r requirements.txt
To run this module, you need two NVIDIA gpus with at least 11 GB respectively. Our code is tested on Ubuntu 18.04LTS with Python3.6.
$ROOT/configs/yaml/Oliver.yaml
:$ROOT/data/Oliver
$ROOT/ckpt/Oliver/ckpt_final.pth
python main.py \
--name Oliver \
--config_path configs/yaml/Oliver.yaml \
--batch_size 1 \
tensorboard --logdir ./log --port={$Port} --bind_all
Generate a realistic video for Oliver from {keypoints}.npz.
python inference.py \
--cfg_path cfg/yaml/Oliver.yaml \
--name demo \
--npz_path target_pose/Oliver/varying_tmplt.npz \
--wav_path target_pose/Oliver/varying_tmplt.mp4
jpg
files which correspond to the npz.For your own dataset, you need to modify custom config.yaml.
Prepare the keypoints using OpenPose.
The raw keypoints for each frame is of shape (3, 137)
which is composed of [pose(3,25), face(3,70), left_hand(3,21),right_hand(3,21)]
The definition is as follows:
If you find this code useful for your research, please use the following BibTeX entry.
@inproceedings{qian2021speech,
title={Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates},
author={Qian, Shenhan and Tu, Zhi and Zhi, YiHao and Liu, Wen and Gao, Shenghua},
journal={International Conference on Computer Vision (ICCV)},
year={2021}
}