This repository contains code and data instructions for the ConvoFusion project. In case of questions, create a GitHub issue or email mmughal@mpi-inf.mpg.de
Create the environment from the provided file:
conda env create --name convofusion --file=environment.yml
Alternatively, create and activate the environment manually:
conda create python=3.9 --name convofusion
conda activate convofusion
Then install the packages in requirements.txt and install PyTorch 2.1.2:
pip install -r requirements.txt
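If you install PyTorch separately, a possible command is shown below; the CUDA 11.8 build is an assumption, so pick the index URL that matches your CUDA/CPU setup from pytorch.org:
# install PyTorch 2.1.2 (CUDA 11.8 wheel assumed; adjust the index URL to your setup)
pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu118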
More instructions on how to download dependencies for training will be added.
Download the model folders from this link, extract the zip file, and place both folders in experiments/convofusion/
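For example, assuming the downloaded archive is named convofusion_checkpoints.zip (the file name is hypothetical; use the actual one):
# extract the pretrained model folders into the experiments directory (archive name is an assumption)
mkdir -p experiments/convofusion/
unzip convofusion_checkpoints.zip -d experiments/convofusion/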
Set up the BEAT and DnD Group Gesture datasets. Instructions on processing the data (BVH-to-joint conversion) will be added soon.
Please first check the parameters in configs/config_vae_beatdnd.yaml, e.g. NAME and DEBUG.
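A quick way to locate these parameters before editing (the key names are taken from this README; the exact nesting inside the YAML may differ):
# list the lines containing the parameters mentioned above
grep -n -E "NAME|DEBUG" configs/config_vae_beatdnd.yaml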
Then, run the following command:
python -m train --cfg configs/config_vae_beatdnd.yaml --cfg_assets configs/assets.yaml --batch_size 128 --nodebug
Please update the parameters in configs/config_cf_beatdnd.yaml, e.g. NAME, DEBUG, and PRETRAINED_VAE (change it to the path of your latest checkpoint from the previous step).
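One possible way to find the newest VAE checkpoint from the previous step is shown below; the folder layout is an assumption, so adjust it to your experiment name and output structure:
# print the most recently written checkpoint of the VAE experiment (path layout assumed)
ls -t experiments/convofusion/<your-vae-experiment>/checkpoints/*.ckpt | head -n 1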
Then, run the following command:
python -m train --cfg configs/config_cf_beatdnd.yaml --cfg_assets configs/assets.yaml --batch_size 32 --nodebug
Please first set TEST.CHECKPOINT to the trained model checkpoint path in configs/config_cf_beatdnd.yaml, or in the config inside your experiment folder, /path/to/trained-model/folder/config.yaml.
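As an optional sanity check (the key name TEST.CHECKPOINT comes from this README; the path is the placeholder used above):
# confirm the checkpoint path is set in the experiment config
grep -n "CHECKPOINT" /path/to/trained-model/folder/config.yaml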
Then, run the following command:
python -m test --cfg /path/to/trained-model/folder/config.yaml --cfg_assets ./configs/assets.yaml
Use and tweak the visualize.py script in the scripts folder to visualize joint predictions. The results folder is created after you run test.py:
python visualize.py --src_dir /path/to/results/folder/
We provide scripts for quantitative evaluation in the quant_eval folder, for both monadic and dyadic tasks. These scripts require the generated results folder containing the predicted and ground-truth (GT) .npy motion files.
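A quick, illustrative way to confirm the results folder contains the .npy motion files the evaluation scripts expect (the path is the placeholder used above):
# list a few predicted/GT motion files in the results folder
find /path/to/results/folder/ -name "*.npy" | head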
If you find our code or paper helpful, please consider citing:
@InProceedings{mughal2024convofusion,
title = {ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis},
author = {Muhammad Hamza Mughal and Rishabh Dabral and Ikhsanul Habibie and Lucia Donatelli and Marc Habermann and Christian Theobalt},
booktitle={Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
This repository is based on the awesome MLD repository. Please check out their repository for further acknowledgements of the code they use. We would also like to acknowledge the authors of BEAT, Attend-and-Excite, HumanML3D, PhysCap & MoFusion, since our code also builds on their work.
This work was supported by the ERC Consolidator Grant 4DReply (770784). We also thank Andrea Boscolo Camiletto & Heming Zhu for help with rendering and visualizations, Christopher Hyek for designing the game for the dataset and Wolfram Wagner (MPII IST) for his help in setting the equipment up.
This code is distributed under the terms of the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This project is only for research or education purposes, and not freely available for commercial use or redistribution.
Note that our code depends on other libraries, including PyTorch3D, and uses datasets like BEAT, each of which has its own license that must also be followed.