The official PyTorch implementation of the paper "Guided Motion Diffusion for Controllable Human Motion Synthesis" (GMD).
For more details, visit our project page.
📢 20/Dec/23 - We released DNO: Optimizing Diffusion Noise Can Serve As Universal Motion Priors, a follow-up work on how to effectively use diffusion models and guidance to tackle many motion tasks.

📢 28/July/23 - First release.
If you find this code useful in your research, please cite:

```bibtex
@inproceedings{karunratanakul2023gmd,
  title     = {Guided Motion Diffusion for Controllable Human Motion Synthesis},
  author    = {Karunratanakul, Korrawe and Preechakul, Konpat and Suwajanakorn, Supasorn and Tang, Siyu},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages     = {2151--2162},
  year      = {2023}
}
```
This code was tested on Ubuntu 20.04 LTS and requires:

* Python 3
* conda3 or miniconda3
* CUDA capable GPU (one is enough)

Install ffmpeg (if not already installed):

```sh
sudo apt update
sudo apt install ffmpeg
```
For Windows, use this instead.
GMD shares a large part of its base dependencies with MDM. However, you might find it easier to install our dependencies from scratch due to some key version differences.
Setup conda env:

```sh
conda env create -f environment_gmd.yml
conda activate gmd
conda remove --force ffmpeg
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
```
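After the installs, a quick sanity check can catch a broken environment early. This is a generic PyTorch/CLIP snippet, not part of the repo:

```python
import torch
import clip  # installed from the OpenAI CLIP repo above

# Confirm PyTorch sees the GPU and CLIP is importable.
print(torch.__version__, "CUDA available:", torch.cuda.is_available())
print(clip.available_models())  # e.g. ['RN50', ..., 'ViT-B/32']
```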
Download dependencies:
There are two paths to get the data:

(a) Generation only, with a pretrained text-to-motion model and without training or evaluating.

(b) Full data, to train and evaluate the model.
HumanML3D - Clone HumanML3D, then copy the data dir to our repository:

```sh
cd ..
git clone https://github.com/EricGuo5513/HumanML3D.git
unzip ./HumanML3D/HumanML3D/texts.zip -d ./HumanML3D/HumanML3D/
cp -r HumanML3D/HumanML3D guided-motion-diffusion/dataset/HumanML3D
cd guided-motion-diffusion
cp -a dataset/HumanML3D_abs/. dataset/HumanML3D/
```
[Important!] Because we changed the representation of the root joint from relative to absolute, you need to replace the original files and run our versions of `motion_representation.ipynb` and `cal_mean_variance.ipynb` (provided in `./HumanML3D_abs/`) instead, to get the absolute-root data.
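For intuition about this change: HumanML3D originally stores the root as per-frame *relative* values (rotation velocity about the vertical axis, linear velocity on the ground plane, and root height), so absolute positions must be integrated frame by frame. The sketch below is illustrative only and is not the repo's code; the function and variable names are made up, and sign conventions may differ from HumanML3D's implementation:

```python
import numpy as np

# Illustrative sketch: integrate HumanML3D-style relative root values
# (rotation velocity, local XZ velocity, height) into absolute positions.
def relative_to_absolute_root(rot_vel, lin_vel_xz, root_y):
    rot = np.cumsum(rot_vel)                  # absolute facing angle per frame
    cos, sin = np.cos(rot), np.sin(rot)
    # Rotate each local-frame velocity into the world frame, then integrate.
    vx = cos * lin_vel_xz[:, 0] + sin * lin_vel_xz[:, 1]
    vz = -sin * lin_vel_xz[:, 0] + cos * lin_vel_xz[:, 1]
    x, z = np.cumsum(vx), np.cumsum(vz)
    return np.stack([x, root_y, z], axis=-1)  # (T, 3) absolute root positions
```

Storing absolute root positions directly removes this integration step, which is what makes spatial guidance on the root location practical.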
HumanML3D - Follow the instructions in HumanML3D, then copy the resulting dataset to our repository:

```sh
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
```
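The `cal_mean_variance.ipynb` step recomputes the per-dimension normalization statistics for the new absolute-root features. Conceptually it boils down to the sketch below, which assumes the original HumanML3D layout (features in `new_joint_vecs/`, statistics saved as `Mean.npy`/`Std.npy`); it is not the notebook's actual code:

```python
import numpy as np
from pathlib import Path

# Stack every motion's (T, D) feature matrix, then take per-dimension stats.
feats = [np.load(p) for p in Path("dataset/HumanML3D/new_joint_vecs").glob("*.npy")]
all_frames = np.concatenate(feats, axis=0)   # (total_frames, D)
np.save("dataset/HumanML3D/Mean.npy", all_frames.mean(axis=0))
np.save("dataset/HumanML3D/Std.npy", all_frames.std(axis=0))
```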
Download both pretrained models, then unzip and place them in `./save/`. Both models are trained on the HumanML3D dataset.
You may also define:

* `--device` id.
* `--seed` to sample different prompts.
* `--motion_length` (text-to-motion only) in seconds (maximum is 9.8[sec]).
* `--progress` to save the denoising progress.

Running those will get you:

* `results.npy` file with text prompts and xyz positions of the generated animation
* `sample##_rep##.mp4` - a stick figure animation for each generated motion.
* `trajec_##_####` - a plot of the trajectory at each denoising step of the trajectory model. The final trajectory is then used to generate the motion.
* `motion_trajec_##_####` - a plot of the trajectory of the generated motion at each denoising step of the motion model.

You can stop here, or render the SMPL mesh using the following script.
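If you want to post-process the raw outputs yourself, `results.npy` can be loaded directly in Python. The sketch below assumes an MDM-style output dictionary; keys such as `motion` and `text` are an assumption here, so inspect the keys of your own file first:

```python
import numpy as np

# The file stores a Python dict, hence allow_pickle.
data = np.load("results.npy", allow_pickle=True).item()
print(data.keys())  # check the actual key names first

# Assumed MDM-style keys:
motions = data["motion"]  # xyz joint positions per sample
texts = data["text"]      # the text prompt for each sample
print(texts[0], np.asarray(motions).shape)
```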
To create SMPL mesh per frame, run:

```sh
python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
```
This script outputs:

* `sample##_rep##_smpl_params.npy` - SMPL parameters (thetas, root translations, vertices and faces)
* `sample##_rep##_obj` - Mesh per frame in `.obj` format.

Notes:

* The `.obj` files can be integrated into Blender/Maya/3DS-MAX and rendered using them.
* The script requires a GPU (it can be specified with the `--device` flag).
* Do not change the original `.mp4` path before running the script.

Notes for 3D makers:

* The SMPL theta parameters are saved to `sample##_rep##_smpl_params.npy` (we always use beta=0 and the gender-neutral model).
* Vertices and faces of the per-frame meshes are also saved to the `sample##_rep##_smpl_params.npy` file for your convenience.
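To work with these parameters in Python instead of a 3D package, you can load the file as below; it stores a dict, and the exact key names are not documented here, so treat the commented names as assumptions and check `params.keys()` on your own output:

```python
import numpy as np

# The file stores a Python dict of SMPL quantities, hence allow_pickle.
params = np.load("sample00_rep00_smpl_params.npy", allow_pickle=True).item()
print(params.keys())  # inspect the actual key names first

# Expected contents per the list above: thetas (pose parameters),
# root translations, and per-frame vertices plus faces. All meshes share
# the SMPL topology, so the faces are the same for every frame.
```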
GMD is trained on the HumanML3D dataset.

```sh
python -m train.train_trajectory
python -m train.train_gmd
```

Essentially, the same command is used for both the trajectory model and the motion model. You can select which model to train by changing the `train_args`. The training options can be found in `./configs/card.py`.
You may also define:

* `--device` to define GPU id.
* `--train_platform_type {ClearmlPlatform, TensorboardPlatform}` to track results with either ClearML or Tensorboard.

All evaluations are done on the HumanML3D dataset.
```sh
python -m eval.eval_humanml --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt
```
For each prompt, we use the ground-truth trajectory as the condition.
```sh
python -m eval.eval_humanml --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --full_traj_inpaint
```
For each prompt, 5 keyframes are sampled from the ground truth motion. The ground locations of the root joint in those frames are used as conditions.
```sh
python -m eval.eval_humanml_condition --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt
```
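As background for the keyframe-conditioned evaluation above: in guided diffusion, this kind of spatial conditioning is typically realized by adding the gradient of a keyframe loss to each denoising step. The sketch below is schematic only (function and variable names are illustrative, not the repo's API) and assumes the model predicts absolute root positions, which is what the absolute-root representation enables:

```python
import torch

def keyframe_guidance(root_pred, keyframe_idx, keyframe_xz, scale=1.0):
    """Schematic classifier-guidance-style term: gradient of the squared
    distance between predicted root positions and target ground (x-z)
    locations at the keyframes. root_pred: (T, 3) tensor."""
    root_pred = root_pred.detach().requires_grad_(True)
    pred_xz = root_pred[keyframe_idx][:, [0, 2]]   # project roots onto the ground
    loss = ((pred_xz - keyframe_xz) ** 2).sum()
    grad, = torch.autograd.grad(loss, root_pred)
    return -scale * grad  # nudges the denoised estimate toward the keyframes
```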
We would like to thank the following contributors for the great foundation that we build upon:
MDM, guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, MoDi.
This code is distributed under an MIT license.
Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.