
GMD: Guided Motion Diffusion for Controllable Human Motion Synthesis


The official PyTorch implementation of the paper "GMD: Controllable Human Motion Synthesis via Guided Diffusion Models".

For more details, visit our project page.



📢 20/Dec/23 - We release DNO: Optimizing Diffusion Noise Can Serve As Universal Motion Priors, a follow-up work that looks at how to effectively use diffusion models and guidance to tackle many motion tasks.

28/July/23 - First release.


If you find this code useful in your research, please cite:

  title     = {Guided Motion Diffusion for Controllable Human Motion Synthesis},
  author    = {Karunratanakul, Korrawe and Preechakul, Konpat and Suwajanakorn, Supasorn and Tang, Siyu},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages     = {2151--2162},
  year      = {2023}

Getting started

This code was tested on Ubuntu 20.04 LTS and requires:

1. Setup environment

Install ffmpeg (if not already installed):

sudo apt update
sudo apt install ffmpeg

For Windows, use this instead.

2. Install dependencies

GMD shares a large part of its base dependencies with MDM. However, because of some key version differences, you might find it easier to install our dependencies from scratch.

Setup conda env:

conda env create -f environment_gmd.yml
conda activate gmd
conda remove --force ffmpeg
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git

Download dependencies:

Text to Motion

```bash
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
```

Unconstrained

```bash
bash prepare/download_smpl_files.sh
bash prepare/download_recognition_unconstrained_models.sh
```

2. Get data

There are two paths to get the data:

(a) Generation only, using the pretrained text-to-motion model, without training or evaluating.

(b) Get full data to train and evaluate the model.

a. Generation only (text only)

HumanML3D - Clone HumanML3D, then copy the data dir to our repository:

cd ..
git clone https://github.com/EricGuo5513/HumanML3D.git
unzip ./HumanML3D/HumanML3D/texts.zip -d ./HumanML3D/HumanML3D/
cp -r HumanML3D/HumanML3D guided-motion-diffusion/dataset/HumanML3D
cd guided-motion-diffusion
cp -a dataset/HumanML3D_abs/. dataset/HumanML3D/

b. Full data (text + motion capture)

[Important!] Because we change the representation of the root joint from relative to absolute, the original HumanML3D processing files do not produce the data we need. Replace them with our versions of motion_representation.ipynb and cal_mean_variance.ipynb, provided in ./HumanML3D_abs/, and run those instead to get the absolute-root data.
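To make the difference between the two root representations concrete, here is a toy sketch in Python. It assumes the relative form is a list of per-frame ground-plane displacements already expressed in world coordinates; the actual HumanML3D features also encode root rotation and use a root-local frame, so this only illustrates the idea and is not the conversion done by the notebooks. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, z) on the ground plane


def relative_to_absolute(start: Point, displacements: List[Point]) -> List[Point]:
    """Integrate per-frame displacements into absolute root positions.

    Assumption: displacements are world-frame (dx, dz) pairs, which is a
    simplification of the real HumanML3D representation.
    """
    x, z = start
    positions = [(x, z)]
    for dx, dz in displacements:
        x, z = x + dx, z + dz
        positions.append((x, z))
    return positions


def absolute_to_relative(positions: List[Point]) -> List[Point]:
    """Inverse of the function above: difference consecutive positions."""
    return [(x1 - x0, z1 - z0)
            for (x0, z0), (x1, z1) in zip(positions, positions[1:])]


absolute = relative_to_absolute((0.0, 0.0), [(1.0, 0.0), (0.0, 2.0)])
print(absolute)
```

With an absolute root, a location such as "be at (x, z) at frame t" refers directly to one entry of the sequence, whereas in the relative form it depends on the sum of every earlier displacement.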

HumanML3D - Follow the instructions in HumanML3D, then copy the resulting dataset to our repository:

cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D

3. Download the pretrained models

Download both models, then unzip and place them in ./save/.

Both models are trained on the HumanML3D dataset.

trajectory model

motion model
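Before running generation, it can help to confirm that the checkpoint landed where the commands in this README expect it. The sketch below checks only the motion model path used in the commands in this README; the trajectory model's directory name is not stated here, so add it yourself once you know it. The helper name `missing_files` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import Iterable, List

# Path taken from the generation commands in this README.
EXPECTED_FILES = [
    "./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt",
]


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the subset of paths that do not exist as regular files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_files(EXPECTED_FILES)
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected checkpoints found.")
```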

Motion Synthesis

Text to Motion - Without spatial conditioning

This part is a standard text-to-motion generation.

### Generate from test set prompts

Note: We change the behavior of the `--num_repetitions` flag from the original MDM repo to facilitate the two-staged pipeline and logging. We only support `--num_repetitions 1` at this moment.

```shell
python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --num_samples 10
```

### Generate from your text file

```shell
python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --input_text ./assets/example_text_prompts.txt
```

### Generate from a single prompt - no spatial guidance

```shell
python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --text_prompt "a person is picking up something on the floor"
```

![example](assets/example_text_only.gif)
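Since only `--num_repetitions 1` is supported, one possible way to get several variations of a prompt is to run the command once per seed. The sketch below only builds the command lines; it uses the `--model_path`, `--text_prompt`, and `--seed` flags that appear in this README, and it assumes (without having verified it) that changing `--seed` changes the sample. The helper names are hypothetical.

```python
import shlex
from typing import List

MODEL_PATH = "./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt"


def build_command(prompt: str, seed: int) -> List[str]:
    """Build one sample.generate invocation as an argument list."""
    return [
        "python", "-m", "sample.generate",
        "--model_path", MODEL_PATH,
        "--text_prompt", prompt,
        "--seed", str(seed),
    ]


def build_batch(prompt: str, seeds: List[int]) -> List[List[str]]:
    """One command per seed, as a stand-in for multiple repetitions."""
    return [build_command(prompt, seed) for seed in seeds]


if __name__ == "__main__":
    prompt = "a person is picking up something on the floor"
    for command in build_batch(prompt, [10, 11, 12]):
        # Print a copy-pastable shell line; pass `command` to
        # subprocess.run(command, check=True) to execute it instead.
        print(shlex.join(command))
```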
Text to Motion - With keyframe locations conditioning

### Generate from a single prompt - condition on keyframe locations

The predefined pattern can be found in `get_kframes()` in `sample/keyframe_pattern.py`. You can add more patterns there using the same format `[(frame_num_1, (x_1, z_1)), (frame_num_2, (x_2, z_2)), ...]` where `x` and `z` are the location of the root joint on the plane in the world coordinate system.

```shell
python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --text_prompt "a person is walking while raising both hands" --guidance_mode kps
```

![example](assets/example_kps.gif)

(In development) Using the `--interactive` flag will start an interactive window that allows you to choose the keyframes yourself. The interactive pattern will override the predefined pattern.
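As an illustration of the keyframe format described above, the following Python sketch builds a pattern that visits the four corners of a square and sanity-checks it. The function names, the square pattern, and the motion length of 196 frames are assumptions made for this example; how `get_kframes()` selects or registers patterns is not shown here.

```python
from typing import List, Tuple

# (frame_num, (x, z)): root-joint ground location at a given frame.
Keyframe = Tuple[int, Tuple[float, float]]


def make_square_pattern(side: float = 2.0, frames_per_edge: int = 30) -> List[Keyframe]:
    """Hypothetical pattern: reach each corner of a square, one edge at a time."""
    corners = [(side, 0.0), (side, side), (0.0, side), (0.0, 0.0)]
    return [((i + 1) * frames_per_edge, corner) for i, corner in enumerate(corners)]


def validate_pattern(pattern: List[Keyframe], n_frames: int) -> None:
    """Raise ValueError unless frames are strictly increasing and in range."""
    previous = -1
    for frame_num, _location in pattern:
        if not 0 <= frame_num < n_frames:
            raise ValueError(f"frame {frame_num} is outside [0, {n_frames})")
        if frame_num <= previous:
            raise ValueError(f"frame {frame_num} is not after frame {previous}")
        previous = frame_num


pattern = make_square_pattern()
validate_pattern(pattern, n_frames=196)  # 196 is an assumed motion length
print(pattern)
```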
Text to Motion - With keyframe locations conditioning and obstacle avoidance

Similarly, the pattern is defined in `get_obstacles()` in `sample/keyframe_pattern.py`. You can add more patterns using the format `((x, z), radius)`. Currently we only support circular obstacles because their SDF (signed distance function) is easy to define, but you can add any shape with a valid SDF.

```shell
python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --text_prompt "a person is walking while raising both hands" --guidance_mode sdf --seed 11
```

![example](assets/example_sdf.gif)
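For reference, the signed distance function of a circle is the distance to its center minus its radius: positive outside, zero on the boundary, negative inside. The sketch below evaluates it for obstacles in the `((x, z), radius)` format and sums a simple penetration penalty over a trajectory. The penalty is an illustrative choice, not necessarily the guidance loss used in this repository, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]      # (x, z) on the ground plane
Obstacle = Tuple[Point, float]   # ((x, z), radius)


def circle_sdf(point: Point, obstacle: Obstacle) -> float:
    """Signed distance from point to a circular obstacle."""
    (cx, cz), radius = obstacle
    return math.hypot(point[0] - cx, point[1] - cz) - radius


def scene_sdf(point: Point, obstacles: List[Obstacle]) -> float:
    """Distance to the nearest obstacle surface (union of shapes = min of SDFs)."""
    return min(circle_sdf(point, obstacle) for obstacle in obstacles)


def collision_penalty(trajectory: List[Point], obstacles: List[Obstacle],
                      margin: float = 0.0) -> float:
    """Sum of how far each trajectory point intrudes past the safety margin."""
    return sum(max(0.0, margin - scene_sdf(p, obstacles)) for p in trajectory)


obstacles = [((0.0, 0.0), 1.0)]
print(circle_sdf((3.0, 4.0), obstacles[0]))
print(collision_penalty([(0.0, 0.0), (3.0, 4.0)], obstacles))
```

Any other shape can be dropped into `scene_sdf` as long as it provides a function with the same sign convention.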
Text to Motion - With trajectory conditioning

### Generate from a single prompt - condition on a trajectory

The trajectory-conditioned generation is a special case of keyframe-conditioned generation, where all the frames are keyframes. The sample trajectory we used can be found in `./save/template_joints.npy`. You can also use your own trajectory by providing the list of `ground_positions`.

```shell
python -m sample.generate --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --text_prompt "a person is walking while raising both hands" --guidance_mode trajectory
```

![example](assets/example_trajectory.gif)

(In development) Using the `--interactive` flag will start an interactive window that allows you to draw a trajectory that will override the predefined pattern.
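The "every frame is a keyframe" view can be sketched in a few lines of Python: a dense list of ground positions maps directly onto the `(frame_num, (x, z))` keyframe format. The circular path and both function names are made up for this example; how the repository actually consumes `ground_positions` (and the array layout of `template_joints.npy`) is not covered here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]          # (x, z) on the ground plane
Keyframe = Tuple[int, Point]         # (frame_num, (x, z))


def circle_trajectory(n_frames: int, radius: float = 1.0) -> List[Point]:
    """Hypothetical dense trajectory: one lap around a circle."""
    return [
        (radius * math.cos(2.0 * math.pi * t / n_frames),
         radius * math.sin(2.0 * math.pi * t / n_frames))
        for t in range(n_frames)
    ]


def trajectory_to_keyframes(ground_positions: Sequence[Sequence[float]]) -> List[Keyframe]:
    """Treat every frame of a trajectory as a keyframe."""
    return [(frame, (float(x), float(z)))
            for frame, (x, z) in enumerate(ground_positions)]


keyframes = trajectory_to_keyframes(circle_trajectory(120))
print(len(keyframes), keyframes[0])
```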

You may also define:

Running those will get you:

You can stop here, or render the SMPL mesh using the following script.

Render SMPL mesh

To create an SMPL mesh for each frame, run:

python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file

This script outputs:


Notes for 3d makers:

Training GMD

GMD is trained on the HumanML3D dataset.

Trajectory Model

python -m train.train_trajectory

Motion Model

python -m train.train_gmd

Essentially, the same command is used for both the trajectory model and the motion model. You can select which model to train by changing the train_args. The training options can be found in ./configs/card.py.


All evaluations are done on the HumanML3D dataset.

Text to Motion

python -m eval.eval_humanml --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt

Text to Motion - With trajectory conditioning

For each prompt, we use the ground truth trajectory as conditions.

python -m eval.eval_humanml --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt --full_traj_inpaint

Text to Motion - With keyframe locations conditioning

For each prompt, 5 keyframes are sampled from the ground truth motion. The ground locations of the root joint in those frames are used as conditions.

python -m eval.eval_humanml_condition --model_path ./save/unet_adazero_xl_x0_abs_proj10_fp16_clipwd_224/model000500000.pt
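The keyframe conditions described above can be pictured with the sketch below: pick 5 distinct frames from a ground-truth root trajectory and keep their ground locations in the `(frame_num, (x, z))` format. Uniform random sampling, the seed handling, and the function name are assumptions for illustration; the evaluation script may select frames differently.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]      # (x, z) on the ground plane
Keyframe = Tuple[int, Point]     # (frame_num, (x, z))


def sample_keyframes(root_ground_positions: Sequence[Point],
                     n_keyframes: int = 5,
                     seed: Optional[int] = None) -> List[Keyframe]:
    """Sample distinct frames and return their root ground locations, in order."""
    n_frames = len(root_ground_positions)
    if n_frames < n_keyframes:
        raise ValueError("motion is shorter than the number of keyframes")
    rng = random.Random(seed)
    frames = sorted(rng.sample(range(n_frames), n_keyframes))
    return [(frame, tuple(root_ground_positions[frame])) for frame in frames]


# Toy ground-truth root path: walking along +x for 60 frames.
ground_truth = [(0.1 * t, 0.0) for t in range(60)]
print(sample_keyframes(ground_truth, seed=0))
```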


We would like to thank the following contributors for the great foundation that we build upon:

MDM, guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, MoDi.


This code is distributed under the MIT License.

Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.