OmniControl: Control Any Joint at Any Time for Human Motion Generation
Yiming Xie, Varun Jampani, Lei Zhong, Deqing Sun, Huaizu Jiang
@inproceedings{
xie2024omnicontrol,
title={OmniControl: Control Any Joint at Any Time for Human Motion Generation},
author={Yiming Xie and Varun Jampani and Lei Zhong and Deqing Sun and Huaizu Jiang},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=gd0lAEtWso}
}
📢 10/Dec/23 - First release
This code requires a conda installation and a CUDA-capable GPU.
Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg
For Windows, use this instead.
Setup conda env:
conda env create -f environment.yml
conda activate omnicontrol
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
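Optionally, a quick sanity check that the environment resolved correctly (run inside the activated omnicontrol env; ViT-B/32 is used here only as a load test):

```python
# Optional sanity check for the key dependencies.
import torch
import clip
import spacy

print("CUDA available:", torch.cuda.is_available())
spacy.load("en_core_web_sm")            # raises if the spaCy model was not downloaded
clip.load("ViT-B/32", device="cpu")     # downloads/caches the CLIP weights on first use
print("Environment looks good.")
```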
Download dependencies:
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
HumanML3D - Follow the instructions in HumanML3D, then copy the result dataset to our repository:
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
KIT - Download from HumanML3D (no processing needed this time) and place the result in ./dataset/KIT-ML
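After the data is in place, an optional sanity check on the HumanML3D copy (the file list below follows the standard HumanML3D release; adjust it if yours differs):

```python
import os

# Optional: verify the HumanML3D copy landed where the data loaders expect it.
root = "./dataset/HumanML3D"
expected = ["new_joint_vecs", "texts", "Mean.npy", "Std.npy", "train.txt", "val.txt", "test.txt"]
for name in expected:
    path = os.path.join(root, name)
    print(("OK   " if os.path.exists(path) else "MISS ") + path)
```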
Download the model(s) you wish to use, then unzip and place them in ./save/.
HumanML3D
cd save
gdown --id 1oTkBtArc3xjqkYD6Id7LksrTOn3e1Zud
unzip omnicontrol_ckpt.zip -d .
cd ..
Check the manually defined spatial control signals in text_control_example. You can define your own inputs following this file.
python -m sample.generate --model_path ./save/omnicontrol_ckpt/model_humanml3d.pt --num_repetitions 1
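For intuition, a dense spatial control signal is just a per-frame xyz target for one joint. The snippet below is illustrative only; it is not the input schema of sample.generate, whose actual format is defined in text_control_example:

```python
import numpy as np

# Illustrative only: a dense control signal for the pelvis, i.e. a target xyz
# position for every frame. See text_control_example for the real input format.
num_frames = 120
pelvis_xyz = np.stack([
    np.linspace(0.0, 2.0, num_frames),   # walk 2 m forward along x
    np.full(num_frames, 0.9),            # keep the pelvis height roughly constant
    np.zeros(num_frames),                # no lateral drift
], axis=-1)                              # shape (num_frames, 3)
print(pelvis_xyz.shape)
```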
We randomly sample spatial control signals from the ground-truth motions of the HumanML3D dataset.
python -m sample.generate --model_path ./save/omnicontrol_ckpt/model_humanml3d.pt --num_repetitions 1 --text_prompt ''
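Conceptually, this sampling keeps a chosen joint's ground-truth xyz positions at a few random frames and masks everything else. A minimal NumPy sketch (the shapes, density argument, and function name are illustrative; the actual sampling lives in the data loaders):

```python
import numpy as np

# Conceptual sketch: keep one joint's ground-truth xyz at a few random frames,
# mask everything else.
def sample_sparse_control(gt_joints, joint_idx=0, density=5, rng=np.random):
    T = gt_joints.shape[0]                        # gt_joints: (T, 22, 3) global positions
    frames = rng.choice(T, size=min(density, T), replace=False)
    hint = np.zeros_like(gt_joints)               # (T, 22, 3) target positions
    mask = np.zeros(gt_joints.shape[:2] + (1,))   # (T, 22, 1), 1 = controlled
    hint[frames, joint_idx] = gt_joints[frames, joint_idx]
    mask[frames, joint_idx] = 1.0
    return hint, mask
```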
You may also define:
--device id.
--seed to sample different prompts.
--motion_length (text-to-motion only) in seconds (maximum is 9.8[sec]).
Running those will get you:
results.npy - a file with the text prompts and xyz positions of the generated animation.
sample##_rep##.mp4 - a stick figure animation for each generated motion.
It will look something like this:
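To inspect results.npy programmatically, load it as a pickled Python dict (MDM convention; print the keys rather than relying on exact names, since the layout may differ slightly):

```python
import numpy as np

# results.npy is a pickled dict; typical keys include the generated xyz joint
# positions ('motion') and the prompts ('text').
out = np.load("results.npy", allow_pickle=True).item()
for key, value in out.items():
    print(key, getattr(value, "shape", type(value)))
```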
You can stop here, or render the SMPL mesh using the following script.
This part is directly borrowed from MDM.
To create a SMPL mesh per frame, run:
python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
This script outputs:
sample##_rep##_smpl_params.npy - SMPL parameters (thetas, root translations, vertices and faces).
sample##_rep##_obj - a mesh per frame in .obj format.
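To peek at the exported SMPL parameters without assuming a specific key layout (sample00_rep00 stands in for any generated sample):

```python
import numpy as np

# The export is a pickled dict; print its keys and array shapes.
params = np.load("sample00_rep00_smpl_params.npy", allow_pickle=True).item()
for key, value in params.items():
    print(key, getattr(value, "shape", type(value)))
```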
Notes:
The .obj files can be integrated into Blender/Maya/3DS-MAX and rendered using them.
This script runs SMPLify and needs a GPU as well (can be specified with the --device flag).
Important - Do not change the original .mp4 path before running the script.
Notes for 3d makers:
You can animate the sequence with the SMPL add-on using the theta parameters saved to sample##_rep##_smpl_params.npy (we always use beta=0 and the gender-neutral model).
Alternatively, the per-frame mesh data (vertices and faces) is also saved to the sample##_rep##_smpl_params.npy file for your convenience.
HumanML3D
Download the pretrained MDM model (the model is from MDM), then place it in ./save/.
Or you can download the pretrained model via
cd save
gdown --id 1XS_kp1JszAxgZBq9SL8Y5JscVVqJ2c7H
cd ..
You can train your own model via
python -m train.train_mdm --save_dir save/my_omnicontrol --dataset humanml --num_steps 400000 --batch_size 64 --resume_checkpoint ./save/model000475000.pt --lr 1e-5
HumanML3D
./eval_omnicontrol_all.sh ./save/omnicontrol_ckpt/model_humanml3d.pt
Or you can evaluate each setting separately, e.g., the root joint (0) with a dense spatial control signal (100).
It takes about 1.5 hours.
./eval_omnicontrol.sh ./save/omnicontrol_ckpt/model_humanml3d.pt 0 100
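If you prefer scripting the per-setting evaluation, a hypothetical sweep could look like this (the joint/density pairs below are examples only; the full protocol is encoded in eval_omnicontrol_all.sh):

```python
import subprocess

# Run the per-setting evaluation script for a few example settings.
ckpt = "./save/omnicontrol_ckpt/model_humanml3d.pt"
for joint, density in [(0, 1), (0, 100)]:   # 0 = root joint, as in the example above
    subprocess.run(["bash", "./eval_omnicontrol.sh", ckpt, str(joint), str(density)], check=True)
```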
Spatial Guidance. (./diffusion/gaussian_diffusion.py#L450)
Realism Guidance. (./model/cmdm.py#L158)
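Spatial guidance perturbs the denoised motion with the gradient of the error between the controlled joints' global positions and their targets, while realism guidance injects the control signal through a trainable copy of the transformer encoder. A conceptual PyTorch sketch of one spatial-guidance update (tau, n_iters, and to_global_fn are placeholders, not the repository's API; see ./diffusion/gaussian_diffusion.py#L450 for the actual implementation):

```python
import torch

# Conceptual sketch: move the predicted clean motion so that the controlled
# joints' global positions approach their targets.
def spatial_guidance_step(x_pred, hint, mask, to_global_fn, tau=0.1, n_iters=1):
    # x_pred: predicted clean motion, (B, T, D)
    # hint:   target global joint positions, (B, T, J, 3)
    # mask:   1 where a joint/frame is controlled, (B, T, J, 1)
    x = x_pred.detach().requires_grad_(True)
    for _ in range(n_iters):
        joints = to_global_fn(x)                      # differentiable conversion to global xyz
        loss = (((joints - hint) ** 2) * mask).sum()  # error only on controlled joints/frames
        grad, = torch.autograd.grad(loss, x)
        x = (x - tau * grad).detach().requires_grad_(True)
    return x.detach()
```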
Our code is based on MDM.
The motion visualization is based on MLD and TEMOS.
We also thank the following works:
guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, MoDi.
This code is distributed under an MIT LICENSE.
Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.