This repository is the official implementation of InterControl: Generate Human Motion Interactions by Controlling Every Joint.
Zhenzhi Wang $^1$, Jingbo Wang $^2$, Yixuan Li $^1$, Dahua Lin $^{1,2}$, Bo Dai $^2$.
$^1$ CUHK, $^2$ Shanghai AI Lab.
Three people are holding hands together. | Two people are fighting with another person, leading to a 2v1 fighting game. | Character animation version of 2v1 fighting in a physics simulator. |
A person wins the fighting game and the referee holds his hand up to celebrate his success. | Two people are fighting with each other (1v1 fighting game). | Character animation version of 1v1 fighting in a physics simulator. |
Two people are dancing together (sample 1). | Two people are dancing together (sample 2). | Two people are dancing together (sample 3). |
Text-conditioned human motion generation models have achieved great progress by introducing diffusion models and corresponding control signals. However, interactions between humans remain underexplored. To model interactions among an arbitrary number of humans, we define interactions as human joint pairs that are either in contact or separated, and leverage a Large Language Model (LLM) planner to translate interaction descriptions into contact plans. Based on the contact plans, interaction generation can be achieved by spatially controllable motion generation methods that take joint contacts as spatial conditions. We present a novel approach named InterControl for flexible spatial control of every joint of every person at any time, leveraging a motion diffusion model trained only on single-person data. We incorporate a motion ControlNet to generate coherent and realistic motions given sparse spatial control signals, and a loss guidance module to precisely align any joint to the desired position in a classifier-guidance manner via Inverse Kinematics (IK). Extensive experiments on the HumanML3D and KIT-ML datasets demonstrate its effectiveness in versatile joint control. We also collect data of joint contact pairs using LLMs to show InterControl's ability in human interaction generation.
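To make the idea of "joint pairs that are either in contact or separated" concrete, here is a minimal sketch of how such a contact plan could be scored against generated joint positions. It is an illustration only, not the code in this repository: the function name `interaction_loss`, the tuple layout of a plan entry, the hinge-style penalty, and the joint indices are all assumptions made for this example.

```python
import numpy as np


def interaction_loss(joints, plan):
    """Mean hinge penalty of a contact plan (illustrative sketch).

    joints: array of shape (P, T, J, 3) with global joint positions for
            P people, T frames and J joints.
    plan:   iterable of hypothetical entries
            (person_a, joint_a, person_b, joint_b, start, end, kind, dist),
            where kind is "contact" or "separate" and frames are [start, end).
    """
    penalties = []
    for pa, ja, pb, jb, start, end, kind, dist in plan:
        # Per-frame Euclidean distance between the two constrained joints.
        gap = np.linalg.norm(
            joints[pa, start:end, ja] - joints[pb, start:end, jb], axis=-1
        )
        if kind == "contact":
            # Penalize joints that are farther apart than `dist`.
            penalties.append(np.maximum(gap - dist, 0.0))
        elif kind == "separate":
            # Penalize joints that are closer together than `dist`.
            penalties.append(np.maximum(dist - gap, 0.0))
        else:
            raise ValueError(f"unknown constraint kind: {kind!r}")
    if not penalties:
        return 0.0
    return float(np.concatenate(penalties).mean())


if __name__ == "__main__":
    # Two people, 4 frames, 22 joints; person 1 stands 1 m away along x.
    joints = np.zeros((2, 4, 22, 3))
    joints[1, :, :, 0] = 1.0
    # Ask joint 20 of person 0 to touch joint 21 of person 1 (arbitrary indices).
    plan = [(0, 20, 1, 21, 0, 4, "contact", 0.05)]
    print(interaction_loss(joints, plan))
```

A loss of this shape is differentiable almost everywhere with respect to the joint positions, which is what a guidance step needs; how the repository actually defines and optimizes its loss should be checked in the source.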
Our code is developed from PriorMDM and therefore shares similar dependencies and setup instructions, as follows.
Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg
Setup conda env:
conda env create -f environment.yml
conda activate InterControl
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/GuyTevet/smplx.git
Download the model(s) you wish to use, then unzip and place them in ./save/.
All-joint control, finetuned for temporally sparse signals: mask0.25_bfgs5_posterior_all
All-joint control, checkpoint for HumanML3D dataset evaluation: mask1_bfgs5_posterior_all
All-joint control: mask1_x0_all
Pelvis control: mask1_x0_pelvis
Loss Guidance on $\mu_t$
python -m sample.global_joint_control --model_path save/mask0.25_bfgs5_posterior_all/model000140000.pt \
--num_samples 32 --use_posterior --control_joint all
This visualizes the generated motions as skeletons. To render SMPL meshes, please refer to the following section.
The rendering part is the same as in PriorMDM. We make no changes to it, except for fixing a small bug in which the root offset was added to the mesh twice. The following is the original instruction from PriorMDM.
To create SMPL mesh per frame run:
python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
This script outputs:
sample##_rep##_smpl_params.npy - SMPL parameters (thetas, root translations, vertices and faces)
sample##_rep##_obj - Mesh per frame in .obj format.
Notes:
The .obj files can be integrated into Blender/Maya/3DS-MAX and rendered using them.
The device can be specified with the --device flag.
Do not change the original .mp4 path before running the script.
Notes for 3d makers:
The SMPL parameters are saved to sample##_rep##_smpl_params.npy (we always use beta=0 and the gender-neutral model).
The mesh data (vertices and faces) is also saved to the sample##_rep##_smpl_params.npy file for your convenience.
By adjusting the camera position and the lighting, you can get the same results as our interaction demo.
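Before importing the saved parameters into a 3D tool, it can help to check what the file actually contains. The sketch below assumes the `.npy` file holds a Python dict saved with `np.save`, which is why `allow_pickle=True` is needed. That storage layout, the key names in the file, and the helper name `summarize_smpl_params` are assumptions, not something this README confirms.

```python
import numpy as np


def summarize_smpl_params(path):
    """Return {key: array shape or type name} for a saved parameter file.

    Assumes the file stores a pickled dict; falls back to reporting the
    shape if it turns out to be a plain array.
    """
    data = np.load(path, allow_pickle=True)
    # A dict saved with np.save is loaded back as a 0-d object array.
    if data.dtype == object and data.shape == ():
        data = data.item()
    if not isinstance(data, dict):
        return {"array": data.shape}
    return {key: getattr(value, "shape", type(value).__name__)
            for key, value in data.items()}


if __name__ == "__main__":
    import sys

    # Usage: python inspect_params.py path/to/sample##_rep##_smpl_params.npy
    for key, info in summarize_smpl_params(sys.argv[1]).items():
        print(key, info)
```

Only load files you produced yourself this way, since unpickling an untrusted file can execute arbitrary code.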
Select the checkpoint to be evaluated by specifying model_path, and use replication_times to run multiple evaluations and average the results. The following evaluation scripts generate motions 10 times.
Loss Guidance on $\mu_t$
python3 -m eval.eval_controlmdm --model_path save/mask1_bfgs5_posterior_all/model000120000.pt \
--replication_times 10 --mask_ratio 1 --bfgs_times_first 5 \
--bfgs_times_last 10 --bfgs_interval 1 --use_posterior \
--control_joint all
Loss Guidance on $x_0$
python3 -m eval.eval_controlmdm --model_path save/mask1_x0_all/model000160000.pt \
--replication_times 10 --mask_ratio 1 --bfgs_times_first 1 \
--bfgs_times_last 10 --bfgs_interval 1 \
--control_joint all
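Averaging a metric over `replication_times` runs, as the evaluation commands above do, can be sketched as follows. This is an illustration of the general idea only: the helper name `summarize_replications` is hypothetical, and the normal-approximation 95% interval with the sample standard deviation is an assumption, so the repository's evaluation code may aggregate its results differently.

```python
import math
import statistics


def summarize_replications(values):
    """Mean and 95% confidence half-width of one metric over replications."""
    n = len(values)
    mean = statistics.fmean(values)
    if n < 2:
        # A single run gives no spread estimate.
        return mean, 0.0
    # Normal-approximation interval: 1.96 * sample std / sqrt(n).
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return mean, half_width


if __name__ == "__main__":
    # Made-up values standing in for one metric measured in 10 replications.
    runs = [0.21, 0.19, 0.22, 0.20, 0.18, 0.21, 0.20, 0.19, 0.22, 0.20]
    mean, half_width = summarize_replications(runs)
    print(f"{mean:.3f} +/- {half_width:.3f}")
```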
Two-people Interaction Sampling
Generating interactions requires the information in sample.json. This information can be copied from ./assets/all_plans.json (our collected interaction plans from the LLM planner) to generate different interactions.
python -m sample.interactive_global_joint_control \
--model_path save/mask0.25_bfgs5_posterior_all/model000140000.pt \
--multi_person --bfgs_times_first 5 --bfgs_times_last 10 \
--interaction_json './assets/sample.json'
This visualizes the generated motions as skeletons. To render SMPL meshes, please refer to the rendering section under single-person motion generation.
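Before copying an entry from ./assets/all_plans.json into sample.json, it is useful to see how the plans file is organized. The helper below is a read-only sketch that makes no assumption about the plan schema itself. It only reports whether the top level is a JSON object or array, how many entries it has, and the first few keys or indices. The name `describe_plans` is hypothetical.

```python
import json


def describe_plans(path):
    """Return (container type, entry count, up to three keys or indices)."""
    with open(path, encoding="utf-8") as f:
        plans = json.load(f)
    if isinstance(plans, dict):
        labels = list(plans)[:3]
    elif isinstance(plans, list):
        labels = list(range(min(3, len(plans))))
    else:
        raise TypeError("expected a JSON object or array at the top level")
    return type(plans).__name__, len(plans), labels


if __name__ == "__main__":
    kind, count, labels = describe_plans("./assets/all_plans.json")
    print(f"{kind} with {count} entries, e.g. {labels}")
```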
Interaction sampling for more than 3 people (requires hand-crafted masks for each person)
python -m sample.more_people_global_joint_control \
--model_path save/mask0.25_bfgs5_posterior_all/model000140000.pt \
--multi_person --bfgs_times_first 5 --bfgs_times_last 10 --use_posterior
Loss Guidance on $\mu_t$
python3 -m eval.eval_interaction --model_path save/mask0.25_bfgs5_posterior_all/model000140000.pt \
--replication_times 10 --bfgs_times_first 5 --bfgs_times_last 10 --bfgs_interval 1 \
--use_posterior --control_joint all \
--interaction_json './assets/all_plans.json' \
--multi_person
The model will be saved in the directory ./save/ + the value of --save_dir. Training requires pretrained MDM weights, which can be downloaded from my_humanml-encoder-512. Put the downloaded weights in ./save/ and make sure the checkpoint is located at ./save/humanml_trans_enc_512/model000475000.pt.
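A quick way to confirm the pretrained weights are where training expects them is a small path check run from the repository root. This is a convenience sketch, not part of the repository: the function name `pretrained_checkpoint` is hypothetical, and only the checkpoint path is taken from the instructions above.

```python
from pathlib import Path

# Checkpoint location given in the setup instructions, relative to the repo root.
EXPECTED = Path("save") / "humanml_trans_enc_512" / "model000475000.pt"


def pretrained_checkpoint(repo_root="."):
    """Return the checkpoint path if the file exists, else raise."""
    path = Path(repo_root) / EXPECTED
    if not path.is_file():
        raise FileNotFoundError(f"pretrained MDM weights not found at {path}")
    return path


if __name__ == "__main__":
    print("found", pretrained_checkpoint())
```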
Loss Guidance on $\mu_t$
python3 -m train.train_global_joint_control --save_dir save/mask1_bfgs5_posterior_all \
--dataset humanml --inpainting_mask global_joint --lr 0.00001 --mask_ratio 1 --control_joint all \
--use_posterior --bfgs_times_first 5
Loss Guidance on $x_0$
python3 -m train.train_global_joint_control --save_dir save/mask1_x0_all \
--dataset humanml --inpainting_mask global_joint --lr 0.00001 --mask_ratio 1 --control_joint all \
--bfgs_times_first 0
Only for pelvis control
python3 -m train.train_global_joint_control --save_dir save/mask1_x0_pelvis \
--dataset humanml --inpainting_mask global_joint --lr 0.00001 --mask_ratio 1 --control_joint pelvis \
--bfgs_times_first 0
If you find this code useful in your research, please cite:
@article{wang2023intercontrol,
title={InterControl: Generate Human Motion Interactions by Controlling Every Joint},
author={Wang, Zhenzhi and Wang, Jingbo and Lin, Dahua and Dai, Bo},
journal={arXiv preprint arXiv:2311.15864},
year={2023}
}
This code is standing on the shoulders of giants. We want to thank the authors of the following projects, on which our code is based:
GMD, PriorMDM, MDM, guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, TEACH.
This code is distributed under an MIT LICENSE.
Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.