ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

Project page: https://mingyuan-zhang.github.io/projects/ReMoDiffuse.html

Mingyuan Zhang¹, Xinying Guo¹, Liang Pan¹, Zhongang Cai¹,², Fangzhou Hong¹, Huirong Li¹, Lei Yang², Ziwei Liu¹⁺
¹ S-Lab, Nanyang Technological University   ² SenseTime Research
⁺ corresponding author
---

[Project Page][arXiv][Video][Colab Demo][Hugging Face Demo]

Accepted to ICCV 2023

Abstract: 3D human motion generation is crucial for the creative industry. Recent advances rely on generative models with domain knowledge for text-driven motion generation, leading to substantial progress in capturing common motions. However, the performance on more diverse motions remains unsatisfactory. In this work, we propose ReMoDiffuse, a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process.

Pipeline Overview: ReMoDiffuse is a retrieval-augmented 3D human motion diffusion model. Benefiting from the extra knowledge in the retrieved samples, ReMoDiffuse achieves high fidelity to the given prompts. It contains three core components: a) a Hybrid Retrieval database that stores multi-modality features of each motion sequence; b) a Semantics-Modulated Transformer composed of several identical decoder layers, each containing a Semantics-Modulated Attention (SMA) layer and an FFN layer, where the SMA layer adaptively absorbs knowledge from both the retrieved samples and the given prompts; and c) a Condition Mixture technique that better mixes the model's outputs under different combinations of conditions (see the sketch below).
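
The Condition Mixture component can be pictured as a generalization of classifier-free guidance: the denoiser is evaluated under several combinations of conditions (none, text only, retrieved motions only, both) and the outputs are blended. The sketch below is a minimal illustration of that blending only, not the repository's implementation; the denoiser interface and the fixed weights w_* are hypothetical placeholders (in the actual model the mixing is adaptive rather than fixed).

# Minimal sketch of condition mixing (illustrative only; the interface and
# weights are placeholders, not the repository's actual code).
def mix_conditions(denoiser, x_t, t, text_emb, retr_emb,
                   w_none=0.0, w_text=0.45, w_retr=0.1, w_both=0.45):
    eps_none = denoiser(x_t, t, text=None,     retrieval=None)      # unconditional branch
    eps_text = denoiser(x_t, t, text=text_emb, retrieval=None)      # text condition only
    eps_retr = denoiser(x_t, t, text=None,     retrieval=retr_emb)  # retrieved motions only
    eps_both = denoiser(x_t, t, text=text_emb, retrieval=retr_emb)  # both conditions
    return (w_none * eps_none + w_text * eps_text
            + w_retr * eps_retr + w_both * eps_both)

Standard classifier-free guidance combines only the unconditional and fully conditional branches; the point of the overview above is that outputs from all condition combinations are mixed, which is what the placeholder weights stand in for.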

Updates

[09/2023] Add a 🤗Hugging Face Demo!

[09/2023] Add a Colab Demo!

[09/2023] Release code for ReMoDiffuse and MotionDiffuse

Benchmark and Model Zoo

Supported methods

Citation

If you find our work useful for your research, please consider citing the paper:

@article{zhang2023remodiffuse,
  title={ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model},
  author={Zhang, Mingyuan and Guo, Xinying and Pan, Liang and Cai, Zhongang and Hong, Fangzhou and Li, Huirong and Yang, Lei and Liu, Ziwei},
  journal={arXiv preprint arXiv:2304.01116},
  year={2023}
}
@article{zhang2022motiondiffuse,
  title={MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model},
  author={Zhang, Mingyuan and Cai, Zhongang and Pan, Liang and Hong, Fangzhou and Guo, Xinying and Yang, Lei and Liu, Ziwei},
  journal={arXiv preprint arXiv:2208.15001},
  year={2022}
}

Installation

# Create Conda Environment
conda create -n mogen python=3.9 -y
conda activate mogen

# C++ Environment (these GCC paths are specific to the authors' cluster;
# point them at your local GCC installation instead)
export PATH=/mnt/lustre/share/gcc/gcc-8.5.0/bin:$PATH
export LD_LIBRARY_PATH=/mnt/lustre/share/gcc/gcc-8.5.0/lib:/mnt/lustre/share/gcc/gcc-8.5.0/lib64:/mnt/lustre/share/gcc/gmp-4.3.2/lib:/mnt/lustre/share/gcc/mpc-0.8.1/lib:/mnt/lustre/share/gcc/mpfr-2.4.2/lib:$LD_LIBRARY_PATH

# Install Pytorch
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -y

# Install MMCV
pip install "mmcv-full>=1.4.2,<=1.9.0" -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.12.1/index.html

# Install Pytorch3d
conda install -c bottler nvidiacub -y
conda install -c fvcore -c iopath -c conda-forge fvcore iopath -y
conda install pytorch3d -c pytorch3d -y

# Install other requirements
pip install -r requirements.txt
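
After installing, a quick sanity check run inside the activated mogen environment confirms that the main dependencies import and that CUDA is visible; the expected versions are the ones pinned above.

# Environment sanity check (run inside the activated mogen env).
import torch
import mmcv
import pytorch3d

print("torch:", torch.__version__)            # expect 1.12.1
print("cuda available:", torch.cuda.is_available())
print("mmcv-full:", mmcv.__version__)         # expect >=1.4.2,<=1.9.0
print("pytorch3d:", pytorch3d.__version__)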

Data Preparation

Download the data files from the Google Drive link or the Baidu Netdisk link (access code: vprc). Unzip all files and arrange them in the following file structure:

ReMoDiffuse
├── mogen
├── tools
├── configs
├── logs
│   ├── motiondiffuse
│   ├── remodiffuse
│   └── mdm
└── data
    ├── database
    ├── datasets
    ├── evaluators
    └── glove
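
To verify the layout programmatically, a small check like the one below (a convenience script, not part of the repository) reports any missing directories before you start training; run it from the repository root.

# Convenience check of the expected data layout (run from the repo root).
from pathlib import Path

expected = [
    "data/database", "data/datasets", "data/evaluators", "data/glove",
    "logs/motiondiffuse", "logs/remodiffuse", "logs/mdm",
]
missing = [p for p in expected if not Path(p).is_dir()]
print("All directories present." if not missing else f"Missing: {missing}")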

Training

Training with a single / multiple GPUs

PYTHONPATH=".":$PYTHONPATH python tools/train.py ${CONFIG_FILE} ${WORK_DIR} --no-validate

Note: The provided config files are designed for training with 8 GPUs. If you want to train on a single GPU, you can reduce the number of epochs to one-fourth of the original.

Training with Slurm

./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} ${GPU_NUM} --no-validate

Common optional arguments include --no-validate, which skips validation during training (as used in the commands above).

Example: using 8 GPUs to train ReMoDiffuse on a slurm cluster.

./tools/slurm_train.sh my_partition my_job configs/remodiffuse/remodiffuse_kit.py logs/remodiffuse_kit 8 --no-validate

Evaluation

Evaluate with a single GPU / multiple GPUs

PYTHONPATH=".":$PYTHONPATH python tools/test.py ${CONFIG} --work-dir=${WORK_DIR} ${CHECKPOINT}

Evaluate with slurm

./tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG} ${WORK_DIR} ${CHECKPOINT}

Example:

./tools/slurm_test.sh my_partition test_remodiffuse configs/remodiffuse/remodiffuse_kit.py logs/remodiffuse_kit logs/remodiffuse_kit/latest.pth

Note: Running the full evaluation on the HumanML3D dataset is very slow. For a quick evaluation, you can change replication_times in human_ml3d_bs128.py to 1.
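
The snippet below only illustrates that change, assuming the MMCV-style config layout this repository uses; apart from replication_times, the key names are placeholders, so edit the corresponding field in the actual file rather than copying this verbatim.

# Illustrative excerpt of configs/_base_/datasets/human_ml3d_bs128.py.
# Only replication_times is the field named above; the surrounding
# structure is a placeholder for the real config contents.
eval_cfg = dict(
    replication_times=1,  # the full evaluation repeats many times; 1 gives quick, rough numbers
    # ... other evaluation settings left as in the repository ...
)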

Visualization

PYTHONPATH=".":$PYTHONPATH python tools/visualize.py ${CONFIG} ${CHECKPOINT} \
    --text ${TEXT} \
    --motion_length ${MOTION_LENGTH} \
    --out ${OUTPUT_ANIMATION_PATH} \
    --device cpu

Example:

PYTHONPATH=".":$PYTHONPATH python tools/visualize.py \
    configs/remodiffuse/remodiffuse_t2m.py \
    logs/remodiffuse/remodiffuse_t2m/latest.pth \
    --text "a person is running quickly" \
    --motion_length 120 \
    --out "test.gif" \
    --device cpu

Acknowledgement

This study is supported by the Ministry of Education, Singapore, under its MOE AcRF Tier 2 (MOE-T2EP20221-0012), NTU NAP, and under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).

The visualization tool is built on top of the codebase of Generating Diverse and Natural 3D Human Motions from Text.