Kiran Chhatre
·
Radek Daněček
·
Nikos Athanasiou
·
Giorgio Becherini
·
Christopher Peters
·
Michael J. Black
·
Timo Bolkart
This is a repository for AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion. AMUSE generates realistic emotional 3D body gestures directly from a speech sequence (top). It provides user control over the generated emotion by combining the driving speech with a different emotional audio (bottom).
git clone https://github.com/kiranchhatre/amuse.git
cd amuse/dm/utils/
git clone https://github.com/kiranchhatre/sk2torch.git
git clone -b init https://github.com/kiranchhatre/PyMO.git
cd ../..
git submodule update --remote --merge --init --recursive
git submodule sync
git submodule add https://github.com/kiranchhatre/sk2torch.git dm/utils/sk2torch
git submodule add -b init https://github.com/kiranchhatre/PyMO.git dm/utils/PyMO
git submodule update --init --recursive
git add .gitmodules dm/utils/sk2torch dm/utils/PyMO
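After the clone and submodule steps above, a quick sanity check can confirm that both helper repos landed where `dm/utils` expects them. This is not part of the upstream instructions; `check_amuse_deps` is a hypothetical helper:

```shell
# Hypothetical helper: verify the two helper repos exist before running anything.
check_amuse_deps() {
  local status=0
  for d in dm/utils/sk2torch dm/utils/PyMO; do
    if [ -d "$d" ]; then
      echo "ok: $d"
    else
      echo "missing: $d"
      status=1
    fi
  done
  return $status
}

check_amuse_deps && echo "submodules in place"
```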
conda create -n amuse python=3.8
conda activate amuse
export CUDA_HOME=/is/software/nvidia/cuda-11.3
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
conda env update --file amuse.yml --prune
module load cuda/11.3
conda install anaconda::gxx_linux-64 # installs gxx 11.2.0
FORCE_CUDA=1 pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1110/download.html
conda deactivate
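The pytorch3d wheel installed above targets CUDA 11.3, so CUDA_HOME should point at a matching toolkit. A small check (hypothetical helper, not part of the repo):

```shell
# Hypothetical helper: warn if CUDA_HOME does not look like a CUDA 11.3 toolkit.
check_cuda_home() {
  case "$1" in
    *11.3*) echo "ok: $1" ;;
    *)      echo "warning: $1 does not mention 11.3" ;;
  esac
}

check_cuda_home "$CUDA_HOME"
```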
conda env create -f blender.yaml
AMUSEPATH=$(pwd)
cd ~
wget https://download.blender.org/release/Blender3.4/blender-3.4.1-linux-x64.tar.xz
tar -xvf ./blender-3.4.1-linux-x64.tar.xz
cd ~/blender-3.4.1-linux-x64/3.4
mv python/ _python/
ln -s ~/anaconda3/envs/blender ./python # adjust the path to your conda envs directory
cd "$AMUSEPATH"
cd scripts
conda activate amuse
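The symlink trick above only works if Blender's bundled Python actually resolves to the conda env. A quick check, assuming the paths used above (`link_target` is a hypothetical helper):

```shell
# Hypothetical helper: print a symlink's target, or flag a non-symlink.
link_target() {
  if [ -L "$1" ]; then
    readlink "$1"
  else
    echo "not a symlink: $1"
  fi
}

# Expect this to print the path of the 'blender' conda env:
link_target ~/blender-3.4.1-linux-x64/3.4/python
```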
Follow the download instructions at https://amuse.is.tue.mpg.de/download.php.
Once the setup above is complete, you can run the following:
[x] train_audio (training step 1/2)
Trains the AMUSE speech disentanglement model.
cd $AMUSEPATH/scripts
python main.py --fn train_audio
[x] train_gesture (training step 2/2)
Trains the AMUSE gesture generation model.
cd $AMUSEPATH/scripts
python main.py --fn train_gesture
[x] infer_gesture
Infer AMUSE on a single 10 s WAV monologue audio sequence.
Place the audio in $AMUSEPATH/viz_dump/test/speech.
The video of the generated gesture will be written to $AMUSEPATH/viz_dump/test/gesture.
cd $AMUSEPATH/scripts
python main.py --fn infer_gesture
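Since infer_gesture expects a single 10 s WAV, it can help to check a clip's duration before dropping it in. A sketch using Python's stdlib wave module from the shell (`wav_seconds` is a hypothetical helper; AMUSE does not ship it):

```shell
# Hypothetical helper: report a WAV file's duration in seconds by reading
# its header with Python's stdlib `wave` module (no audio libraries needed).
wav_seconds() {
  python3 - "$1" <<'EOF'
import sys, wave
with wave.open(sys.argv[1]) as w:
    print(round(w.getnframes() / w.getframerate(), 2))
EOF
}

# Usage (file name is a placeholder):
# wav_seconds "$AMUSEPATH/viz_dump/test/speech/my_clip.wav"
```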
[ ] edit_gesture
COMING SOON
cd $AMUSEPATH/scripts
python main.py --fn edit_gesture
[x] bvh2smplx_
Convert BVH to SMPLX (works only with the BMAP presets provided on the AMUSE website download page).
Highly experimental; no support provided.
Place the BVH file inside $AMUSEPATH/data/beat-rawdata-eng/beat_rawdata_english/<actor_id>, where actor_id is between 1 and 30. The converted file will be written to $AMUSEPATH/viz_dump/smplx_conversions.
cd $AMUSEPATH/scripts
python main.py --fn bvh2smplx_
Once converted, import the file into Blender using the SMPLX Blender add-on. Remember to set the target FPS (24 FPS for the current file) in the import-animation window when importing the NPZ file.
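The actor_id constraint above can be enforced with a small placement helper (`place_bvh` is hypothetical and not part of the repo; the BVH file name is a placeholder):

```shell
# Hypothetical helper: copy a BVH file into the expected per-actor folder,
# rejecting actor ids outside the documented 1-30 range.
place_bvh() {  # usage: place_bvh <actor_id> <file.bvh>
  if [ "$1" -lt 1 ] || [ "$1" -gt 30 ]; then
    echo "actor_id must be between 1 and 30, got: $1" >&2
    return 1
  fi
  mkdir -p "$AMUSEPATH/data/beat-rawdata-eng/beat_rawdata_english/$1"
  cp "$2" "$AMUSEPATH/data/beat-rawdata-eng/beat_rawdata_english/$1/"
}

# Usage (file name is a placeholder):
# place_bvh 5 motion.bvh
```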
[ ] prepare_data
Train AMUSE on BEAT 0.2.1, BEAT-X, or a custom dataset.
COMING SOON: Conversion script, dataloader LMDB file creation.
cd $AMUSEPATH/scripts
python main.py --fn prepare_data
[ ] other
COMING SOON
@InProceedings{Chhatre_2024_CVPR,
author = {Chhatre, Kiran and Daněček, Radek and Athanasiou, Nikos and Becherini, Giorgio and Peters, Christopher and Black, Michael J. and Bolkart, Timo},
title = {{AMUSE}: Emotional Speech-driven {3D} Body Animation via Disentangled Latent Diffusion},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {1942-1953},
url = {https://amuse.is.tue.mpg.de},
}
For any inquiries, please contact amuse@tue.mpg.de. Feel free to use this project and to contribute to its improvement.