kiranchhatre / amuse

[CVPR 2024] AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
https://amuse.is.tue.mpg.de/

Implementation of audio chunking #8

Closed Yo-Hsin closed 4 months ago

Yo-Hsin commented 4 months ago

Thanks for your work and the code release.

I'd like to ask about the audio chunking mechanism in dm.py. When processing the cache, it seems that the 3 features extracted from the audio clip (Line 597) are not aligned with the motion chunk (Line 641). Is this implementation designed for a special purpose? Additionally, is the provided cache constructed using this implementation?

Looking forward to your reply, thanks.

kiranchhatre commented 4 months ago

Thanks for your question!

The audio chunking mechanism in dm.py is designed to handle audio and motion data in a synchronized manner, taking their respective sampling rates and frame rates into account.

The alignment of the audio features (c, e, s) with the motion chunks is therefore maintained: both the audio and the motion data are segmented into 10-second intervals, so the features computed from a given audio segment and the corresponding motion frames remain consistent with each other.
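The idea of duration-based alignment can be sketched as follows. This is a minimal illustration, not the actual dm.py code: the sampling rate, motion frame rate, and chunk length here are assumed placeholder values, and `chunk_aligned` is a hypothetical helper name.

```python
# Sketch of duration-based audio/motion chunking.
# sr (audio sampling rate), fps (motion frame rate), and chunk_sec
# are illustrative assumptions, not values taken from dm.py.

def chunk_aligned(audio, motion, sr=16000, fps=30, chunk_sec=10):
    """Yield (audio_chunk, motion_chunk) pairs covering the same
    chunk_sec-second interval of the recording."""
    a_len = sr * chunk_sec       # audio samples per chunk
    m_len = fps * chunk_sec      # motion frames per chunk
    n_chunks = min(len(audio) // a_len, len(motion) // m_len)
    for i in range(n_chunks):
        yield (audio[i * a_len:(i + 1) * a_len],
               motion[i * m_len:(i + 1) * m_len])
```

Because each pair spans the same wall-clock interval, any features computed from an audio chunk stay aligned with its motion chunk even though the two arrays have different lengths.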