kiranchhatre / amuse

[CVPR 2024] AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
https://amuse.is.tue.mpg.de/

Train/test splits #1

Closed · prinshul closed this 4 months ago

prinshul commented 11 months ago

Which speakers was the model trained on? Which speakers are used for testing? Are the 22 speakers from the BEAT dataset used for both training and testing?

prinshul commented 5 months ago

Hi,

Can you please let me know the train and test speakers?

kiranchhatre commented 5 months ago

Thank you for your interest in our work.

AMUSE is trained on specific takes from designated actors. The actor IDs can be found in the data module:

`["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "12", "13", "16", "18", "21", "26", "27", "30"]`

Please check the code for the specific takes used for each actor during training. The total duration of the training data is ~5 hours, as indicated in the shared data MDB file.
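For reference, here is a minimal sketch of how one might gather candidate training takes from the actor IDs above. It assumes BEAT-style filenames that begin with the numeric actor ID (e.g. `2_scott_0_1_1.bvh`); the dataset path is a placeholder, and the actual take-level filters live in the repo's data module:

```python
# Sketch only: select BEAT takes whose filename starts with a train-actor ID.
# The directory path is a placeholder; per-take filtering (which takes of each
# actor are used) is defined in the AMUSE data module, not reproduced here.
from pathlib import Path

TRAIN_ACTORS = {"1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
                "12", "13", "16", "18", "21", "26", "27", "30"}

beat_root = Path("datasets/BEAT")  # placeholder path

train_takes = [
    f for f in beat_root.rglob("*.bvh")
    if f.stem.split("_")[0] in TRAIN_ACTORS  # actor ID is the first token
]
print(f"{len(train_takes)} candidate training takes")
```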

Regarding test time: the perceptual study was conducted exclusively on takes from the test actors. The editing application uses takes that may come from the test actors as well as other sources, including held-out takes from the train actors.

prinshul commented 5 months ago

Thank you.

prinshul commented 5 months ago

Is it possible to run with multiple GPUs?

kiranchhatre commented 4 months ago

Yes, multi-GPU support is implemented for the audio model; however, it was not used for the final version. For more details, please refer to the updated README.
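For anyone exploring this, a minimal sketch of the standard PyTorch pattern for data-parallel training follows. This is a generic illustration, not the AMUSE implementation: `AudioModel` and its dimensions are hypothetical placeholders, and the repo's README describes the actual usage.

```python
# Sketch only: wrap a model with torch.nn.DataParallel when several GPUs
# are visible. AudioModel is a hypothetical stand-in, not the AMUSE module.
import torch
import torch.nn as nn

class AudioModel(nn.Module):  # hypothetical placeholder model
    def __init__(self, in_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, in_dim),
        )

    def forward(self, x):
        return self.net(x)

model = AudioModel()
if torch.cuda.device_count() > 1:
    # Replicates the module across visible GPUs and splits each input batch.
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```

For multi-node or higher-throughput setups, `torch.nn.parallel.DistributedDataParallel` is the usual replacement for `DataParallel`, at the cost of a process-group launch step.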