theEricMa / DiffSpeaker

This is the official repository for DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer

Training instructions #2

Open · leohku opened this issue 8 months ago

leohku commented 8 months ago

Hi Eric,

Great work! It is very impressive that DiffSpeaker achieves lower LVE and FDD while also having faster inference. It's also surprising to me that a VAE isn't needed to construct the latents for the diffusion model beforehand.

I wonder if there will be instructions for training DiffSpeaker on datasets other than VOCASET and BIWI. I'm trying to train it on a dataset I collected that is similar to VOCASET. Would it be possible to provide instructions, or some general guidance to point me in the right direction?

Thanks a lot in advance!

Leo

theEricMa commented 8 months ago

Thanks for your interest in our code! Training on a custom dataset involves three steps:

1. Set up a dataset configuration within the `assets` directory to define the data paths.
2. Establish the data loader for your dataset in `get_data.py` (see the sketch below).
3. Point the `dataset` parameter in your experiment config at your new dataset.

We will also add these instructions to the README.
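For reference, here is a minimal sketch of what step 2 might look like for a VOCASET-style dataset. The directory layout, dictionary keys, tensor shapes, and the `CustomVocaDataset` / `get_datasets` names are illustrative assumptions based on common VOCASET preprocessing, not the repository's actual interface:

```python
# Minimal sketch of a VOCASET-style loader (step 2). Layout, keys, and
# shapes are assumptions, not the repository's actual API.
import os
import pickle

import numpy as np
import librosa
import torch
from torch.utils.data import Dataset


class CustomVocaDataset(Dataset):
    """Yields paired (audio, vertex animation, neutral template) samples.

    Assumed layout:
        root/wav/<subject>_<sentence>.wav
        root/vertices_npy/<subject>_<sentence>.npy  # (T, V*3) float32
        root/templates.pkl                          # {subject: (V, 3)}
    """

    def __init__(self, root, sample_rate=16000):
        self.root = root
        self.sample_rate = sample_rate
        with open(os.path.join(root, "templates.pkl"), "rb") as f:
            self.templates = pickle.load(f)
        self.clips = sorted(
            f[: -len(".wav")]
            for f in os.listdir(os.path.join(root, "wav"))
            if f.endswith(".wav")
        )

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        name = self.clips[idx]
        subject = name.rsplit("_", 1)[0]  # drop the trailing sentence id
        # Raw waveform; speech-driven animation models typically feed
        # this to a pretrained speech encoder such as wav2vec 2.0.
        audio, _ = librosa.load(
            os.path.join(self.root, "wav", name + ".wav"),
            sr=self.sample_rate,
        )
        # Per-frame vertex positions, flattened to (T, V*3).
        vertices = np.load(
            os.path.join(self.root, "vertices_npy", name + ".npy")
        ).astype(np.float32)
        template = self.templates[subject].astype(np.float32).reshape(-1)
        return {
            "audio": torch.from_numpy(audio),
            "vertice": torch.from_numpy(vertices),
            "template": torch.from_numpy(template),
            "name": name,
        }


# Hypothetical dispatch inside get_data.py: map the dataset name chosen
# in the experiment config (step 3) to the loader above. The config
# attribute names here are placeholders.
def get_datasets(cfg):
    if cfg.DATASET.name == "my_vocaset":  # the name you add in step 3
        return CustomVocaDataset(cfg.DATASET.root)
    raise ValueError(f"Unknown dataset: {cfg.DATASET.name}")
```

The train/val/test splits and any collate function would follow whatever the existing VOCASET loader in `get_data.py` already does.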