lucidrains / voicebox-pytorch

Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
MIT License

VoiceBox Training #33

Closed by yiwei0730 11 months ago

yiwei0730 commented 11 months ago

If I want to train this package of models, do I need to run spear-tts first to obtain the text-to-semantic model before running voicebox, or can I directly run the voicebox semantic model and train the main model together?

lucidrains commented 11 months ago

yes, at the moment it requires three models across three repositories. so unless you are an exceptional engineer or scientist (like Lucas), you will have trouble getting it all working in concert. this isn't something that works by just running a script yet

give me more time to think about how to weave this all together

lucidrains commented 11 months ago

@yiwei0730 on the other hand, if you want to test out unconditional training, then you should be able to get it working quite easily with just the base model in this repository alone

lucidrains commented 11 months ago

to answer your original question, you need a trained text-to-semantic model from spear-tts, which itself requires a three-step training process
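
The ordering described in this thread can be sketched as a small dependency graph. This is only an illustration of the prerequisites as stated above, not code from any of the repositories. Every stage name below (`spear_tts_step_1`, `voicebox_text_conditioned`, and so on) is a hypothetical placeholder, and the third model mentioned in the thread is left out because the thread does not name it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Stage names are placeholders invented for this sketch; they are NOT
# identifiers from voicebox-pytorch or spear-tts.
# Each key maps to the set of stages that must finish before it can start.
PREREQUISITES = {
    "spear_tts_step_1": set(),
    "spear_tts_step_2": {"spear_tts_step_1"},
    "spear_tts_step_3": {"spear_tts_step_2"},
    "text_to_semantic_trained": {"spear_tts_step_3"},
    # Text-conditioned Voicebox needs the trained text-to-semantic model.
    "voicebox_text_conditioned": {"text_to_semantic_trained"},
    # Unconditional training uses only the base model, so no prerequisites.
    "voicebox_unconditional": set(),
}


def stages_needed(target, prerequisites=PREREQUISITES):
    """Return the stages to run, in dependency order, ending with `target`."""
    needed = {}
    stack = [target]
    while stack:
        stage = stack.pop()
        if stage not in needed:
            needed[stage] = prerequisites[stage]
            stack.extend(prerequisites[stage])
    # static_order() yields every stage after all of its prerequisites.
    return list(TopologicalSorter(needed).static_order())


if __name__ == "__main__":
    print(stages_needed("voicebox_unconditional"))
    print(stages_needed("voicebox_text_conditioned"))
```

Under these assumptions, asking for the unconditional target returns a single stage, while the text-conditioned target pulls in the whole spear-tts chain first, which is why the latter cannot yet be done by running one script.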