Closed yiwei0730 closed 11 months ago
yes, at the moment it requires three models across three repositories. so unless you are an exceptional engineer or scientist (like Lucas), you will have trouble getting it all working in concert. this isn't something that works just by running a script just yet
give me more time to think about how to weave this all together
@yiwei0730 on the other hand, if you want to test out unconditional training, then you should be able to get working quite easily with just the base model in this repository alone
to answer your original question, you need a trained text-to-semantic model from spear-tts, which requires yet another 3 step training process
If I want to train this package of models, do I need to run spear-tts first to obtain the text-to-semantic model before running voicebox, or can I directly run the voicebox semantic model and train the main model together?