jasonppy / VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Could you please explain the different models, and the best one for TTS finetuning? When will the enhanced models be uploaded? #114

Open clearpathai opened 5 months ago

clearpathai commented 5 months ago

I know you've mentioned several different models: the 830, the 330, the 830 TTS, the 330 TTS, the 830 TTS enhanced, and the 330 TTS enhanced. Could you please explain in a bit more detail how these models differ?

I know you originally designed this primarily for voice cloning from one-shot samples.

1. Which model is best for finetuning on a larger dataset (not just a few one-shot samples)?
2. Do you have any specific parameter or finetuning recommendations, either from user feedback or from your own experience working with these models?
3. Do you have an ETA for when the enhanced 330 TTS model will be uploaded?