Base model for zero shot speech generation

myshell-ai / OpenVoice

Instant voice cloning by MIT and MyShell.

https://research.myshell.ai/open-voice

MIT License


Base model for zero shot speech generation #263

Open: cjohn001 opened this issue 3 months ago

cjohn001 commented 3 months ago

Hello everyone, I am currently trying to use OpenVoice to generate German speech, and I have not been able to figure out how zero-shot speech synthesis is supposed to work. Is a multilingual base model missing? When I use one of the language-specific base models, the output sounds weird.

It would also help if someone could explain how the different emotions/speech styles can be controlled. The API documentation could use more examples.

Vicopem01 commented 2 months ago

In V1, text-to-speech synthesis is done by a base speaker TTS (OpenVoice's own English and Chinese base speakers, or OpenAI's TTS in the cross-lingual demo). In V2, it is done by MeloTTS. In my experience, V2 sounds noticeably better.
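On the emotion/style question: in V1, the English base speaker takes a `speaker` argument that selects a style, and the result is then passed through the tone color converter. The sketch below is adapted from the repository's V1 style-control demo. It assumes the V1 checkpoints are unpacked under `checkpoints/`. Treat the style list and the embedding file names as assumptions to check against your checkout. `EN_STYLES`, `v1_source_se_path`, and `styled_clone` are hypothetical helper names, not part of the OpenVoice API.

```python
import os

# Assumed style list for the V1 English base speaker (per the V1 demo);
# the Chinese base speaker is assumed to support only "default".
EN_STYLES = {"default", "friendly", "cheerful", "excited", "sad", "angry",
             "terrified", "shouting", "whispering"}


def v1_source_se_path(language, style, ckpt_root="checkpoints"):
    """Pick the base-speaker tone color embedding that matches the style (hypothetical helper)."""
    if language == "EN":
        if style not in EN_STYLES:
            raise ValueError(f"unknown English style: {style!r}")
        # Non-default styles use a separate embedding file (assumed name).
        name = "en_default_se.pth" if style == "default" else "en_style_se.pth"
    elif language == "ZH":
        if style != "default":
            raise ValueError("the Chinese base speaker only supports 'default'")
        name = "zh_default_se.pth"
    else:
        raise ValueError(f"no V1 base speaker for {language!r}")
    return os.path.join(ckpt_root, "base_speakers", language, name)


def styled_clone(text, style, reference_wav, out_path, ckpt_root="checkpoints", device="cpu"):
    """Speak `text` in an English style, then convert it to the reference voice (hypothetical helper)."""
    # Validate before importing the heavy dependencies.
    src_se_path = v1_source_se_path("EN", style, ckpt_root)

    import torch
    from openvoice import se_extractor
    from openvoice.api import BaseSpeakerTTS, ToneColorConverter

    base_dir = os.path.join(ckpt_root, "base_speakers", "EN")
    tts = BaseSpeakerTTS(os.path.join(base_dir, "config.json"), device=device)
    tts.load_ckpt(os.path.join(base_dir, "checkpoint.pth"))

    conv_dir = os.path.join(ckpt_root, "converter")
    converter = ToneColorConverter(os.path.join(conv_dir, "config.json"), device=device)
    converter.load_ckpt(os.path.join(conv_dir, "checkpoint.pth"))

    # Tone color of the base speaker (source) and of the voice to clone (target).
    source_se = torch.load(src_se_path, map_location=device)
    target_se, _ = se_extractor.get_se(reference_wav, converter)

    # Step 1: styled base speech. Step 2: tone color conversion.
    base_wav = out_path + ".base.wav"
    tts.tts(text, base_wav, speaker=style, language="English", speed=1.0)
    converter.convert(audio_src_path=base_wav, src_se=source_se,
                      tgt_se=target_se, output_path=out_path)
    return out_path
```

The style lives entirely in the base TTS step. The converter only swaps the tone color, which is why a styled source needs the matching source embedding.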

On the first run, the models are loaded onto your system automatically, and OpenVoice then performs tone color conversion on the synthesized audio. Here is the demo setup:
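The V2 flow described above (MeloTTS synthesis, then tone color conversion) can be sketched as follows. This is adapted from the repository's V2 demo notebook and assumes the `checkpoints_v2/` layout from the README. `MELO_LANGUAGES`, `v2_source_se_path`, and `clone_v2` are hypothetical helper names. The language codes are those the V2 demo iterates over. German is not among them, which may explain why the language-specific base models sound off for German text.

```python
import os

# MeloTTS language codes used in the OpenVoice V2 demo (assumed list; no German).
MELO_LANGUAGES = ("EN", "EN_NEWEST", "ES", "FR", "ZH", "JP", "KR")


def v2_source_se_path(speaker_key, ckpt_root="checkpoints_v2"):
    """Map a MeloTTS speaker key such as 'EN-US' or 'EN_INDIA' to its embedding file (hypothetical helper)."""
    key = speaker_key.lower().replace("_", "-")
    return os.path.join(ckpt_root, "base_speakers", "ses", f"{key}.pth")


def clone_v2(text, language, reference_wav, out_path,
             ckpt_root="checkpoints_v2", device="cpu", speed=1.0):
    """Synthesize with MeloTTS, then convert to the reference voice (hypothetical helper)."""
    if language not in MELO_LANGUAGES:
        raise ValueError(f"no MeloTTS model for {language!r}; supported: {MELO_LANGUAGES}")

    # Imported lazily so the helpers above work without these packages installed.
    import torch
    from melo.api import TTS
    from openvoice import se_extractor
    from openvoice.api import ToneColorConverter

    conv_dir = os.path.join(ckpt_root, "converter")
    converter = ToneColorConverter(os.path.join(conv_dir, "config.json"), device=device)
    converter.load_ckpt(os.path.join(conv_dir, "checkpoint.pth"))
    target_se, _ = se_extractor.get_se(reference_wav, converter)

    # Step 1: MeloTTS speaks the text with its first stock speaker for this language.
    model = TTS(language=language, device=device)
    speaker_ids = model.hps.data.spk2id
    speaker_key = list(speaker_ids.keys())[0]
    base_wav = out_path + ".base.wav"
    model.tts_to_file(text, speaker_ids[speaker_key], base_wav, speed=speed)

    # Step 2: replace that stock speaker's tone color with the reference speaker's.
    source_se = torch.load(v2_source_se_path(speaker_key, ckpt_root), map_location=device)
    converter.convert(audio_src_path=base_wav, src_se=source_se,
                      tgt_se=target_se, output_path=out_path)
    return out_path
```

Only the first step depends on the language. The converter works on the audio, so supporting a new language would mean swapping in a base TTS for it.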