padmalcom / jotts

German Text-To-Speech Engine using Tacotron and Griffin-Lim
MIT License

jotts

JoTTS is a German text-to-speech engine that uses Tacotron with either Griffin-Lim or WaveRNN as the vocoder. The synthesizer model was trained on my voice using Tacotron 1. Griffin-Lim makes audio generation much faster, whereas the trained WaveRNN vocoder gives better results in most cases.

API

Example usage

from jotts import JoTTS

if __name__ == "__main__":
    tts = JoTTS()
    # List the available voice models.
    tts.list_models()
    # Load the model without forcing a fresh download.
    tts.load_models(force_model_download=False, model_name="jonas_v0.1")
    # The sentence is German for "This is a test with my voice."
    # Speak it with the WaveRNN vocoder first, then with Griffin-Lim.
    tts.speak("Das ist ein Test mit meiner Stimme.", wait_for_end=True, use_wavernn_vocoder=True)
    tts.speak("Das ist ein Test mit meiner Stimme.", wait_for_end=True, use_wavernn_vocoder=False)
    # Write the same sentence to WAV files, once per vocoder.
    tts.textToWav(text="Das ist ein Test mit meiner Stimme.", out_path="vocoder_out.wav", use_wavernn_vocoder=True)
    tts.textToWav(text="Das ist ein Test mit meiner Stimme.", out_path="griffin_lim_out.wav", use_wavernn_vocoder=False)
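
The speed difference between the two vocoders can be checked on your own machine. The following is a minimal sketch, not part of JoTTS: `compare_vocoders` is a hypothetical helper that only relies on the `textToWav` call shown above and assumes the `tts` object already has its models loaded. The output file names are arbitrary.

```python
import time
from pathlib import Path


def compare_vocoders(tts, text, out_dir="."):
    """Synthesize `text` once per vocoder and return the elapsed seconds for each.

    `tts` is assumed to be a JoTTS instance with its models already loaded.
    Only the `textToWav` call from the example usage is used here.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    timings = {}
    for name, use_wavernn in (("griffin_lim", False), ("wavernn", True)):
        out_path = out_dir / f"{name}_out.wav"
        start = time.perf_counter()
        tts.textToWav(text=text, out_path=str(out_path), use_wavernn_vocoder=use_wavernn)
        timings[name] = time.perf_counter() - start
    return timings
```

Calling `compare_vocoders(tts, "Das ist ein Test mit meiner Stimme.")` after `tts.load_models(...)` returns a dictionary with one timing per vocoder. A single run is only a rough indication, since the first call may include one-time setup costs.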

Todo

Training a model for your own voice

Training a synthesizer model is easy if you know how to do it. I created a course on Udemy that shows how it is done. Don't buy the course at full price; there is a discount every month :-)

https://www.udemy.com/course/voice-cloning/

If you have neither the background nor the resources, or if you are just lazy or too rich, contact me for contract work. Cloning a voice normally requires about 15 minutes of clean audio of the voice you want to clone.
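
To see whether a set of recordings reaches roughly 15 minutes, the durations can be added up with Python's standard `wave` module. This is a minimal sketch under stated assumptions: `total_wav_minutes` is a hypothetical helper, the recordings are assumed to be uncompressed PCM `.wav` files in a single folder, and the check says nothing about how clean the audio is.

```python
import wave
from pathlib import Path


def total_wav_minutes(folder):
    """Return the summed duration, in minutes, of all .wav files in `folder`."""
    total_seconds = 0.0
    for wav_path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            # Duration of one file = number of frames / frames per second.
            total_seconds += wav_file.getnframes() / wav_file.getframerate()
    return total_seconds / 60.0
```

For example, `total_wav_minutes("recordings")` (the folder name is a placeholder) returns the total length in minutes, which can then be compared against the 15-minute guideline.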

Disclaimer

I hope that my (and any other person's) voice will be used only for legal and ethical purposes. Please do not get into mischief with it.