jaywalnut310 / glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search
MIT License
651 stars 150 forks

Improving prosody? Specifically the length of pauses between words? #26

Open seantempesta opened 4 years ago

seantempesta commented 4 years ago

First off, this project is amazing! I'm getting great results compared to Tacotron2, with much shorter training times, and it's unbelievably stable even for long sentences. Congratulations. :)

The only thing I've found that Tacotron2 did better was capturing the way people actually speak, specifically how fast words are spoken and how long speakers tend to pause between words. Is this something that can be adjusted in the loss function to fine-tune the model so it pays more attention to these aspects?