jaywalnut310 / glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search
MIT License
651 stars 150 forks

How about the intelligibility and stability #8

chazo1994 closed 4 years ago

chazo1994 commented 4 years ago

I read your paper and found that your model significantly improves inference speed, performance on long sentences, and character error rate. But I would like to know about your experiments on other aspects of intelligibility and stability: misalignment (even though you don't use attention), punctuation misalignment, skipped words, degraded sound at the end of long sentences, stability over long paragraphs (such as the voice differing between two or more sentences of a paragraph), etc. Thanks for your hard work!

jaywalnut310 commented 4 years ago

If you mean an attention error analysis like the one in the Robustness section of FastSpeech, I listed the results in Appendix B. I hope this answer is helpful.