jaywalnut310 / glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search
MIT License
651 stars 150 forks

Output compared to Fastspeech2 #60

Open debasish-mihup opened 2 years ago

debasish-mihup commented 2 years ago

I have a question about the quality of FastSpeech 2 output compared to Glow-TTS. I currently use mel-spectrograms generated by Glow-TTS with a HiFi-GAN vocoder, and the quality is good, but there is room for improvement in prosody. Tacotron 2 does better in this regard, but it has a high inference time and performs poorly as the input sentence gets longer. FastSpeech 2 is faster at inference than Glow-TTS, but that matters little, because the text-to-mel model accounts for only a small share of the total synthesis time compared to the vocoder. What I really want to know is whether FastSpeech 2 would improve quality in terms of the intonation, pauses, and stress of the output sentences. Has anyone here trained both Glow-TTS and FastSpeech 2 and compared them?
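The latency point can be checked by timing the two stages separately. The following is a minimal sketch, not code from either repository: `time_stage` and `acoustic_share` are hypothetical helpers, and the two lambdas are placeholders standing in for the real text-to-mel and vocoder inference calls.

```python
import time


def time_stage(fn, *args, repeats=5):
    """Return the average wall-clock seconds of fn(*args) over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats


def acoustic_share(acoustic_s, vocoder_s):
    """Fraction of total synthesis time spent in the text-to-mel model."""
    total = acoustic_s + vocoder_s
    if total <= 0:
        raise ValueError("total time must be positive")
    return acoustic_s / total


if __name__ == "__main__":
    # Placeholders (assumptions): swap in the real model calls here.
    text_to_mel = lambda text: [0.0] * 80      # stands in for Glow-TTS / FastSpeech 2
    vocoder = lambda mel: [0.0] * 22050        # stands in for HiFi-GAN

    mel = text_to_mel("a test sentence")
    t_acoustic = time_stage(text_to_mel, "a test sentence")
    t_vocoder = time_stage(vocoder, mel)
    print(f"text-to-mel share of total time: {acoustic_share(t_acoustic, t_vocoder):.1%}")
```

If the models run on a GPU with PyTorch, call `torch.cuda.synchronize()` before reading the clock; otherwise asynchronous kernel launches make the measured times misleading.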