syang1993 / gst-tacotron

A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"
368 stars 110 forks

poor alignment when synthesizing long sentences #19

Open moonnee opened 6 years ago

moonnee commented 6 years ago

Thank you for your work! It helps a lot.
I want to ask whether your alignment is good when synthesizing sentences longer than 10 words, for example about 20 words. The paper said 'the model fails when conditioned on the shorter source phrases, successfully aligns when conditioned on the longest input.' The reference audio clips I used are about 20 words long, but the model only works well when synthesizing shorter sentences. Some samples are attached. Btw, I used the Nancy and Blizzard 2017 datasets for training. Could you give me some suggestions? Thank you. samples.zip

syang1993 commented 5 years ago

Hi, for long sentences you can try GMM attention. It works well, especially when the input text is long.
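
For readers unfamiliar with the term, GMM attention usually refers to the location-based mixture-of-Gaussians attention from Graves' handwriting synthesis work: each mixture mean can only move forward along the input, which tends to keep the alignment monotonic on long sentences. Below is a minimal NumPy sketch of one decoder step under that assumption. It is not this repository's code; the function name `gmm_attention_step` and the `params` layout are hypothetical, and in a real model `params` would be produced by a learned projection of the decoder state.

```python
import numpy as np


def gmm_attention_step(params, prev_kappa, num_positions):
    """One decoder step of Graves-style GMM attention (illustrative only).

    params:        array of shape (3, K) with raw, unconstrained values for
                   the K mixture weights, widths and mean increments.
                   (Hypothetical layout, assumed for this sketch.)
    prev_kappa:    array of shape (K,), mixture means from the previous step.
    num_positions: number of encoder time steps (input length).

    Returns the attention weights over encoder positions, shape
    (num_positions,), and the updated means, shape (K,).
    """
    alpha_hat, beta_hat, kappa_hat = params

    # exp() keeps weights and widths positive.
    alpha = np.exp(alpha_hat)
    beta = np.exp(beta_hat)

    # The increment exp(kappa_hat) is always positive, so each mean can
    # only move forward along the input sequence.
    kappa = prev_kappa + np.exp(kappa_hat)

    # Evaluate every Gaussian at every encoder position u = 0..N-1.
    u = np.arange(num_positions)[None, :]            # shape (1, N)
    diff = kappa[:, None] - u                        # shape (K, N)
    phi = np.sum(alpha[:, None] * np.exp(-beta[:, None] * diff ** 2), axis=0)

    # In this formulation the weights are not normalized to sum to 1.
    return phi, kappa
```

Calling the function once per decoder step and feeding the returned means back in as `prev_kappa` moves the attention peak steadily toward the end of the input. The context vector for a step would then be the `phi`-weighted sum of the encoder outputs.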