poor alignment when synthesizing long sentences

Thank you for your work! It helps a lot.
I want to ask whether your alignment is good when synthesizing sentences longer than 10 words, for example about 20 words. The paper says 'the model fails when conditioned on the shorter source phrases, successfully aligns when conditioned on the longest input.' The reference audio clips I used are about 20 words long, but the model only works well when synthesizing shorter sentences. Please find some samples attached (samples.zip). By the way, I used the Nancy and Blizzard 2017 datasets for training. Could you give me some suggestions? Thank you.

syang1993 / gst-tacotron

poor alignment when synthesizing long sentences #19