Mumbling in synthesis

Hey, thanks for the implementation @syang1993!

I'm using this code to implement another paper and I've run into some issues during synthesis. I'm getting good alignment during training, and the interim synthesised results sound good. During evaluation, however, the synthesis is very unpredictable and sometimes fails to produce intelligible speech; it sounds more like mumbling. This happens not only on long utterances but sometimes on short and mid-length texts too. I'm attaching a few alignment plots and audio examples.

I was wondering if you've come across this before and if you have any tips on where I should look to fix it. I trained the model with multi-head attention; do you reckon GMM attention would improve things much?

Attachments: eval-320000_ref-frankenstein_chp_13-4-align, mumbling_samples.zip

syang1993 / gst-tacotron

Mumbling in synthesis #45