soobinseo / Transformer-TTS

A PyTorch implementation of "Neural Speech Synthesis with Transformer Network"
MIT License
661 stars 141 forks

too powerful decoder #18

Open saddlekiller opened 5 years ago

saddlekiller commented 5 years ago

When I tried to train my own Transformer, I found that the decoder is too powerful: it can generate the spectrogram using almost no context information from the encoder. Do you have any suggestions for overcoming this issue? Thanks in advance.

soobinseo commented 5 years ago

Hi there,

I had a similar situation; in that case it's likely that the model didn't learn properly. Have you checked the alignment matrix of the encoder? If it doesn't show a clear diagonal, the training has gone wrong. Depending on the hyperparameters, the model is not very robust.
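One quick way to check this is to measure how much of an attention matrix's mass falls inside a band around the main diagonal. A minimal sketch (the helper name and band width are illustrative; `attn` is assumed to be a single head's decoder-step-by-encoder-step weight matrix, converted to a NumPy array):

```python
import numpy as np

def diagonal_attention_ratio(attn, width=0.2):
    """Fraction of attention mass within a band around the main diagonal.

    attn:  (T_dec, T_enc) attention weights, each row summing to 1.
    width: half-width of the band, as a fraction of the normalized axis.
    """
    t_dec, t_enc = attn.shape
    # Normalize both time axes to [0, 1] so the diagonal is rows == cols.
    rows = np.arange(t_dec)[:, None] / max(t_dec - 1, 1)
    cols = np.arange(t_enc)[None, :] / max(t_enc - 1, 1)
    band = np.abs(rows - cols) <= width
    return float((attn * band).sum() / attn.sum())
```

A ratio near 1.0 suggests a diagonal (monotonic) alignment; for a uniform attention matrix the ratio stays close to the band's relative area, which indicates the decoder is ignoring the encoder context.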

Thanks.

saddlekiller commented 5 years ago

Thanks for your reply. I have actually checked all of the attention matrices, and none of them shows a diagonal highlight. I have also tried to force the model to learn the context alignment by masking the self-attentions, but it did not work.

sanghuynh1501 commented 3 years ago

hello @saddlekiller Did you solve the problem?

saddlekiller commented 3 years ago

Guided attention sometimes helps.
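For reference, guided attention (Tachibana et al., 2017, "Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention") adds a penalty that grows with distance from the diagonal. A minimal sketch in NumPy (function names are illustrative; in practice you would build the weight matrix as a tensor and add the scaled loss term to the training objective):

```python
import numpy as np

def guided_attention_weights(t_dec, t_enc, g=0.2):
    """Penalty matrix W with W[n, t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2)):
    near zero on the diagonal, approaching 1 far from it."""
    n = np.arange(t_dec)[:, None] / t_dec
    t = np.arange(t_enc)[None, :] / t_enc
    return 1.0 - np.exp(-((n - t) ** 2) / (2.0 * g * g))

def guided_attention_loss(attn, g=0.2):
    """Mean of the attention weights multiplied element-wise by the penalty.
    Adding this term to the loss pushes attention toward the diagonal."""
    w = guided_attention_weights(*attn.shape, g=g)
    return float((attn * w).mean())
```

A perfectly diagonal alignment incurs (near) zero penalty, while diffuse attention is penalized, which is why it can help the encoder-decoder attention lock onto a monotonic alignment early in training.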

sanghuynh1501 commented 3 years ago

Thank you so much!

vskadandale commented 1 year ago

Hi, I trained the model in this repository on the LJSpeech dataset, and after 160K iterations I am not able to see diagonal alignment in the decoder attention or the encoder-decoder attention. I do see a somewhat diagonal alignment in the encoder self-attention. Did anyone have similar issues? Is guided attention required to reproduce the attention plots shown in the README of this repo? Many thanks!