r9y9 / tacotron_pytorch

PyTorch implementation of Tacotron speech synthesis model.
http://nbviewer.jupyter.org/github/r9y9/tacotron_pytorch/blob/master/notebooks/Test%20Tacotron.ipynb

Masked loss function #21

Closed njellinas closed 5 years ago

njellinas commented 5 years ago

Hello,

Shouldn't the L1 loss be applied only to the real frames and not to the padding? That is, the GRU in the CBHG is handled correctly with pack_padded_sequence, and the attention is masked, but in the end I think the L1 loss is computed over the entire generated utterance, padding included.

Please tell me if I am missing something, because I am in the middle of debugging a similar problem!
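For reference, a masked L1 loss along these lines can be written as a minimal sketch (not the repo's actual code; `lengths` is assumed to hold the number of valid frames per utterance):

```python
import torch

def masked_l1_loss(pred, target, lengths):
    """L1 loss averaged over real frames only, ignoring padding.

    pred, target: (batch, max_time, dim) tensors
    lengths:      (batch,) tensor of valid frame counts
    """
    max_time = pred.size(1)
    # mask: (batch, max_time, 1) -- True for real frames, False for padding
    mask = (torch.arange(max_time, device=pred.device).unsqueeze(0)
            < lengths.unsqueeze(1)).unsqueeze(-1)
    # zero out the padded positions before summing
    diff = (pred - target).abs() * mask
    # normalize by the number of real elements, not the padded total
    return diff.sum() / (mask.sum() * pred.size(-1))
```

With this normalization, garbage values in the padded region of `target` have no effect on the loss or its gradient.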

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

saikrishnarallabandi commented 4 years ago

If we pad the targets with an end-of-sentence frame, then computing the loss over the entire sequence (including padding) makes sense: it forces the model to correctly predict the end. Section 4 of the original paper mentions this:

https://arxiv.org/pdf/1703.10135.pdf
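Under that reading, the fix lives in the data rather than the loss: pad the target frames with a constant "end" frame (e.g. silence) so an unmasked L1 loss teaches the model to keep emitting that frame past the utterance end. A hypothetical sketch (the zero padding value is an assumption, not necessarily what this repo uses):

```python
import torch

def pad_targets(frames_list, pad_value=0.0):
    """Stack variable-length target frame sequences into one batch tensor.

    frames_list: list of (time_i, dim) tensors
    Returns a (batch, max_time, dim) tensor where every position past the
    real frames is filled with pad_value, acting as the "end" frame the
    model should learn to predict.
    """
    dim = frames_list[0].size(1)
    max_time = max(f.size(0) for f in frames_list)
    out = torch.full((len(frames_list), max_time, dim), pad_value)
    for i, f in enumerate(frames_list):
        out[i, : f.size(0)] = f
    return out
```

With targets padded this way, an unmasked loss penalizes the model for producing anything other than the end frame after the utterance finishes, which is exactly the behavior you want at synthesis time.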