Closed njellinas closed 5 years ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If we use the end-of-sentence token as the padding index, then calculating the loss over the entire sentence (including padding) makes sense: it forces the model to correctly predict the end. Section 4 of the original paper mentions this.
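To illustrate that argument, here is a minimal NumPy sketch of padding an utterance with EOS-valued frames so that an *unmasked* loss still trains the model to predict the end. The `EOS_FRAME` value and function name are illustrative assumptions, not code from this repo:

```python
import numpy as np

# Hypothetical EOS/padding frame value; many Tacotron-style setups
# effectively pad mel targets with silence (zero) frames.
EOS_FRAME = 0.0

def pad_with_eos(frames, max_len):
    """Pad an utterance's mel frames to max_len with EOS-valued frames.

    With this padding, computing the L1 loss over the whole padded
    sequence still makes sense: the padded region carries a real
    training signal ("predict the end frame"), as argued above.
    frames: (n_frames, n_mels) array of real frames.
    """
    n, d = frames.shape
    pad = np.full((max_len - n, d), EOS_FRAME, dtype=frames.dtype)
    return np.concatenate([frames, pad], axis=0)
```

With this convention the padded positions are legitimate targets, so no loss mask is needed.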
Hello,
Shouldn't you apply the L1 loss only to the real frames and not to the padding? That is, you correctly implement the GRU in the CBHG with `pack_padded_sequence` and the masked attention, but in the end I think you calculate the L1 loss over the whole generated utterance, padding included.
Please tell me if I am missing something, because I am in the middle of debugging the same problem!
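For reference, masking the loss as the question suggests would look roughly like the sketch below: a NumPy version of an L1 loss averaged over real frames only. The `lengths` argument and function name are hypothetical, not taken from this repo:

```python
import numpy as np

def masked_l1_loss(pred, target, lengths):
    """L1 loss averaged over real (non-padded) frames only.

    pred, target: (batch, max_len, n_mels) arrays.
    lengths: list of true frame counts per utterance.
    """
    batch, max_len, n_mels = pred.shape
    # mask[b, t] = 1.0 for real frames, 0.0 for padding
    mask = (np.arange(max_len)[None, :] < np.asarray(lengths)[:, None]).astype(pred.dtype)
    abs_err = np.abs(pred - target).sum(axis=-1)  # (batch, max_len)
    # Normalize by the number of real (frame, mel) elements, not max_len
    return (abs_err * mask).sum() / (mask.sum() * n_mels)
```

Errors on padded frames then contribute nothing to the loss, which is the behavior the question is asking about.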