Closed mengxuehu closed 5 years ago
According to this line, dropout is applied to the sequence of embeddings after they have been looked up in the embedding table.
I'm not sure this is right: it has the effect of randomly dropping individual elements of each token's embedding vector, rather than dropping whole embedding vectors (post lookup). Instead, whole vectors should be dropped along the sequence dimension.
Furthermore, I'm pretty sure the zero vector doesn't necessarily correspond to the unk token, since the embeddings are learned.
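To make the distinction concrete, here's a minimal sketch of the two behaviors (numpy is used just for illustration; the names `elementwise_dropout` and `word_dropout` are mine, not from this repo):

```python
import numpy as np

rng = np.random.default_rng(0)

def elementwise_dropout(emb, p, rng):
    # What applying standard dropout to the embedded sequence does:
    # each scalar in each vector is zeroed independently.
    mask = rng.random(emb.shape) >= p
    return emb * mask / (1.0 - p)

def word_dropout(emb, p, rng):
    # Drop whole embedding vectors along the sequence dimension:
    # one Bernoulli draw per token position, broadcast across dims.
    seq_len, dim = emb.shape
    mask = (rng.random((seq_len, 1)) >= p).astype(emb.dtype)
    return emb * mask  # each row is either all-zero or untouched

emb = rng.standard_normal((5, 4))
dropped = word_dropout(emb, 0.5, rng)
for row_in, row_out in zip(emb, dropped):
    assert np.all(row_out == 0) or np.allclose(row_in, row_out)
```

Note that even this whole-vector version zeroes dropped positions rather than substituting a learned unk embedding, which connects to the point above about the zero vector not being the unk token.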
Right, this is different from the paper. Unfortunately, I don't have time to fix it right now, but feel free to open a PR. If you do, please also keep the current dropout version as an option.
What about the testing phase? We need to disable dropout there, right? In the current implementation, dropout is still applied at test time.
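A minimal sketch of how frameworks typically gate dropout on a train/eval flag (the `WordDropout` class here is hypothetical, mirroring e.g. PyTorch's `self.training` convention, not this repo's code):

```python
import numpy as np

class WordDropout:
    # Toy dropout layer with a train/eval switch: dropout is a no-op
    # when `training` is False, which is what test time should use.
    def __init__(self, p, seed=0):
        self.p = p
        self.training = True
        self.rng = np.random.default_rng(seed)

    def __call__(self, emb):
        if not self.training:
            return emb  # identity at test time
        mask = (self.rng.random((emb.shape[0], 1)) >= self.p)
        return emb * mask

layer = WordDropout(p=0.5)
x = np.ones((6, 3))
layer.training = False  # eval mode: dropout disabled
assert np.array_equal(layer(x), x)
```

In PyTorch, calling `model.eval()` flips this flag for all submodules, so an `nn.Dropout` layer becomes the identity at test time.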
According to the paper, "We do this by randomly replacing some fraction of the conditioned-on word tokens with the generic unknown word token unk", but it seems this implementation is dropping out word embeddings instead.
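For comparison, the paper's variant can be sketched as replacing token ids before the embedding lookup, so dropped positions receive the learned unk embedding rather than a zero vector (the function name and `UNK_ID = 0` are assumptions for illustration; the real unk index depends on the vocabulary):

```python
import numpy as np

UNK_ID = 0  # hypothetical unk index; depends on the vocabulary

def unk_word_dropout(token_ids, p, rng, unk_id=UNK_ID):
    # Replace a fraction p of conditioned-on tokens with the unk id
    # *before* the embedding lookup, per the quoted passage.
    token_ids = np.asarray(token_ids)
    mask = rng.random(token_ids.shape) < p
    return np.where(mask, unk_id, token_ids)

rng = np.random.default_rng(0)
ids = np.array([5, 9, 2, 7, 3, 8])
dropped = unk_word_dropout(ids, 0.5, rng)
# each position is either unchanged or replaced by UNK_ID
assert all(d == o or d == UNK_ID for d, o in zip(dropped, ids))
```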