Closed pskrunner14 closed 6 years ago
Adds non-linearity after the computation of attention weights. Fixes #3
thanks so much :)
F.relu does not take a dim kwarg, does it?
Not only does this not run until you remove dim=1 from the ReLU, the softmax itself does not make sense: it computes the softmax along the batch dimension...
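For anyone following along, here is a minimal NumPy sketch (not the repo's actual code) of why the softmax axis matters for attention weights. Normalizing over the batch axis mixes unrelated examples; the weights for a single sequence should sum to 1, which only happens when you normalize along the sequence axis.

```python
import numpy as np

def softmax(x, axis):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# hypothetical attention scores: batch of 2 sequences, 3 timesteps each
scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# axis=0 normalizes across the batch: weights for one sequence
# no longer sum to 1, so they are not a valid distribution
wrong = softmax(scores, axis=0)

# axis=1 normalizes across timesteps: one distribution per sequence
right = softmax(scores, axis=1)

print(wrong.sum(axis=1))  # rows do not sum to 1
print(right.sum(axis=1))  # each row sums to 1
```

(Also, the non-linearity question is separate: relu is elementwise, so it takes no axis/dim argument at all.)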