Closed pskrunner14 closed 6 years ago
Adds non-linearity after the computation of attention weights. Fixes #3
thanks so much :)
F.relu does not take a dim kwarg, does it?
Not only does this not run until you remove dim=1 from the ReLU, the softmax itself does not make sense: it computes the softmax along the batch dimension...
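For anyone following along, here is a minimal NumPy sketch (not the repo's actual code) of why the softmax axis matters for attention weights. Normalizing over the batch axis mixes unrelated examples; the weights for a single sequence should sum to 1, which only happens when you normalize along the sequence axis.

```python
import numpy as np

def softmax(x, axis):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# hypothetical attention scores: batch of 2 sequences, 3 timesteps each
scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# axis=0 normalizes across the batch: weights for one sequence
# no longer sum to 1, so they are not a valid distribution
wrong = softmax(scores, axis=0)

# axis=1 normalizes across timesteps: one distribution per sequence
right = softmax(scores, axis=1)

print(wrong.sum(axis=1))  # rows do not sum to 1
print(right.sum(axis=1))  # each row sums to 1
```

(Also, the non-linearity question is separate: relu is elementwise, so it takes no axis/dim argument at all.)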