keon / seq2seq

Minimal Seq2Seq model with Attention for Neural Machine Translation in PyTorch

Why use relu to compute additive attention #28

Open yuboona opened 4 years ago

yuboona commented 4 years ago

1. Attention's formula

Standard additive attention: score = v * tanh(W * [hidden; encoder_outputs])
In this repo:                score = v * relu(W * [hidden; encoder_outputs])

2. Question

Is there some trick here, or is this the result of an experimental comparison?
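
For reference, a minimal sketch of the two variants in PyTorch (the class and parameter names here are illustrative, not taken from this repo's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Additive (Bahdanau-style) attention; use_relu swaps tanh for relu."""
    def __init__(self, hidden_dim, use_relu=False):
        super().__init__()
        self.attn = nn.Linear(hidden_dim * 2, hidden_dim)  # W
        self.v = nn.Linear(hidden_dim, 1, bias=False)       # v
        self.use_relu = use_relu

    def forward(self, hidden, encoder_outputs):
        # hidden:          [batch, hidden_dim]          (current decoder state)
        # encoder_outputs: [batch, src_len, hidden_dim]
        src_len = encoder_outputs.size(1)
        hidden = hidden.unsqueeze(1).expand(-1, src_len, -1)
        # W * [hidden; encoder_outputs]
        energy = self.attn(torch.cat((hidden, encoder_outputs), dim=2))
        # tanh in the original Bahdanau formulation, relu in this repo
        energy = F.relu(energy) if self.use_relu else torch.tanh(energy)
        score = self.v(energy).squeeze(2)                   # [batch, src_len]
        return F.softmax(score, dim=1)                      # attention weights
```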