thushv89 / attention_keras

Keras Layer implementation of Attention for Sequential models
https://towardsdatascience.com/light-on-math-ml-attention-with-keras-dc8dbc1fad39
MIT License

Bahdanau attention #33

Closed nilavghosh closed 4 years ago

nilavghosh commented 4 years ago

https://github.com/thushv89/attention_keras/blob/f7c6f40cb207431d0229c38992eb93ad17d38e20/examples/nmt/model.py#L30

Is the implementation here a variation of the Bahdanau attention paper? As per the paper, during training the alignment (context) vector is concatenated with the embedding of the target token from the previous timestep, and this vector is then supplied to the decoder.
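For concreteness, a single decoding step as described in the paper would look roughly like the sketch below (this is not the repo's code; layer names and sizes such as `dec_units` are illustrative):

```python
import tensorflow as tf

class BahdanauDecoderStep(tf.keras.layers.Layer):
    """One decoding step in the paper's style: the attention context is
    concatenated with the embedded previous target token *before* the GRU."""
    def __init__(self, dec_units, **kwargs):
        super().__init__(**kwargs)
        self.W1 = tf.keras.layers.Dense(dec_units)  # projects encoder outputs
        self.W2 = tf.keras.layers.Dense(dec_units)  # projects previous decoder state
        self.V = tf.keras.layers.Dense(1)           # additive scoring vector
        self.gru = tf.keras.layers.GRU(dec_units, return_state=True)

    def call(self, prev_target_emb, prev_state, enc_outputs):
        # prev_target_emb: (batch, emb_dim), prev_state: (batch, dec_units)
        # enc_outputs:     (batch, T_enc, enc_units)
        score = self.V(tf.nn.tanh(self.W1(enc_outputs) +
                                  self.W2(tf.expand_dims(prev_state, 1))))
        weights = tf.nn.softmax(score, axis=1)                   # (batch, T_enc, 1)
        context = tf.reduce_sum(weights * enc_outputs, axis=1)   # (batch, enc_units)
        # Paper-style: [context ; embedded previous target] is the decoder input.
        x = tf.concat([context, prev_target_emb], axis=-1)
        output, state = self.gru(tf.expand_dims(x, 1), initial_state=prev_state)
        return output, state
```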

https://github.com/thushv89/attention_keras/blob/f7c6f40cb207431d0229c38992eb93ad17d38e20/examples/nmt/model.py#L35 In the code base here, the concatenated vector is instead passed directly through a softmax to get the predicted output.

Are these implementations fundamentally the same?

moazshorbagy commented 4 years ago

@nilavghosh I have the same question: "Are these implementations fundamentally the same?"

thushv89 commented 4 years ago

Hi, yes, this is a variant of the original approach, though it's not an uncommon one either. However, I'm not sure about the performance difference between the two.

The main reason I picked this approach is that it makes more sense to me to have the attention outputs closer to the output layer (as opposed to feeding them in at the decoder inputs), since the output layer is what makes the final decision.
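Schematically, that pattern boils down to something like the sketch below. This is only an illustration, not the example's actual code: Keras's built-in `AdditiveAttention` stands in for the repo's `AttentionLayer`, and the hidden/vocabulary sizes are placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers

hidden, src_vocab, tgt_vocab = 64, 5000, 5000  # placeholder sizes

enc_in = layers.Input(shape=(None,), name="encoder_tokens")
dec_in = layers.Input(shape=(None,), name="decoder_tokens")

enc_emb = layers.Embedding(src_vocab, hidden)(enc_in)
dec_emb = layers.Embedding(tgt_vocab, hidden)(dec_in)

# Encoder and decoder GRUs run over their full sequences first.
enc_out, enc_state = layers.GRU(hidden, return_sequences=True,
                                return_state=True)(enc_emb)
dec_out = layers.GRU(hidden, return_sequences=True)(dec_emb,
                                                    initial_state=enc_state)

# Attention context is computed from the decoder outputs (queries) over the
# encoder outputs (values) and concatenated with the decoder outputs ...
attn_out = layers.AdditiveAttention()([dec_out, enc_out])
concat = layers.Concatenate(axis=-1)([dec_out, attn_out])

# ... and the concatenated vector goes straight into the softmax output layer,
# instead of being fed back into the decoder as in the original paper.
preds = layers.TimeDistributed(
    layers.Dense(tgt_vocab, activation="softmax"))(concat)

model = tf.keras.Model(inputs=[enc_in, dec_in], outputs=preds)
```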