maxjcohen / transformer

Implementation of Transformer model (originally from Attention is All You Need) applied to Time Series.
https://timeseriestransformer.readthedocs.io/en/latest/
GNU General Public License v3.0

self._encoderDecoderAttention defined but not used #23

Closed kryptonite0 closed 3 years ago

kryptonite0 commented 3 years ago

Hi Max, thank you for sharing this amazing repo. I have a question regarding the architecture. In the Decoder module (https://github.com/maxjcohen/transformer/blob/master/tst/decoder.py) you define a self._encoderDecoderAttention (L62) that does not seem to be used in the forward method. Instead, both the self-attention and the encoder-decoder attention go through the same layer, self._selfAttention. Is this maybe a bug? I compared the architecture with The Annotated Transformer (https://nlp.seas.harvard.edu/2018/04/03/attention.html), and I would guess that the encoder-decoder attention should go through self._encoderDecoderAttention. What do you think?

Can I also ask whether you wrote the architecture yourself from scratch? Thanks in advance! :)
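
For reference, here is a minimal sketch of the pattern I mean (not the repo's actual code; torch.nn.MultiheadAttention stands in for the repo's MultiHeadAttention class, and the layer norms, dropout and masks are omitted):

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self._selfAttention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Defined here, but never called in forward below.
        self._encoderDecoderAttention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # Decoder self-attention: queries, keys and values all come from x.
        x, _ = self._selfAttention(x, x, x)
        # Encoder-decoder attention: keys/values come from the encoder output,
        # yet the SAME layer (and therefore the same weights) is reused here.
        x, _ = self._selfAttention(x, memory, memory)
        return x
```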

maxjcohen commented 3 years ago

Hi, you're right, there's something fishy about this forward definition.

I wrote the architecture using the original paper, as well as a few repos that already implemented the Transformer in pytorch and keras. I also used The Annotated Transformer that you linked, and the Illustrated Transformer. Today, there is a Transformer model directly implemented in PyTorch.

kryptonite0 commented 3 years ago

Hi Max, thanks a lot for your answer. What I'm still confused about is that if you use the same layer instance twice in the forward method, you are effectively applying the same weights twice, in an RNN fashion. Is this correct and intended?
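
A quick generic PyTorch illustration of what I mean (not code from the repo, with nn.Linear standing in for an attention layer): reusing one module instance gives a single parameter set applied twice, while two instances have independent parameters.

```python
import torch.nn as nn

class ReusedLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.Linear(8, 8)        # stand-in for MultiHeadAttention

    def forward(self, x):
        return self.attn(self.attn(x))     # same weights applied twice, RNN-style

class SeparateLayers(nn.Module):
    def __init__(self):
        super().__init__()
        self.self_attn = nn.Linear(8, 8)
        self.enc_dec_attn = nn.Linear(8, 8)

    def forward(self, x):
        return self.enc_dec_attn(self.self_attn(x))

print(sum(p.numel() for p in ReusedLayer().parameters()))     # 72  -> one weight set
print(sum(p.numel() for p in SeparateLayers().parameters()))  # 144 -> two weight sets
```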

maxjcohen commented 3 years ago

Yes, yes, I was so focused on searching for other variables inside the MHA class that could be saved between both attention inferences that I forgot weights were a thing. Fixed in 240f8fbc2c20fac4182239fa70a6181c7120bd3c, thanks a lot!
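
Roughly, the fix amounts to routing the second attention through its own layer. A hedged sketch of the corrected forward, continuing the DecoderBlock illustration above (hypothetical names, not the exact contents of the commit):

```python
    def forward(self, x: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # Masked self-attention over the decoder input.
        x, _ = self._selfAttention(x, x, x)
        # Encoder-decoder attention now uses its own, separately-trained weights.
        x, _ = self._encoderDecoderAttention(x, memory, memory)
        return x
```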

kryptonite0 commented 3 years ago

Thanks a lot to you :))