Closed weiyengs closed 4 years ago
I didn't look into the implementation details, but this line might be what you are looking for:
It refers to:
Okay, I think I made a mistake. The Transformer decoder is not initialised with the last hidden state (or any hidden states) of the encoder, unlike the traditional seq-to-seq architecture. In the Transformer, the encoder hidden states (the BERT output in this case) are instead consumed by the decoder's second attention sub-layer (the multi-head cross-attention).
Ref: https://www.d2l.ai/chapter_attention-mechanism/transformer.html
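To make the point concrete, here is a minimal sketch (not the PreSumm implementation) of how cross-attention consumes the encoder output: the decoder states act as queries, while `memory_bank` (the encoder/BERT output) supplies the keys and values. This is simplified to a single head with no learned projections; the shapes and names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, memory_bank):
    # decoder_states: (tgt_len, d) -- queries come from the decoder
    # memory_bank:    (src_len, d) -- keys/values come from the encoder (BERT output)
    d = decoder_states.shape[-1]
    scores = decoder_states @ memory_bank.T / np.sqrt(d)  # (tgt_len, src_len)
    weights = softmax(scores, axis=-1)                    # rows sum to 1
    return weights @ memory_bank                          # (tgt_len, d)

rng = np.random.default_rng(0)
dec = rng.standard_normal((3, 8))   # 3 target positions
mem = rng.standard_normal((5, 8))   # 5 source positions from the encoder
out = cross_attention(dec, mem)
print(out.shape)  # (3, 8)
```

So the decoder never needs an initial hidden state from the encoder; it attends to the full `memory_bank` at every layer instead.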
Hi, thanks for your generosity in sharing the code. I have a question: I noticed that the decoder is not initialised with either the last hidden states of the BERT representation (the output from BERT) or the encoded states of the source sub-tokens, even though a decoder is usually initialised with something.
https://github.com/nlpyang/PreSumm/blob/29a6b1ace2290808f39c76ae2ef0e92d515fc049/src/models/decoder.py#L266
In the above line of code, I believe the `memory_bank` variable is usually used for that purpose, but it doesn't seem to be used that way in your code.
Is this an intentional design choice, or am I missing something?
Thanks!