nlpyang / PreSumm

Code for the EMNLP 2019 paper "Text Summarization with Pretrained Encoders"
MIT License

Decoder initial state not initialised? #18

Closed (weiyengs closed this issue 4 years ago)

weiyengs commented 5 years ago

Hi, thanks for your generosity in sharing the code. I have a question: I noticed that the decoder is not initialised with either the last hidden state of the BERT representation (the BERT output) or the encoded states of the source sub-tokens, even though a decoder is usually initialised with something.

https://github.com/nlpyang/PreSumm/blob/29a6b1ace2290808f39c76ae2ef0e92d515fc049/src/models/decoder.py#L266

In the line of code above, I believe the "memory_bank" variable is what would usually be used for that, but it does not seem to be used to initialise the decoder in your code.

Is this an intentional design choice, or am I missing something?
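
For context, this is the kind of initialisation I had in mind, e.g. a plain RNN encoder-decoder where the decoder starts from the encoder's final hidden state (a hypothetical sketch, not related to the PreSumm code):

```python
import torch
import torch.nn as nn

# Hypothetical GRU encoder-decoder, only to illustrate the "usual" pattern
# I was expecting: the decoder's hidden state starts from the encoder's
# final hidden state.
enc = nn.GRU(input_size=128, hidden_size=256, batch_first=True)
dec = nn.GRU(input_size=128, hidden_size=256, batch_first=True)

src_emb = torch.randn(4, 30, 128)        # [batch, src_len, emb]
tgt_emb = torch.randn(4, 12, 128)        # [batch, tgt_len, emb]

memory_bank, enc_final = enc(src_emb)    # enc_final: [1, batch, hidden]
dec_out, _ = dec(tgt_emb, enc_final)     # decoder initialised from the encoder
```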

Thanks!

astariul commented 5 years ago

I didn't look into the implementation details, but this line might be what you are looking for:

https://github.com/nlpyang/PreSumm/blob/29a6b1ace2290808f39c76ae2ef0e92d515fc049/src/models/model_builder.py#L242

It refers to:

https://github.com/nlpyang/PreSumm/blob/29a6b1ace2290808f39c76ae2ef0e92d515fc049/src/models/decoder.py#L216-L222
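
From memory, that init_state follows the usual OpenNMT-style pattern: the state only remembers the source tokens (so padding masks can be rebuilt) and, during inference, an empty per-layer cache; memory_bank is never turned into an initial hidden state. A self-contained paraphrase with hypothetical names (not the exact PreSumm source):

```python
# Paraphrase of the OpenNMT-style init_state pattern the links above point
# at (hypothetical names, not the exact PreSumm code).

class DecoderState:
    def __init__(self, src):
        self.src = src        # kept only so padding masks can be rebuilt
        self.cache = None     # filled lazily for fast incremental decoding

def init_state(src, memory_bank, num_layers, with_cache=False):
    state = DecoderState(src)
    if with_cache:
        # one empty key/value slot per decoder layer; memory_bank is only
        # consumed later, inside the attention layers, never as an initial
        # hidden state
        state.cache = {
            "layer_%d" % i: {"memory_keys": None, "memory_values": None,
                             "self_keys": None, "self_values": None}
            for i in range(num_layers)
        }
    return state
```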

weiyengs commented 5 years ago

Okay, I think I made a mistake. The Transformer decoder is not initialised with the last hidden state (or any hidden states) of the encoder, unlike the traditional seq-to-seq architecture. In the Transformer, the encoder hidden states (the BERT output in this case) are instead consumed by the second attention sub-layer (the encoder-decoder multi-head attention) of each decoder layer.

Ref: https://www.d2l.ai/chapter_attention-mechanism/transformer.html
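
To make it concrete, here is a minimal decoder-layer sketch (generic PyTorch, not the PreSumm implementation) showing where memory_bank actually enters:

```python
import torch
import torch.nn as nn

class MiniDecoderLayer(nn.Module):
    """Illustrative Transformer decoder layer (not the PreSumm code):
    the encoder output ("memory_bank") enters only through the second
    attention sub-layer, so no initial decoder state is needed."""

    def __init__(self, d_model=768, heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, heads, dropout=dropout,
                                               batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, heads, dropout=dropout,
                                                batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, memory_bank, tgt_mask=None, memory_key_padding_mask=None):
        # 1) masked self-attention over the decoder inputs
        x = tgt
        a, _ = self.self_attn(x, x, x, attn_mask=tgt_mask)
        x = self.norm1(x + a)
        # 2) cross-attention: queries come from the decoder, keys/values from
        #    the encoder output (the BERT representations / memory_bank)
        a, _ = self.cross_attn(x, memory_bank, memory_bank,
                               key_padding_mask=memory_key_padding_mask)
        x = self.norm2(x + a)
        # 3) position-wise feed-forward
        return self.norm3(x + self.ff(x))

# toy usage: BERT output as memory_bank, decoder embeddings as tgt
memory_bank = torch.randn(2, 50, 768)   # [batch, src_len, hidden]
tgt = torch.randn(2, 10, 768)           # [batch, tgt_len, hidden]
out = MiniDecoderLayer()(tgt, memory_bank)   # [2, 10, 768]
```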