qniguoym closed 4 years ago
The task is a summarization task: we summarize the input sequence(s) and perform classification/regression on that summary.
The summary can take any form; concatenating the last time steps is just one option. Note that these last time steps have a receptive field covering the entire sequence length.
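To make the "any form" point concrete, here is a minimal sketch comparing two ways of summarizing encoder outputs before the classification/regression head. The array shapes, the random stand-in encoder outputs, and the helper names `summarize_last` and `summarize_mean` are assumptions for illustration only, not code from this repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for encoder outputs of two input sequences (assumed shapes):
# (batch, seq_len, d_model). Sequence lengths may differ between the two.
enc_a = rng.normal(size=(4, 10, 8))
enc_b = rng.normal(size=(4, 12, 8))

def summarize_last(h):
    # Keep only the final time step of each sequence: (batch, d_model).
    return h[:, -1, :]

def summarize_mean(h):
    # Average over the time axis instead: (batch, d_model).
    return h.mean(axis=1)

# Option 1: concatenate the last time steps of both sequences.
features_last = np.concatenate(
    [summarize_last(enc_a), summarize_last(enc_b)], axis=-1
)

# Option 2: concatenate mean-pooled summaries instead.
features_mean = np.concatenate(
    [summarize_mean(enc_a), summarize_mean(enc_b)], axis=-1
)

print(features_last.shape, features_mean.shape)
```

Either feature matrix has one fixed-size row per example, so the same classification or regression layer can be placed on top of it regardless of sequence length.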
But doesn't the vector at the last timestep only contain the information that the last word attends to?
No, the last timestep of the encoder output has a receptive field of the entire sequence length.
It attends to the entire sequence.
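A quick way to check this is to perturb the first input token and see whether the last output position changes. The sketch below assumes a single unmasked self-attention layer with random weights; the function `self_attention` and all sizes are hypothetical and stand in for the actual encoder.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention over one sequence.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d = 6, 4  # assumed toy sizes
x = rng.normal(size=(seq_len, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))

out, weights = self_attention(x, wq, wk, wv)

# Change only the FIRST token, then recompute the outputs.
x_perturbed = x.copy()
x_perturbed[0] += 1.0
out_perturbed, _ = self_attention(x_perturbed, wq, wk, wv)

# If the last position depended only on the last word, this would be False.
last_step_changed = not np.allclose(out[-1], out_perturbed[-1])
print("last timestep affected by first token:", last_step_changed)
print("attention weights of last timestep:", weights[-1])
```

The last row of `weights` puts a nonzero weight on every position, which is why the last timestep's output mixes in information from the whole sequence.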
Why do you use the last timestep of the sequence for concatenation?