yaohungt / Multimodal-Transformer

[ACL'19] [PyTorch] Multimodal Transformer
MIT License

last_h_v = last_hs = h_vs[:,-1,:] #14

Closed by qniguoym 4 years ago

qniguoym commented 4 years ago

Why do you use the last timestep of the sequence for concatenation?

yaohungt commented 4 years ago

The task is a summarization task: we summarize the sequence(s) and then perform classification/regression on that summary.

Summarization can be any form; concatenating the last time steps is just one option. Note that these last time steps have a receptive field of the entire sequence length.
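A minimal sketch of this idea (not the repo's exact code; dimensions and layer names here are illustrative): run a Transformer encoder over a sequence and use the output at the last timestep as a fixed-size summary vector for a downstream classifier, mirroring `h_vs[:, -1, :]` from the issue title.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, d_model = 4, 10, 16

# A small Transformer encoder; batch_first=True gives (batch, seq, feature)
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

x = torch.randn(batch, seq_len, d_model)   # one modality's input sequence
h = encoder(x)                             # (batch, seq_len, d_model)

last_h = h[:, -1, :]                       # (batch, d_model): last-timestep summary
logits = nn.Linear(d_model, 3)(last_h)     # hypothetical 3-way classification head
print(last_h.shape, logits.shape)
```

In the multimodal case, one such summary vector per modality can be concatenated before the prediction head.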

qniguoym commented 4 years ago

But doesn't the vector at the last timestep only contain the information that the last word attends to?

yaohungt commented 4 years ago

No, the last timestep of the encoder output has a receptive field of the entire sequence length.

It attends to the entire sequence.
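This can be checked empirically with a quick sketch (assumed setup, not the repo's code): with no causal mask, perturbing only the first timestep of the input changes the last timestep of the encoder output, showing that the last position depends on the whole sequence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder_layer = nn.TransformerEncoderLayer(d_model=8, nhead=2, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=1).eval()  # eval: disable dropout

x = torch.randn(1, 5, 8)
x2 = x.clone()
x2[:, 0, :] += 1.0   # perturb only the FIRST timestep

with torch.no_grad():
    h_last = encoder(x)[:, -1, :]
    h2_last = encoder(x2)[:, -1, :]

# The last timestep's output changes, so it attends to earlier positions
print(torch.allclose(h_last, h2_last))
```

With a causal mask (as in autoregressive decoding) the same property holds for the last position, since it may attend to all earlier positions.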