v-iashin / MDVC

PyTorch implementation of Multi-modal Dense Video Captioning (CVPR 2020 Workshops)
https://v-iashin.github.io/mdvc

About text #26

Closed: jxrloveyou closed this issue 2 years ago

jxrloveyou commented 2 years ago

    caption_idx = caption_data.caption
    caption_idx, caption_idx_y = caption_idx[:, :-1], caption_idx[:, 1:]

Excuse me, why does the second line remove the first token and the last token?

v-iashin commented 2 years ago

Hi, this is to create the targets. We want our model to predict the next token in the sequence:

Sequence: start, 1, 2, 3, 4, 5, end
Target (y, i.e. caption_idx_y = caption_idx[:, 1:]): 1, 2, 3, 4, 5, end
Input (caption_idx[:, :-1]): start, 1, 2, 3, 4, 5
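To make the shift concrete, here is a minimal pure-Python sketch of the same slicing on a batch of token lists. The helper names `shift_for_teacher_forcing` and `next_token_pairs` are hypothetical (they are not part of MDVC), and the string tokens stand in for the integer ids a real vocabulary would use. The second helper assumes a causally masked decoder, so that at step t the model has only seen the inputs up to position t and must predict the target at position t.

```python
from typing import List, Tuple


def shift_for_teacher_forcing(
    batch: List[List[str]],
) -> Tuple[List[List[str]], List[List[str]]]:
    """Split each caption into decoder input and next-token target.

    Pure-Python equivalent of caption_idx[:, :-1] and caption_idx[:, 1:]
    applied to a (batch, seq_len) tensor.
    """
    inputs = [seq[:-1] for seq in batch]   # drop the last token ("end")
    targets = [seq[1:] for seq in batch]   # drop the first token ("start")
    return inputs, targets


def next_token_pairs(inp: List[str], tgt: List[str]) -> List[Tuple[List[str], str]]:
    """Assuming causal masking: at step t the decoder sees inp[:t+1] and predicts tgt[t]."""
    return [(inp[: t + 1], tgt[t]) for t in range(len(tgt))]


if __name__ == "__main__":
    caption = ["start", "1", "2", "3", "4", "5", "end"]
    inputs, targets = shift_for_teacher_forcing([caption])
    print(inputs[0])   # ['start', '1', '2', '3', '4', '5']
    print(targets[0])  # ['1', '2', '3', '4', '5', 'end']
    for seen, expected in next_token_pairs(inputs[0], targets[0]):
        print(seen, "->", expected)
```

Because input and target have the same length, every input position lines up with exactly one target token: "start" is paired with "1", and the full prefix ending in "5" is paired with "end". The model never needs to predict "start" (it is always given), and "end" is never fed as input because nothing follows it.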