xiadingZ / video-caption.pytorch

pytorch implementation of video captioning
MIT License
401 stars · 130 forks

number of train caption is < 10000 #37

Open piperino11 opened 5 years ago

piperino11 commented 5 years ago

The MSR-VTT dataset has 10,000 videos with 20 captions per video, but this implementation considers only one video-caption pair per video during training, so there are at most 10,000 training examples. Has anyone else noticed this? Has anyone changed the code?

chongkewu commented 4 years ago

The training caption changes each epoch: the loader samples 1 of the 20 captions every time an item is fetched from the video dataset. You can check out the dataloader.py file.
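In outline, that sampling looks something like the sketch below. This is illustrative only, not the repo's actual dataloader.py; in the real code the class would subclass `torch.utils.data.Dataset`, and `VideoCaptionDataset` here is a hypothetical name. The key point is that `__getitem__` pairs a video with one caption drawn at random from its ~20 references, so each epoch sees different video-caption pairs:

```python
import random

class VideoCaptionDataset:
    """Sketch of per-fetch caption sampling (hypothetical, not the
    repo's dataloader.py). In PyTorch this would subclass
    torch.utils.data.Dataset."""

    def __init__(self, features, captions):
        # features: dict video_id -> precomputed feature vector
        # captions: dict video_id -> list of ~20 tokenized captions
        self.video_ids = list(features)
        self.features = features
        self.captions = captions

    def __len__(self):
        return len(self.video_ids)

    def __getitem__(self, idx):
        vid = self.video_ids[idx]
        # a fresh random reference every time the item is fetched,
        # so successive epochs pair the video with different captions
        caption = random.choice(self.captions[vid])
        return self.features[vid], caption
```

Over many epochs the model therefore still sees most of the 20 references per video, just one at a time.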

alokssingh commented 4 years ago

Hey @chongkewu, hope you are doing well. I have a query, hope you have an answer. For each video we have 20 reference captions, so from your answer above what I understand is that for every epoch it will randomly select one caption from the available 20. Is that right?

chongkewu commented 4 years ago

Yes, that is correct


alokssingh commented 4 years ago

thank you @chongkewu. Do you think that in this way the model will be trained sufficiently?

chongkewu commented 4 years ago

For the challenge I think it is enough. A video has many candidate captions, and the model just needs to output one sentence.


alokssingh commented 4 years ago

@chongkewu thank you so much for your instant replies. I will try some new approaches and let you know about the performance.

alokssingh commented 4 years ago

@chongkewu After selecting the caption randomly, do we train the model like this,

    X1      X2 (text sequence)                          y (word)
    -------------------------------------------------------------
    image   startseq                                    little
    image   startseq, little                            girl
    image   startseq, little, girl                      running
    image   startseq, little, girl, running             in
    image   startseq, little, girl, running, in         field
    image   startseq, little, girl, running, in, field  endseq

or by directly passing the image and the whole caption to the model?
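Under teacher forcing the two are equivalent: feeding the whole caption at once to a seq2seq decoder implicitly trains on all of the shifted (prefix, next-word) pairs in the table. A minimal sketch of the explicit expansion (`expand_caption` is a hypothetical helper, not part of this repo):

```python
def expand_caption(tokens):
    """Hypothetical helper: expand one caption into the incremental
    (input-prefix, next-word) training pairs shown in the table above.
    `tokens` excludes the start/end markers; they are added here."""
    seq = ["startseq"] + tokens + ["endseq"]
    # each prefix seq[:i] is paired with the word that follows it, seq[i]
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]

pairs = expand_caption(["little", "girl", "running", "in", "field"])
# pairs[0]  -> (["startseq"], "little")
# pairs[-1] -> (["startseq", "little", "girl", "running", "in", "field"], "endseq")
```

In practice most PyTorch captioning models materialize nothing like this list; they pass the full shifted sequence through the decoder in one forward pass and compute the loss at every position, which is the same objective computed more efficiently.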