zhegan27 / SCN_for_video_captioning

Using Semantic Compositional Networks for Video Captioning

Feature dimension and missing corpus files #9

Closed trminh89 closed 6 years ago

trminh89 commented 6 years ago

Hi,

I tried to run the first step ("Run 1_obtain_tagsyoutube2text.py to obtain the ground-truth 300 tags for the Youtube2Text dataset.") to obtain the tag file, but the file "./data/corpus_youtube2text.p" is missing. Can you upload it? And how is this file formatted?

For the Youtube2Text dataset, the dimensions of the C3D, ResNet, and tag features are (1970, 512), (1970, 2048), and (1970, 300). What does each of the 1970 rows represent in this case: a single frame or a whole video?

Also, can you give a more detailed guide about how to train on other datasets? Thanks!

zhegan27 commented 6 years ago

The corpus_youtube2text.p file is already inside the Dropbox link that I provided (https://www.dropbox.com/home/CVPR2017_SCN_video_tagging/data). I am not sure why you did not find it. Please check again.

    import cPickle  # Python 2; use pickle in Python 3
    x = cPickle.load(open("./data/corpus_youtube2text.p", "rb"))
    train, val, test = x[0], x[1], x[2]
    wordtoix, ixtoword = x[3], x[4]

This is how it is formatted.
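For anyone on Python 3, the layout above can be unpacked roughly as follows. This is only a sketch: `load_corpus` is a hypothetical helper name, the `encoding="latin1"` argument assumes the file was pickled under Python 2, and the toy file written below is an invented stand-in whose split contents are not taken from the real corpus.

```python
import os
import pickle
import tempfile


def load_corpus(path):
    """Unpack a pickle laid out as [train, val, test, wordtoix, ixtoword]."""
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        x = pickle.load(f, encoding="latin1")
    train, val, test = x[0], x[1], x[2]
    wordtoix, ixtoword = x[3], x[4]
    return train, val, test, wordtoix, ixtoword


# Toy stand-in for the real file; the contents of each split are invented.
toy = [
    ["a man is cooking"],    # train
    ["a dog runs"],          # val
    ["a cat sleeps"],        # test
    {"a": 0, "man": 1},      # wordtoix: word -> index
    {0: "a", 1: "man"},      # ixtoword: index -> word
]
toy_path = os.path.join(tempfile.mkdtemp(), "toy_corpus.p")
with open(toy_path, "wb") as f:
    pickle.dump(toy, f)

train, val, test, wordtoix, ixtoword = load_corpus(toy_path)
print(len(train), len(val), len(test), len(wordtoix))
```

To inspect the real file, call `load_corpus("./data/corpus_youtube2text.p")` and print the type and length of each returned object.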

1970 is the number of videos. Each row is the mean-pooled feature of one video, as specified in Section 3.5 of the paper.
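As a rough illustration of how a matrix with one row per video can arise from mean pooling, here is a minimal NumPy sketch. It is not the extraction code used for the paper: `mean_pool`, the random inputs, and the frame counts are all assumptions made for the example, and the real features may have been pooled over clips rather than single frames.

```python
import numpy as np


def mean_pool(frame_features):
    """Average features of shape (num_frames, dim) into one (dim,) vector."""
    return np.asarray(frame_features, dtype=np.float32).mean(axis=0)


rng = np.random.default_rng(0)
# Hypothetical input: 3 videos with different frame counts, 2048-d features each.
videos = [rng.standard_normal((n, 2048)) for n in (12, 30, 7)]

# One pooled vector per video, stacked into a (num_videos, dim) matrix.
video_features = np.stack([mean_pool(v) for v in videos])
print(video_features.shape)  # (3, 2048)
```

With 1970 videos and 2048-d ResNet features, the same procedure would produce the (1970, 2048) shape mentioned in the question.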

trminh89 commented 6 years ago

Thanks for the quick reply!

Could you please check the sharing permissions of the link again? When I click the link, it asks me to sign in, but then nothing happens.

trminh89 commented 6 years ago

I can access it through this link: https://www.dropbox.com/sh/amqm644a7zekgg5/AAC1GgKouhcfGKQVb8CLUWvla?dl=0

Thanks!