zhegan27 / SCN_for_video_captioning

Using Semantic Compositional Networks for Video Captioning

About corpus.p file #8

Closed — LiuDong777 closed this issue 6 years ago

LiuDong777 commented 6 years ago

Hi, what are the corpus.p and youtube2text_nbest.p files, and how can I obtain them when I want to caption my own raw video? Thanks!

zhegan27 commented 6 years ago

The corpus.p file contains the ground-truth training captions for the videos. Instructions for generating these preprocessed captions are provided in my video tagging code (https://www.dropbox.com/home/CVPR2017_SCN_video_tagging/preprocess_raw_data).

The youtube2text_nbest.p file contains the N-best-list captions that the model generates for each test video.
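For reference, both files are standard Python pickles, so you can inspect their contents directly. Here is a minimal sketch; the internal structure of each pickle is an assumption you should verify yourself, and if the files were written under Python 2 you may need `encoding='latin1'` when loading under Python 3:

```python
import pickle

# Peek at the ground-truth training captions (internal structure assumed).
with open('corpus.p', 'rb') as f:
    corpus = pickle.load(f)  # add encoding='latin1' under Python 3 if needed

# Peek at the N-best-list captions for the test videos (structure assumed).
with open('youtube2text_nbest.p', 'rb') as f:
    nbest = pickle.load(f)

# Check the top-level types before relying on any particular layout.
print(type(corpus), type(nbest))
```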

When doing this project, the C3D features were extracted on Linux, while the ResNet features were extracted on Windows, so it is not easy for me to combine them into a demo that captions a raw video from scratch, as you request.

However, I have updated my image captioning repo (https://github.com/zhegan27/Semantic_Compositional_Nets), and it now contains demo code, with a Dropbox link, that detects tags and generates captions for an image from scratch. You can check it out if you are also interested in image captioning.

I may create demo code that feeds in a raw video for end-to-end captioning later, when I find time.

Thanks for your patience!

LiuDong777 commented 6 years ago

When I run the 3_training_video_tagging_model.py script, I get the file youtube_tagging_learned_params.npz. Is this the model used to generate tags, like the tag_feats.mat file?


zhegan27 commented 6 years ago

yes, you are correct. :)
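For reference, the learned parameters can be inspected directly, since the .npz file is a standard NumPy archive. A minimal sketch, which only lists the stored keys and shapes rather than assuming any particular parameter names:

```python
import numpy as np

# youtube_tagging_learned_params.npz is a standard NumPy archive of named arrays.
params = np.load('youtube_tagging_learned_params.npz')

# List the stored parameter names and their shapes before using them.
for name in params.files:
    print(name, params[name].shape)
```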

LiuDong777 commented 6 years ago

But I didn't find a script that generates a file such as tag_feats.mat. How do I use the youtube_tagging_learned_params.npz model?


zhegan27 commented 6 years ago

Generating the tag_feats.mat file from the learned youtube_tagging_learned_params.npz model is done implicitly inside the 3_training_video_tagging_model.py script. Please see lines 214-215:

```python
tag_feats = f_pred(img_feats)
scipy.io.savemat("./tag_feats_pred.mat", {'feats': tag_feats})
```

tag_feats_pred.mat is the tag_feats.mat file.

f_pred is defined on line 115:

```python
(use_noise, z, y, cost, f_pred) = build_model(tparams, options)
```

which uses the learned parameters. That is, after running the training script, it automatically generates the tag features using the learned parameters.
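To verify the output, you can load the saved .mat file back with SciPy. A minimal sketch; the 'feats' key follows the savemat call above, while the shape interpretation as (num_videos, num_tags) is an assumption:

```python
import scipy.io

# Load the tag features written by the training script.
data = scipy.io.loadmat('./tag_feats_pred.mat')

# 'feats' is the key used in the savemat call above; expected to be
# a (num_videos, num_tags) array of tag probabilities (assumed layout).
tag_feats = data['feats']
print(tag_feats.shape)
```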