jayleicn / recurrent-transformer

[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
https://arxiv.org/abs/2005.05402
MIT License

Inference on single video #1

Closed nikky4D closed 4 years ago

nikky4D commented 4 years ago

Hi,

Thank you for your work. I'm writing code to allow easy testing of caption generation on user videos. Can you tell me how I can generate captions for my own videos?

What features are required? Is there feature generation code I can use? What is the command to run inference on my own videos?

Thanks

jayleicn commented 4 years ago

Hi @nikky4D ,

Thanks for your interest in our work! To generate captions for videos from ActivityNet Captions:

bash scripts/translate_greedy.sh anet_re_init_2019_10_01_11_34_22 val

as described in https://github.com/jayleicn/recurrent-transformer#inference-with-pre-trained-model. To generate captions for user videos, you may follow a similar procedure, but you need to provide the corresponding files.
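For user videos, those "corresponding files" would include an annotation file in the same layout as the ActivityNet Captions JSON (video duration plus per-event [start, end] timestamps). Below is a minimal sketch of building such an entry, assuming that layout; the video id, file name, and timestamps are hypothetical:

```python
import json

# Hypothetical example: one custom video with two manually-marked events.
# Fields follow the ActivityNet Captions annotation layout
# (duration in seconds, one [start, end] pair per event).
annotation = {
    "v_my_custom_video": {
        "duration": 45.0,
        "timestamps": [[0.0, 18.5], [18.5, 45.0]],
        # At inference time the sentences are what the model predicts;
        # placeholders are fine if the loader only expects the field to exist.
        "sentences": ["placeholder", "placeholder"],
    }
}

with open("my_val_anet.json", "w") as f:
    json.dump(annotation, f)
```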

The features used for our model are provided by https://github.com/salesforce/densecap#annotation-and-feature, and the feature extraction code is in this repo: https://github.com/LuoweiZhou/anet2016-cuhk-feature. You may find more details there.
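As a rough illustration of what the model consumes, assuming the features follow the densecap release layout (a per-video appearance file and a per-video motion file concatenated along the feature dimension); the exact file names and dimensions here are assumptions, not guaranteed by the repo:

```python
import numpy as np

# Assumption: features are stored as <video_id>_resnet.npy (appearance)
# and <video_id>_bn.npy (motion), one row per sampled clip, as in the
# salesforce/densecap feature release.
video_id = "v_my_custom_video"
resnet_feat = np.load(f"{video_id}_resnet.npy")   # e.g. (T, 2048)
bn_feat = np.load(f"{video_id}_bn.npy")           # e.g. (T, 1024)

# The two streams are concatenated per time step into a single matrix.
video_feat = np.concatenate([resnet_feat, bn_feat], axis=1)  # e.g. (T, 3072)
print(video_feat.shape)
```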

By the way, if you have some success on this topic, a PR is welcome! I am also happy to help! : )

Best, Jie

amil-rp-work commented 4 years ago

Hey @jayleicn, thanks for providing the info. However, the repo you linked, https://github.com/LuoweiZhou/anet2016-cuhk-feature, doesn't have clear instructions on how to extract the features. Is it possible for you to provide some help here?

jayleicn commented 4 years ago

Sorry, I did not extract the features myself, so I am not able to confidently answer your question. You may want to ask the original authors for clear instructions.

DesaleF commented 3 years ago

@amil-rp-work did you succeed in running inference on a single video? If yes, can you please share some details on what I should do to run inference on a single video (my own video)?

wanghao14 commented 3 years ago

@DesaleF Hi, I have run inference on a single video. You can do it with the following steps:

  1. Refer to the ActivityNet Captions annotation format to mark the start and end times of each event in your video.
  2. Refer to the code in anet2016-cuhk and anet2016-cuhk-feature to extract the features of your video (see the sketch after this list for how the marked timestamps line up with the extracted features).
  3. Run the pre-trained model on the extracted features to generate the captions.
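To connect steps 1 and 2, here is a rough sketch of mapping each marked event's [start, end] time onto rows of the extracted feature matrix; the uniform-sampling assumption and the array shapes are illustrative, not values from the repo:

```python
import numpy as np

def slice_event_features(video_feat, duration_sec, timestamps):
    """Slice a (T, D) feature matrix into one chunk per annotated event.

    Assumes features are sampled uniformly over the video, so second s
    maps to row round(s / duration_sec * T).
    """
    T = video_feat.shape[0]
    chunks = []
    for start, end in timestamps:
        s = int(round(start / duration_sec * T))
        e = max(s + 1, int(round(end / duration_sec * T)))
        chunks.append(video_feat[s:e])
    return chunks

# Hypothetical usage with concatenated appearance+motion features.
video_feat = np.random.rand(120, 3072).astype(np.float32)
events = slice_event_features(video_feat, duration_sec=45.0,
                              timestamps=[[0.0, 18.5], [18.5, 45.0]])
print([c.shape for c in events])
```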

DesaleF commented 3 years ago

@wanghao14 Thank you very much! I followed your steps and have now extracted the features. However, I couldn't find any pretrained model in this repo. Did you train it yourself or get a pretrained model from somewhere? If you trained the model, could you please share it? Also, the feature extraction code from anet2016-cuhk-feature doesn't work for me out of the box. I did some tweaks and got it working, but it still fails to extract features for some videos.

ooza commented 3 years ago

@DesaleF did you succeed in running inference on a single video? If yes, can you tell me where I can find the pretrained model?