actnetchallenge: Task 3 (Dense-Captioning Events in Videos)

Repository for the ActivityNet Challenge 2019, Task 3 (Dense-Captioning Events in Videos). It provides a dense video captioning module for the ActivityNet Captions dataset.

TO-DO:

[x] complete script for downloading ActivityNet videos
[x] complete script for converting .mp4 videos to .jpg frames
[x] write dataset class for ActivityNet Captions dataset
[x] write baseline model for training
[x] add optional training
[x] add evaluation
[ ] add spatiotemporal attention
[ ] add proposal generation code
[ ] add testing code
[ ] add Transformer training
[ ] add BERT training
[ ] add character level training

Requirements

Python>=3.6
numpy
matplotlib
Pillow
accimage (optional, faster than Pillow)
pytorch>=1.0
torchvision>=0.2
pytube
torchtext (for spacy tokenizer and vocabulary)
nlg-eval (for evaluation metrics)
mkl-service (for theano, evaluation)
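
Before running any of the scripts, it can help to check which of the packages above are installed. The following is a minimal sketch, not part of this repository. The import names are assumptions based on how these packages are usually imported (Pillow as PIL, pytorch as torch, nlg-eval as nlgeval, mkl-service as mkl), and missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names assumed for the packages listed above; adjust if your
# installation exposes them under different module names.
REQUIRED = ["numpy", "matplotlib", "PIL", "torch", "torchvision",
            "pytube", "torchtext", "nlgeval", "mkl"]
OPTIONAL = ["accimage"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```

The check only looks the modules up without importing them, so it does not verify the version constraints listed above.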

How to download ActivityNet Captions Dataset (ActivityNet Videos + Annotations)

Download the JSON file for the ActivityNet dataset from here.
Edit download.sh and set the command-line argument for the root directory in which to save the dataset. This path is denoted $root_path below.
Make sure you have at least 300GB of free storage.
Run bash download.sh to download the .mp4 files.
Download the JSON files for the ActivityNet Captions dataset from here.
Extract the downloaded files to $root_path.
Run python utils/add_fps_into_activitynet_json.py -v ${video_dir} -s ${root_path}/train.json -o ${save_path}
Run python utils/add_fps_into_activitynet_json.py -v ${video_dir} -s ${root_path}/val_1.json -o ${save_path}
Run python utils/add_fps_into_activitynet_json.py -v ${video_dir} -s ${root_path}/val_2.json -o ${save_path}
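
The three add_fps_into_activitynet_json.py commands above differ only in the annotation file, so they can be driven from one loop. This is a minimal sketch, assuming it is run from the repository root. build_fps_commands and run_all are hypothetical helpers, and the -v, -s and -o values are passed through exactly as in the commands above.

```python
import subprocess
import sys

# Annotation files named in the steps above.
SPLITS = ["train.json", "val_1.json", "val_2.json"]


def build_fps_commands(video_dir, root_path, save_path):
    """Build one add_fps_into_activitynet_json.py invocation per split."""
    return [
        [sys.executable, "utils/add_fps_into_activitynet_json.py",
         "-v", video_dir, "-s", f"{root_path}/{split}", "-o", save_path]
        for split in SPLITS
    ]


def run_all(video_dir, root_path, save_path):
    """Run the commands in order, stopping at the first failure."""
    for cmd in build_fps_commands(video_dir, root_path, save_path):
        subprocess.run(cmd, check=True)
```

Calling run_all(video_dir, root_path, save_path) with your own paths executes the same three commands in sequence. check=True makes a failing split raise an exception instead of being silently skipped.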

How to convert video files to image files

Make sure you have at least 1TB of free space and enough free inodes on your storage.
Run python utils/mp42jpg.py ${video_dir} ${root_path}/frames activitynet --n_jobs=${number_of_workers}
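
Frame extraction writes a very large number of small files, so both free space and free inodes matter. The sketch below shows one way to check them before starting. It is not part of this repository, it relies on os.statvfs (available on Unix-like systems only), and storage_report and has_room are hypothetical helpers. The inode threshold is an assumption you have to choose yourself, since the total number of frames depends on the videos.

```python
import os
import shutil


def storage_report(path):
    """Return free bytes and free inodes for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    stats = os.statvfs(path)  # Unix only
    # f_favail: inodes available to unprivileged users. Some filesystems
    # do not report a meaningful value here.
    return {"free_bytes": usage.free, "free_inodes": stats.f_favail}


def has_room(path, min_free_bytes, min_free_inodes):
    """Check the filesystem holding path against both thresholds."""
    report = storage_report(path)
    return (report["free_bytes"] >= min_free_bytes
            and report["free_inodes"] >= min_free_inodes)
```

For example, has_room(frames_dir, 10**12, n) checks for 1TB of free space and n free inodes, where frames_dir is an existing directory on the target filesystem and n is your own estimate of the number of frames to be written.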

Training procedures

Run train.py with the desired configuration (a launch script is in train/trainscript.sh).

Testing procedures

Proposal generation is not implemented yet, so prepare a JSON file with proposals beforehand.
Run test.py with the desired configuration (a launch script is in eval/eval.sh).
