skasai5296 / actnetchallenge

Repository for the International Challenge on Activity Recognition (ActivityNet) Dense Captioning

actnetchallenge: Task 3 (Dense-Captioning Events in Videos)

Repository for the ActivityNet Challenge 2019, Task 3 (Dense-Captioning Events in Videos). It provides a dense video captioning module for the ActivityNet Captions dataset.

TO-DO:

Requirements

How to download ActivityNet Captions Dataset (ActivityNet Videos + Annotations)

  1. Download the JSON file for the ActivityNet dataset from here.
  2. Edit download.sh and set the command-line argument for the root directory where the dataset will be saved. This path is referred to below as $root_path.
  3. Make sure you have at least 300 GB of free storage.
  4. Run bash download.sh to download the .mp4 files.
  5. Download the JSON files for the ActivityNet Captions dataset from here.
  6. Extract the downloaded files to $root_path.
  7. Run python utils/add_fps_into_activitynet_json.py -v ${video_dir} -s ${root_path}/train.json -o ${save_path}
  8. Run python utils/add_fps_into_activitynet_json.py -v ${video_dir} -s ${root_path}/val_1.json -o ${save_path}
  9. Run python utils/add_fps_into_activitynet_json.py -v ${video_dir} -s ${root_path}/val_2.json -o ${save_path}
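Steps 7 to 9 run the same script once per annotation split, so they can be driven from a small loop. The following is a minimal sketch, not part of the repository: the helper names build_fps_commands and run_all are hypothetical, and only the script path and the -v, -s and -o flags are taken from the steps above. It passes the same save_path for every split, exactly as the steps do; whether that path is meant to be a directory or a file is not stated here, so check the script before relying on it.

```python
import subprocess
import sys
from pathlib import Path

# Annotation splits named in steps 7 to 9.
SPLITS = ("train.json", "val_1.json", "val_2.json")

# Script path as written in the steps above (relative to the repository root).
SCRIPT = "utils/add_fps_into_activitynet_json.py"


def build_fps_commands(video_dir, root_path, save_path):
    """Hypothetical helper: return one argument list per annotation split."""
    return [
        [
            sys.executable, SCRIPT,
            "-v", str(video_dir),
            "-s", str(Path(root_path) / split),
            "-o", str(save_path),
        ]
        for split in SPLITS
    ]


def run_all(video_dir, root_path, save_path):
    """Hypothetical helper: run the commands in order, stopping on the first failure."""
    for cmd in build_fps_commands(video_dir, root_path, save_path):
        subprocess.run(cmd, check=True)
```

Calling run_all(video_dir, root_path, save_path) from the repository root should be equivalent to typing the three commands by hand. The sketch uses sys.executable instead of a bare python so that the script runs under the same interpreter as the caller.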

How to convert video files to image files

  1. Make sure you have at least 1 TB of free storage and enough free inodes.
  2. Run python utils/mp42jpg.py ${video_dir} ${root_path}/frames activitynet --n_jobs=${number_of_workers}
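The storage requirements (300 GB for the videos, 1 TB plus free inodes for the frames) can be checked before starting a long job. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the sizes are assumed to be decimal gigabytes, and the number of inodes to require is left to the caller because this README does not give a figure.

```python
import os
import shutil

GB = 10**9  # assumption: decimal gigabytes


def free_bytes(path):
    """Free space, in bytes, on the filesystem that contains path."""
    return shutil.disk_usage(path).free


def free_inodes(path):
    """Free inodes available to the current user, or None if unsupported.

    os.statvfs exists only on Unix-like systems. Some filesystems report 0
    here even when files can still be created, so treat the value as a hint.
    """
    if not hasattr(os, "statvfs"):
        return None
    return os.statvfs(path).f_favail


def has_enough_space(path, required_bytes, required_inodes=0):
    """Return True if path has at least the requested bytes and inodes free."""
    if free_bytes(path) < required_bytes:
        return False
    inodes = free_inodes(path)
    if inodes is not None and inodes < required_inodes:
        return False
    return True
```

For example, has_enough_space(root_path, 1000 * GB) checks the 1 TB requirement for the frames. Since each video expands into many image files, passing a required_inodes value based on your own estimate of the frame count is a reasonable extra check.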

Training procedures

  1. Run train.py with your configuration (the script is in train/trainscript.sh).

Testing procedures

  1. Proposal generation is not implemented yet, so prepare a JSON file with proposals.
  2. Run test.py with your configuration (the script is in eval/eval.sh).

Samples

Transformer Captions
