SotA text-only image/video method (IJCAI 2023)
Paper (Accepted by IJCAI 2023)

1. Installing

$ pip install -r requirements.txt
$ pip install git+https://github.com/openai/CLIP.git

2. Data Preparation

Download the images and videos of each dataset from the web.

The data files should be laid out as follows:

  |   ├──./image/                   #images of the test split
  |   ├──captions_val2014.json      #annotation of test split
  |   ├──coco_test.txt              #test split of Karpathy
  |   ├──./image/                   #images in dataset
  |   ├──dataset_flickr30k.json     #annotation
  |   ├──./video/                   #videos in dataset
  |   ├──./frames/                  #keyframes
  |   ├──train_val_videodatainfo.json   #annotation
  |   ├──./video/                   #videos in dataset
  |   ├──./frames/                  #keyframes
  |   ├──caption.txt                #annotation
  |   ├──train_list.txt             #train split
  |   ├──test_list.txt              #test split
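
Before running the preparation scripts, it can help to confirm that the downloaded files are where they are expected. The following is a minimal sketch, not part of the repository: `missing_entries`, `report`, and `COCO_ENTRIES` are hypothetical names, and the dataset root directory is an assumption you need to supply yourself.

```python
from pathlib import Path

# Entries listed above for the COCO test data. Lists for the other
# datasets can be written the same way from the layout above.
COCO_ENTRIES = ["image", "captions_val2014.json", "coco_test.txt"]


def missing_entries(root, expected):
    """Return the names in `expected` that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


def report(root, expected):
    """Print a short summary and return True when nothing is missing.

    Hypothetical helper for illustration only.
    """
    missing = missing_entries(root, expected)
    if missing:
        print(f"{root}: missing {', '.join(missing)}")
    else:
        print(f"{root}: all expected entries found")
    return not missing
```

For example, `report("path/to/coco", COCO_ENTRIES)` prints which of the three COCO entries are absent, where `path/to/coco` is a placeholder for your own data directory.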

After preparing the data, run the following command for each dataset to generate the data files required for running:

python data_prepare_{dataset name}.py

dataset name = {coco, flickr, msrvtt, msvd}
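
To prepare all four datasets in one go, the command above can be expanded programmatically. This is a rough sketch under the assumption that the scripts are run from the repository root and take no extra arguments; `prepare_commands` and `prepare_all` are hypothetical helper names.

```python
import subprocess

# Dataset names as listed above.
DATASETS = ["coco", "flickr", "msrvtt", "msvd"]


def prepare_commands(datasets=DATASETS):
    """Build one `python data_prepare_{dataset name}.py` command per dataset."""
    return [["python", f"data_prepare_{name}.py"] for name in datasets]


def prepare_all(datasets=DATASETS, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in prepare_commands(datasets):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first script that exits with an error.
            subprocess.run(cmd, check=True)
```

Calling `prepare_all()` only prints the commands; `prepare_all(dry_run=False)` actually runs them.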

3. Run

Image Captioning

python run_image_captioning.py --dataset {dataset name}

dataset name = {coco, flickr}

Video Captioning

python run_video_captioning.py --dataset {dataset name}

dataset name = {msrvtt, msvd}

The default save path for checkpoints is ./checkpoint/{dataset name}, and the default save path for caption files is ./output/{dataset name}, where dataset name = {coco, flickr, msrvtt, msvd}.
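
The mapping from dataset name to run script and default paths described above can be summarized in a few lines. This is an illustrative sketch only: `run_command` and `default_paths` are hypothetical helpers, and the paths are taken to be relative to the repository root.

```python
from pathlib import Path

# Dataset names accepted by each run script, as listed above.
IMAGE_DATASETS = ("coco", "flickr")
VIDEO_DATASETS = ("msrvtt", "msvd")


def run_command(dataset):
    """Return the run command for a dataset, choosing image or video captioning."""
    if dataset in IMAGE_DATASETS:
        script = "run_image_captioning.py"
    elif dataset in VIDEO_DATASETS:
        script = "run_video_captioning.py"
    else:
        raise ValueError(f"unknown dataset name: {dataset!r}")
    return ["python", script, "--dataset", dataset]


def default_paths(dataset):
    """Return (checkpoint directory, caption output directory) for a dataset."""
    if dataset not in IMAGE_DATASETS + VIDEO_DATASETS:
        raise ValueError(f"unknown dataset name: {dataset!r}")
    return Path("checkpoint") / dataset, Path("output") / dataset
```

For instance, `run_command("msvd")` selects the video captioning script, and `default_paths("msvd")` gives the directories where its checkpoints and caption files are saved by default.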

4. Evaluation

We provide the reference results and the generated results reported in the paper under ./output/{dataset_name}/.

For example:

python evalution.py \
  --ref ./output/COCO/reference_COCO.json \
  --gts ./output/COCO/result_COCO.json

5. Demo

Obtain the checkpoints by following the steps above and place them in ./checkpoint/COCO/ as follows:

  |   ├──decoder_coco.pth
  |   ├──map_coco.pth   

Then run demo.ipynb.