
Image-Captioning-with-Adaptive-Attention

This is a PyTorch implementation of Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning.
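The core idea of the paper is a "visual sentinel": a gate that lets the decoder fall back on its language model instead of attending to the image at every time step. A minimal PyTorch sketch of that mechanism (the class and function names here are illustrative, not this repo's actual API):

import torch
import torch.nn as nn

class VisualSentinel(nn.Module):
    """Sentinel gate from Lu et al. (2017): s_t = g_t * tanh(m_t),
    where m_t is the LSTM memory cell. Names are illustrative."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.w_x = nn.Linear(input_size, hidden_size, bias=False)
        self.w_h = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x_t, h_prev, m_t):
        # g_t = sigmoid(W_x x_t + W_h h_{t-1}) gates the memory cell
        g_t = torch.sigmoid(self.w_x(x_t) + self.w_h(h_prev))
        return g_t * torch.tanh(m_t)  # sentinel vector s_t

def adaptive_context(c_t, s_t, beta_t):
    # beta_t in [0, 1] decides how much the model relies on the sentinel
    # ("not looking") versus the attended image features c_t.
    return beta_t * s_t + (1.0 - beta_t) * c_t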


Requirements

All of the code is written in Python. You will need a GPU to train the model.

You will also need to install PyTorch in order to run the code successfully.

Dataset

You are free to choose MSCOCO, Flickr8k, or Flickr30k as your dataset.

You can use the following commands to download the MSCOCO dataset:

wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip

We will use Andrej Karpathy's training, validation, and test splits. To download the zip file, you can use the following command:

wget http://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip
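The zip contains dataset_coco.json (and the Flickr variants), which assigns each image to a split and stores its reference captions. A quick way to inspect it (the field names follow Karpathy's published format):

import json
from collections import Counter

with open('dataset_coco.json') as f:
    data = json.load(f)

# Images per split; the COCO file uses train / restval / val / test.
print(Counter(img['split'] for img in data['images']))

# Each entry carries the image file name and its reference sentences.
first = data['images'][0]
print(first['filename'], '->', first['sentences'][0]['raw'])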

Data Preprocessing

To preprocess the MSCOCO dataset, use the following commands:

mkdir coco_folder
python create_input_files.py -d coco -i [YOUR-IMAGE-FOLDER]
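Since the code builds on the tutorial noted in the Acknowledgement, create_input_files.py should write the encoded captions, an HDF5 file of images, and a word-map JSON into coco_folder. A hypothetical sanity check (the WORDMAP_*.json pattern is an assumption borrowed from that tutorial; adjust to the actual output names):

import json
from pathlib import Path

out = Path('coco_folder')
print(sorted(p.name for p in out.iterdir()))  # list what preprocessing produced

# Peek at the vocabulary, assuming a word-map JSON was written.
wordmap_files = sorted(out.glob('WORDMAP_*.json'))
if wordmap_files:
    word_map = json.loads(wordmap_files[0].read_text())
    print('vocabulary size:', len(word_map))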

Training

Use the following command to train the model on the MSCOCO dataset:

python train.py -d coco

For comparison, you may also want to train the model with standard soft attention instead:

python train.py -d coco -a

Evaluation

You are free to choose different beam sizes during evaluation. Use the following command to compute all BLEU scores (BLEU-1 through BLEU-4):

python eval.py -d coco -cf [PATH-TO-CHECKPOINT] -b 5

Note that the best checkpoint during training is selected based on the BLEU-4 score.
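BLEU-n counts matching n-grams between the generated caption and the reference captions. A toy illustration of the four scores with NLTK (the data here is made up, and eval.py may well use its own BLEU implementation):

from nltk.translate.bleu_score import corpus_bleu

# references[i] holds every ground-truth token list for image i,
# hypotheses[i] the generated caption for that image.
references = [[['a', 'dog', 'runs', 'on', 'grass'],
               ['a', 'dog', 'is', 'running', 'outside']]]
hypotheses = [['a', 'dog', 'is', 'running', 'on', 'grass']]

for n in range(1, 5):
    score = corpus_bleu(references, hypotheses, weights=tuple([1.0 / n] * n))
    print(f'BLEU-{n}: {score:.4f}')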

Captioning

To caption your own image, use the following command:

python caption.py
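If you need to prepare an image by hand, a typical input pipeline looks like the sketch below (this is an assumption about the usual encoder input; check caption.py for the exact image size and normalization this repo expects):

import torch
from PIL import Image
import torchvision.transforms as transforms

# Standard ImageNet preprocessing for a single image.
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

image = transform(Image.open('my_image.jpg').convert('RGB'))
image = image.unsqueeze(0)  # add a batch dimension: (1, 3, 256, 256)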

Reference

If you use this code as part of any published research, please acknowledge the following paper:

@inproceedings{Lu2017Adaptive,
  author    = {Lu, Jiasen and Xiong, Caiming and Parikh, Devi and Socher, Richard},
  title     = {Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning},
  booktitle = {CVPR},
  year      = {2017}
}

Acknowledgement

The code is based on a-PyTorch-Tutorial-to-Image-Captioning.