PyTorch implementation of the following models: LRCN (Long-term Recurrent Convolutional Networks) [1] and GVE (Generating Visual Explanations) [2].
Installation
This implementation uses Python 3, PyTorch, and pycocoevalcap (https://github.com/salaniz/pycocoevalcap). All dependencies can be installed into a conda environment with the provided environment.yml file:
git clone https://github.com/salaniz/pytorch-gve-lrcn.git
cd pytorch-gve-lrcn
conda env create -f environment.yml
conda activate gve-lrcn
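To confirm the environment works before downloading any data, a quick import check of the two key dependencies can help (this check is our suggestion, not part of the repo):

# Verify that PyTorch and pycocoevalcap are importable in the new env;
# COCOEvalCap is the caption-evaluation entry point of pycocoevalcap.
python -c "import torch; from pycocoevalcap.eval import COCOEvalCap; print('torch', torch.__version__)"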
Data Setup
To download the datasets, run the provided setup scripts:

coco-data-setup-linux.sh downloads COCO 2014: http://cocodataset.org/ [3]
cub-data-setup-linux.sh downloads preprocessed features of CUB-200-2011: http://www.vision.caltech.edu/visipedia/CUB-200-2011.html [4]

Train LRCN on COCO
python main.py --model lrcn --dataset coco
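By default, LRCN extracts image features with VGG-16; any of the CNNs listed under --pretrained-model in the help output below can be substituted, e.g.:

# Train LRCN on COCO with ResNet-152 features instead of the default
# VGG-16 (both are listed choices of --pretrained-model).
python main.py --model lrcn --dataset coco --pretrained-model resnet152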
Train GVE on CUB
1. Train the sentence classifier first:
python main.py --model sc --dataset cub
2. Copy the best sentence-classifier checkpoint to the default path (a convenience snippet for this step follows below):
cp ./checkpoints/sc-cub-D<date>-T<time>-G<GPUid>/best-ckpt.pth ./data/cub/sentence_classifier_ckpt.pth
3. Train GVE with the pretrained sentence classifier:
python main.py --model gve --dataset cub --sc-ckpt ./data/cub/sentence_classifier_ckpt.pth
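Filling in the <date>, <time>, and <GPUid> placeholders by hand is easy to get wrong; the following sketch (our convenience snippet, assuming the run-directory naming convention above) copies the newest sentence-classifier checkpoint instead:

# Find the most recently modified sc-cub-* run directory and copy its
# best checkpoint to the default sentence-classifier path.
latest=$(ls -td ./checkpoints/sc-cub-*/ | head -n 1)
cp "${latest}best-ckpt.pth" ./data/cub/sentence_classifier_ckpt.pth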
Evaluation
Model checkpoints and validation results are saved to the checkpoint directory using the naming convention <model>-<dataset>-D<date>-T<time>-G<GPUid>. To evaluate a saved model, pass its checkpoint to --eval, e.g. for GVE on CUB:
python main.py --model gve --dataset cub --eval ./checkpoints/gve-cub-D<date>-T<time>-G<GPUid>/best-ckpt.pth
Note: since COCO does not come with test set annotations, this script evaluates on the validation set when run on the COCO dataset.
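The same pattern works for the other models; for example, evaluating an LRCN checkpoint on COCO (the run directory name is whatever your training run produced):

# Evaluate an LRCN checkpoint; per the note above, this reports
# validation-set scores on COCO. Fill in the placeholders with your
# actual run directory.
python main.py --model lrcn --dataset coco --eval ./checkpoints/lrcn-coco-D<date>-T<time>-G<GPUid>/best-ckpt.pth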
Default parameters:

| Argument | Default value |
| --- | --- |
| data_path | ./data |
| checkpoint_path | ./checkpoints |
| log_step | 10 |
| num_workers | 4 |
| disable_cuda | False |
| cuda_device | 0 |
| torch_seed | <random> |
| model | lrcn |
| dataset | coco |
| pretrained_model | vgg16 |
| layers_to_truncate | 1 |
| sc_ckpt | ./data/cub/sentence_classifier_ckpt.pth |
| weights_ckpt | None |
| loss_lambda | 0.2 |
| embedding_size | 1000 |
| hidden_size | 1000 |
| num_epochs | 50 |
| batch_size | 128 |
| learning_rate | 0.001 |
| train | True |
| eval_ckpt | None |
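Each of these defaults maps to a command-line flag (see the help output below) and can be overridden per run; the values in this example are illustrative, not tuned recommendations:

# Train with a smaller batch, a lower learning rate, fewer epochs, and
# a fixed seed for reproducibility.
python main.py --model lrcn --dataset coco \
    --batch-size 64 --learning-rate 0.0005 \
    --num-epochs 30 --torch-seed 42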
$ python main.py --help
usage: main.py [-h] [--data-path DATA_PATH]
[--checkpoint-path CHECKPOINT_PATH] [--log-step LOG_STEP]
[--num-workers NUM_WORKERS] [--disable-cuda]
[--cuda-device CUDA_DEVICE] [--torch-seed TORCH_SEED]
[--model {lrcn,gve,sc}] [--dataset {coco,cub}]
[--pretrained-model {resnet18,resnet34,resnet50,resnet101,resnet152,vgg11,vgg11_bn,vgg13,vgg13_bn,vgg16,vgg16_bn,vgg19_bn,vgg19}]
[--layers-to-truncate LAYERS_TO_TRUNCATE] [--sc-ckpt SC_CKPT]
[--weights-ckpt WEIGHTS_CKPT] [--loss-lambda LOSS_LAMBDA]
[--embedding-size EMBEDDING_SIZE] [--hidden-size HIDDEN_SIZE]
[--num-epochs NUM_EPOCHS] [--batch-size BATCH_SIZE]
[--learning-rate LEARNING_RATE] [--eval EVAL]
optional arguments:
-h, --help show this help message and exit
--data-path DATA_PATH
root path of all data
--checkpoint-path CHECKPOINT_PATH
path checkpoints are stored or loaded
--log-step LOG_STEP step size for printing logging information
--num-workers NUM_WORKERS
number of threads used by data loader
--disable-cuda disable the use of CUDA
--cuda-device CUDA_DEVICE
specify which GPU to use
--torch-seed TORCH_SEED
set a torch seed
--model {lrcn,gve,sc}
deep learning model
--dataset {coco,cub}
--pretrained-model {resnet18,resnet34,resnet50,resnet101,resnet152,vgg11,vgg11_bn,vgg13,vgg13_bn,vgg16,vgg16_bn,vgg19_bn,vgg19}
[LRCN] name of pretrained model for image features
--layers-to-truncate LAYERS_TO_TRUNCATE
[LRCN] number of final FC layers to be removed from
pretrained model
--sc-ckpt SC_CKPT [GVE] path to checkpoint for pretrained sentence
classifier
--weights-ckpt WEIGHTS_CKPT
[GVE] path to checkpoint for pretrained weights
--loss-lambda LOSS_LAMBDA
[GVE] weight factor for reinforce loss
--embedding-size EMBEDDING_SIZE
dimension of the word embedding
--hidden-size HIDDEN_SIZE
dimension of hidden layers
--num-epochs NUM_EPOCHS
--batch-size BATCH_SIZE
--learning-rate LEARNING_RATE
--eval EVAL path of checkpoint to be evaluated
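If no GPU is available, the --disable-cuda flag from the help above runs everything on the CPU, which is slow but handy for smoke tests:

# CPU-only run of the default LRCN/COCO configuration.
python main.py --model lrcn --dataset coco --disable-cuda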