This repository implements ADAPT, a novel approach for training image-text alignment models.
ADAPT adjusts an intermediate representation of instances from a modality *a* using an embedding vector of an instance from a modality *b*. This adaptation filters and enhances important information across internal features, producing guided vector representations; the effect resembles that of attention modules, while being far more computationally efficient. For further information, please read our AAAI 2020 paper.
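To give an intuition of the mechanism, below is a minimal PyTorch sketch of such a feature-wise adaptation, assuming a scale-and-shift formulation; the names `AdaptiveGate`, `gamma_fc`, and `beta_fc` are hypothetical and do not correspond to identifiers in this repository. See the paper for the exact formulation.

```python
import torch
import torch.nn as nn

class AdaptiveGate(nn.Module):
    """Illustrative feature-wise adaptation: scale and shift the intermediate
    features of modality a, conditioned on an embedding from modality b.
    Hypothetical sketch only, not the exact layer used by ADAPT."""

    def __init__(self, feat_dim, cond_dim):
        super().__init__()
        # Two linear projections produce a multiplicative (gamma) and an
        # additive (beta) vector from the conditioning embedding.
        self.gamma_fc = nn.Linear(cond_dim, feat_dim)
        self.beta_fc = nn.Linear(cond_dim, feat_dim)

    def forward(self, feats, cond):
        # feats: (batch, n_items, feat_dim) intermediate features (modality a)
        # cond:  (batch, cond_dim) embedding of one instance (modality b)
        gamma = self.gamma_fc(cond).unsqueeze(1)  # (batch, 1, feat_dim)
        beta = self.beta_fc(cond).unsqueeze(1)    # (batch, 1, feat_dim)
        # Element-wise filtering/enhancement of every feature vector.
        return gamma * feats + beta
```

Under this reading, the conditioning amounts to two linear projections plus element-wise operations, which is what keeps it cheaper than computing explicit attention weights over all feature pairs.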
Python 2 is not supported. We recommend installing Python 3 with Anaconda and then creating a dedicated environment:
```bash
conda create --name adapt python=3
conda activate adapt
git clone https://github.com/jwehrmann/retrieval.pytorch
cd retrieval.pytorch
pip install -r requirements.txt
```
```bash
wget https://scanproject.blob.core.windows.net/scan-data/data.zip
conda activate adapt
export DATA_PATH=/path/to/dataset
```
You can also create a shell alias (a shortcut that expands to a command). For example, add this line to your shell profile:
```bash
alias adapt='conda activate adapt && export DATA_PATH=/path/to/dataset'
```
Then simply run the alias name to have everything configured:

```bash
$ adapt
```
You can reproduce our main results using the following scripts.
Train and evaluate on Flickr30k:

```bash
python run.py options/adapt/f30k/t2i.yaml
python test.py options/adapt/f30k/t2i.yaml -data_split test
python run.py options/adapt/f30k/i2t.yaml
python test.py options/adapt/f30k/i2t.yaml -data_split test
```
Train and evaluate on MS COCO:

```bash
python run.py options/adapt/coco/t2i.yaml
python test.py options/adapt/coco/t2i.yaml -data_split test
python run.py options/adapt/coco/i2t.yaml
python test.py options/adapt/coco/i2t.yaml -data_split test
```
To ensemble multiple models (ADAPT-Ens), use:
MS COCO models:
```bash
python test_ens.py options/adapt/coco/t2i.yaml options/adapt/coco/i2t.yaml -data_split test
```
F30k models:
```bash
python test_ens.py options/adapt/f30k/t2i.yaml options/adapt/f30k/i2t.yaml -data_split test
```
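For intuition, ensembling in this kind of retrieval setup is commonly done by averaging the image-text similarity matrices produced by the individual models before ranking. The sketch below assumes that convention; the actual logic in `test_ens.py` may differ, and `ensemble_similarities` is a hypothetical helper name.

```python
import numpy as np

def ensemble_similarities(sim_matrices):
    """Average per-model similarity matrices (illustrative sketch).

    sim_matrices: list of (n_images, n_captions) arrays, one per model.
    The averaged matrix can then be ranked row-wise for image annotation
    and column-wise for image retrieval.
    """
    return np.mean(np.stack(sim_matrices, axis=0), axis=0)

# e.g., combining the scores of the t2i and i2t models:
# sims = ensemble_similarities([sims_t2i, sims_i2t])
```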
We make available all the main models generated in this research. Each archive contains the best model of its run (selected by validation performance), the last checkpoint generated, all TensorBoard logs (loss and recall curves), result files, and the configuration options used for training.
| Dataset | Model | Image Annotation R@1 | Image Retrieval R@1 |
|---|---|---|---|
| F30k | ADAPT-t2i | 76.4% | 57.8% |
| F30k | ADAPT-i2t | 66.3% | 53.8% |
| F30k | ADAPT-ens | 76.2% | 60.5% |
| COCO | ADAPT-t2i | 75.4% | 64.0% |
| COCO | ADAPT-i2t | 67.2% | 57.8% |
| COCO | ADAPT-ens | 75.3% | 64.4% |
If you find this research or code useful, please consider citing our paper:
```
@inproceedings{wehrmann2020adaptive,
  title={Adaptive Cross-modal Embeddings for Image-Text Alignment},
  author={Wehrmann, J{\^o}natas and Kolling, Camila and Barros, Rodrigo C},
  booktitle={The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)},
  year={2020}
}
```