This is an official implementation for the ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformer for Visual Question Answering". It currently includes the code for training TRAR on the VQA2.0 and CLEVR datasets. Our TRAR model for the REC task is coming soon.
We also release the following pretrained models and data:
CLEVR TRAR model trained on the train split: TRAR CLEVR Pretrained Models.
VQA-v2 TRAR models trained on the train+val split and the train+val+vg split: VQA-v2 TRAR Pretrained Models.
train2014, val2014 and test2015 data. Please check our dataset setup page DATA.md for more details.
TRAR model trained on the train split. Please check our model page MODEL.md for more details.
Figures: TRAR vs Standard Transformer; TRAR Overall.
Clone this repo
git clone https://github.com/rentainhe/TRAR-VQA.git
cd TRAR-VQA
Create a conda virtual environment and activate it
conda create -n trar python=3.7 -y
conda activate trar
Install CUDA==10.1 with cudnn7, following the official installation instructions.
Install PyTorch==1.7.1 and torchvision==0.8.2 with CUDA==10.1:
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
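As an optional sanity check (not part of the original setup), you can verify from Python that the installed versions match the CUDA 10.1 build:

# optional check for the PyTorch / torchvision / CUDA versions installed above
import torch
import torchvision

print(torch.__version__)        # expected: 1.7.1
print(torchvision.__version__)  # expected: 0.8.2
print(torch.version.cuda)       # expected: 10.1
print(torch.cuda.is_available())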
Install SpaCy and initialize the GloVe vectors as follows:
pip install -r requirements.txt
wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
pip install en_vectors_web_lg-2.1.0.tar.gz
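To confirm the GloVe vectors were installed correctly (an optional check, not part of the original setup; the sample question is arbitrary), you can load the package and embed a sentence:

# optional check: load the en_vectors_web_lg package installed above
import en_vectors_web_lg

nlp = en_vectors_web_lg.load()
tokens = nlp('What is the man holding?')
print(tokens[0].vector.shape)  # each token should map to a 300-d GloVe vector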
For dataset setup, please see DATA.md.
In the trar.yml config we have the following TRAR-specific settings:
ORDERS: [0, 1, 2, 3]
IMG_SCALE: 8
ROUTING: 'hard' # {'soft', 'hard'}
POOLING: 'attention' # {'attention', 'avg', 'fc'}
TAU_POLICY: 1 # {0: 'SLOW', 1: 'FAST', 2: 'FINETUNE'}
TAU_MAX: 10
TAU_MIN: 0.1
BINARIZE: False
ORDERS=list, to set the local attention window sizes for routing; 0 stands for global attention.
IMG_SCALE=int, which should be equal to the image feature size used for training. You should set IMG_SCALE: 16 for 16 × 16 training features.
ROUTING={'hard', 'soft'}, to set the Routing Block type in the TRAR model.
POOLING={'attention', 'avg', 'fc'}, to set the downsample strategy used in the Routing Block.
TAU_POLICY={0, 1, 2}, to set the temperature schedule used when training TRAR with ROUTING: 'hard'.
TAU_MAX=float, to set the maximum temperature during training.
TAU_MIN=float, to set the minimum temperature during training.
BINARIZE=bool, whether to binarize the predicted alphas (alphas: the probabilities of choosing each path). If BINARIZE=True, only the maximum alpha is kept at test time and the others are set to zero; if BINARIZE=False, all alphas are kept and the different routing results are combined as a weighted sum over the alphas. This does not affect training, it only makes a small difference at test time. Note that BINARIZE should be set to False when ROUTING='soft', since there is no need to binarize the path probabilities in the soft routing block (see the sketch after this list).
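To make the routing options above concrete, here is a minimal illustrative sketch (not the repo's actual implementation; tensor shapes and names are assumptions) of how per-order attention results could be combined by the predicted alphas, and what BINARIZE changes at test time:

# Illustrative sketch only -- not the actual TRAR code.
# Assumes `branch_outputs` holds one attention result per order in ORDERS,
# and `alphas` are the routing probabilities over those orders.
import torch

def combine_routing_outputs(branch_outputs, alphas, binarize=False):
    """Weighted-sum the per-order outputs by the routing probabilities.

    branch_outputs: tensor of shape (num_orders, batch, seq_len, dim)
    alphas:         tensor of shape (batch, num_orders), rows sum to 1
    binarize:       if True (test time only), keep the max alpha and zero the rest
    """
    if binarize:
        # keep only the most probable path: one-hot over the orders
        idx = alphas.argmax(dim=-1, keepdim=True)
        alphas = torch.zeros_like(alphas).scatter_(-1, idx, 1.0)
    # broadcast alphas over (seq_len, dim) and sum over the order dimension
    weights = alphas.permute(1, 0).unsqueeze(-1).unsqueeze(-1)  # (num_orders, batch, 1, 1)
    return (weights * branch_outputs).sum(dim=0)                # (batch, seq_len, dim)

# toy usage with 4 orders (e.g. ORDERS: [0, 1, 2, 3]) on an 8x8 grid (IMG_SCALE: 8)
outs = torch.randn(4, 2, 64, 512)
alphas = torch.softmax(torch.randn(2, 4), dim=-1)
soft = combine_routing_outputs(outs, alphas, binarize=False)
hard = combine_routing_outputs(outs, alphas, binarize=True)
print(soft.shape, hard.shape)  # both (2, 64, 512)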
TAU_POLICY visualization: for MAX_EPOCH=13 with WARMUP_EPOCH=3, each policy anneals the temperature between TAU_MAX and TAU_MIN on a different schedule (a rough sketch follows).
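The exact SLOW/FAST/FINETUNE curves are defined in the repo's config and model code; as a rough illustration only (the linear decay below is an assumption, not the repo's exact schedule), a temperature that anneals from TAU_MAX to TAU_MIN after the warmup epochs could look like this:

# Rough illustration of temperature annealing -- the real SLOW/FAST/FINETUNE
# schedules live in the repo; this linear decay is only an assumed example.
def example_tau(epoch, max_epoch=13, warmup_epoch=3, tau_max=10.0, tau_min=0.1):
    if epoch < warmup_epoch:
        return tau_max  # keep the routing "soft" while the model warms up
    # linearly anneal towards tau_min over the remaining epochs
    progress = (epoch - warmup_epoch) / max(1, max_epoch - 1 - warmup_epoch)
    return tau_max - (tau_max - tau_min) * min(1.0, progress)

print([round(example_tau(e), 2) for e in range(13)])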
Train model on VQA-v2 with default hyperparameters:
python3 run.py --RUN='train' --DATASET='vqa' --MODEL='trar'
and the training log will be saved to:
results/log/log_run_<VERSION>.txt
Args:
--DATASET={'vqa', 'clevr'}, to choose the task for training.
--GPU=str, e.g. --GPU='2', to train the model on a specific GPU device.
--SPLIT={'train', 'train+val', 'train+val+vg'}, which combines different training datasets. The default training split is train.
--MAX_EPOCH=int, to set the total number of training epochs.
Resume Training
Resume training from specific saved model weights
python3 run.py --RUN='train' --DATASET='vqa' --MODEL='trar' --RESUME=True --CKPT_V=str --CKPT_E=int
--CKPT_V=str: the specific checkpoint version
--CKPT_E=int: the resumed epoch number
Multi-GPU Training and Gradient Accumulation
Multi-GPU Training:
Add --GPU='0, 1, 2, 3...' after the training script:
python3 run.py --RUN='train' --DATASET='vqa' --MODEL='trar' --GPU='0,1,2,3'
The batch size on each GPU will automatically be set to BATCH_SIZE divided by the number of GPUs (e.g. BATCH_SIZE=64 on 4 GPUs gives 16 samples per GPU).
Gradient Accumulation:
Add --ACCU=n after the training script:
python3 run.py --RUN='train' --DATASET='vqa' --MODEL='trar' --ACCU=2
This makes the optimizer accumulate gradients for n mini-batches and update the model weights once. BATCH_SIZE must be divisible by n.
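Conceptually, gradient accumulation works as in the following minimal, generic sketch (illustrative only, not the repo's training loop; the toy model and data are assumptions):

# Generic gradient-accumulation sketch (illustrative, not the repo's training loop).
# With --ACCU=n, each effective batch is n smaller mini-batches.
import torch
import torch.nn as nn

model = nn.Linear(16, 4)                      # toy model standing in for TRAR
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
accu = 2                                      # corresponds to --ACCU=2

data = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(6)]
for step, (inputs, targets) in enumerate(data):
    loss = criterion(model(inputs), targets) / accu   # scale so the sum matches one big batch
    loss.backward()                                    # gradients accumulate across mini-batches
    if (step + 1) % accu == 0:
        optimizer.step()                               # one weight update per `accu` mini-batches
        optimizer.zero_grad()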
Warning: the args --MODEL and --DATASET should be set to the same values as those used in the training stage.
Validate on Local Machine
Offline evaluation currently only supports evaluation on the coco_2014_val dataset.
Use saved checkpoint
python3 run.py --RUN='val' --MODEL='trar' --DATASET='{vqa, clevr}' --CKPT_V=str --CKPT_E=int
Use the absolute path
python3 run.py --RUN='val' --MODEL='trar' --DATASET='{vqa, clevr}' --CKPT_PATH=str
Online Testing
All the evaluations on the test
dataset of VQA-v2 and CLEVR benchmarks can be achieved as follows:
python3 run.py --RUN='test' --MODEL='trar' --DATASET='{vqa, clevr}' --CKPT_V=str --CKPT_E=int
Result files are saved at:
results/result_test/result_run_<CKPT_V>_<CKPT_E>.json
You can upload the obtained result json file to Eval AI to evaluate the scores.
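Before uploading, you can peek at the generated file locally (an optional, illustrative check; replace the placeholders with your run's checkpoint version and epoch):

# Optional: inspect the result file before uploading to Eval AI.
import json

path = 'results/result_test/result_run_<CKPT_V>_<CKPT_E>.json'  # fill in your version/epoch
with open(path) as f:
    results = json.load(f)

# VQA-v2 test submissions are typically a list of {"question_id": ..., "answer": ...} entries
print(len(results))
print(results[0])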
Here we provide our pretrained models and training logs; please see MODEL.md.
If TRAR is helpful for your research or you wish to refer to the baseline results published here, we'd really appreciate it if you could cite this paper:
@InProceedings{Zhou_2021_ICCV,
author = {Zhou, Yiyi and Ren, Tianhe and Zhu, Chaoyang and Sun, Xiaoshuai and Liu, Jianzhuang and Ding, Xinghao and Xu, Mingliang and Ji, Rongrong},
title = {TRAR: Routing the Attention Spans in Transformer for Visual Question Answering},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {2074-2084}
}