Implementation of the SwapMix approach to measuring visual bias in visual question answering (SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering, Gupta et al., CVPR 2022)
We provide a new way to benchmark visual bias in a VQA model by perturbing the visual context, i.e., objects in the image that are irrelevant to the question.
The model looks at an image and a question. We then change the visual context (objects irrelevant to the question) in the image. For each question we create multiple copies of the image, each with a different context. Ideally, the model's prediction should remain consistent under these context switches.
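For intuition, here is a minimal sketch of this consistency measurement; `model`, `get_irrelevant_objects`, and `swap_object_features` are hypothetical placeholders, not the repository's actual API:

```python
# Minimal sketch of the SwapMix consistency check (hypothetical API).

def swapmix_consistency(model, image_feats, question, n_copies=5):
    """Return the fraction of context-perturbed copies whose answer
    matches the prediction on the original image."""
    original_pred = model.predict(image_feats, question)
    matches = 0
    for _ in range(n_copies):
        # Swap the features of question-irrelevant objects with features
        # of other objects (or other attributes of the same object class).
        irrelevant = get_irrelevant_objects(image_feats, question)
        perturbed = swap_object_features(image_feats, irrelevant)
        if model.predict(perturbed, question) == original_pred:
            matches += 1
    return matches / n_copies
```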
This repository contains code for measuring bias using SwapMix and for training VQA models with SwapMix as data augmentation, as described in the paper. Specifically, we apply SwapMix to MCAN and LXMERT, and we use the GQA dataset for our analysis.
The code is divided into MCAN and LXMERT folders. Inside each folder we provide implementations for measuring visual bias with SwapMix and for training with SwapMix as data augmentation.
We slightly restructured the question, answer, and scene graph files provided by GQA. You can download these files, along with the other files needed for the SwapMix implementation, from here and place them in the data/gqa folder.
We recommend using the object features provided by GQA. Download the features from GQA.
For each architecture we provide four trained models: (1) a finetuned model, (2) a model finetuned with SwapMix as data augmentation, (3) a model trained with perfect sight, and (4) a model trained with perfect sight and SwapMix as data augmentation. Please download the models from here: MCAN trained models, LXMERT trained models.
We measure the model's visual bias under irrelevant object changes and under attribute changes separately.
Before benchmarking visual bias for these models, we finetune them on the GQA train set for better performance. Models are evaluated on the GQA val set.
To measure visual bias for MCAN, download the dependencies and dataset from here, then run:
cd mcan
python3 run_files/run_evaluate.py --CKPT_PATH=<path to ckpt file>
To measure context reliance after computing the object and attribute results, run:
cd scripts
python benchmark_frcnn.py --obj <SwapMix object json file> --attr <SwapMix attribute json file>
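Conceptually, context reliance is the fraction of predictions that flip when irrelevant context is swapped. A hedged sketch of that aggregation follows; the actual JSON schema consumed by benchmark_frcnn.py may differ:

```python
# Hypothetical sketch of aggregating SwapMix object and attribute results
# into a single context-reliance score.
import json

def context_reliance(obj_json, attr_json):
    flips = total = 0
    for path in (obj_json, attr_json):
        with open(path) as f:
            # Assumed schema: {question_id: {"original": answer,
            #                                "perturbed": [answer, ...]}}
            results = json.load(f)
        for entry in results.values():
            for pred in entry["perturbed"]:
                total += 1
                flips += pred != entry["original"]
    # Fraction of answers that flip under context swaps.
    return flips / total
```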
SwapMix can be used to measure visual bias on any VQA model.
Changes are needed only in the data loading and testing parts. The current code iterates over each question individually to get predictions for all SwapMix perturbations (see the sketch below).
Details for measuring visual bias on a new model can be found here
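Under those assumptions, the per-question evaluation loop might look like the following sketch; `dataset`, `load_swapmix_copies`, and `model` are hypothetical placeholders for your own data loader and model:

```python
# Hedged sketch of adapting SwapMix evaluation to a new VQA model.

results = {}
for qid, question, image_feats in dataset:  # iterate question by question
    original = model.predict(image_feats, question)
    # Build all SwapMix perturbations (object and attribute swaps) for
    # this question's image and collect the model's answer on each copy.
    copies = load_swapmix_copies(image_feats, question)
    perturbed = [model.predict(c, question) for c in copies]
    results[qid] = {"original": original, "perturbed": perturbed}
```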
If you find our work and this code useful, please consider citing:
@inproceedings{gupta2022swapmix,
title={SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering},
author={Gupta, Vipul and Li, Zhuowan and Kortylewski, Adam and Zhang, Chenyu and Li, Yingwei and Yuille, Alan},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022}
}