Implementation of the SwapMix approach to measuring visual bias in visual question answering (SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering, Gupta et al., CVPR 2022)
We provide a new way to benchmark visual bias in a VQA model by perturbing the visual context, i.e., objects in the image that are irrelevant to the question.
The model looks at an image and a question. We then change the visual context (objects irrelevant to the question) in the image. For each question we create multiple copies of the image, each with a different context. Ideally, the model's prediction should remain consistent under these context switches.
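For intuition, here is a minimal sketch of this consistency measurement; `model`, `get_irrelevant_objects`, and `swap_object_features` are hypothetical placeholders, not the repository's actual API:

```python
# Minimal sketch of the SwapMix consistency check (hypothetical API).

def swapmix_consistency(model, image_feats, question, n_copies=5):
    """Return the fraction of context-perturbed copies whose answer
    matches the prediction on the original image."""
    original_pred = model.predict(image_feats, question)
    matches = 0
    for _ in range(n_copies):
        # Swap the features of question-irrelevant objects with features
        # of other objects (or other attributes of the same object class).
        irrelevant = get_irrelevant_objects(image_feats, question)
        perturbed = swap_object_features(image_feats, irrelevant)
        if model.predict(perturbed, question) == original_pred:
            matches += 1
    return matches / n_copies
```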
This repository contains code for measuring bias using SwapMix and for training VQA models with SwapMix as data augmentation, as described in the paper. Specifically, we apply SwapMix to MCAN and LXMERT, and we use the GQA dataset for our analysis.
The code is divided into MCAN and LXMERT folders. Inside each folder we provide implementations for measuring visual bias with SwapMix and for training with SwapMix as data augmentation.
We slightly restructured the question, answer, and scene graph files provided by GQA. You can download these files, along with the other files needed for the SwapMix implementation, from here and place them in the data/gqa folder.
We recommend using the object features provided by GQA. Download the features from GQA.
For each architecture we provide four trained models: (1) a finetuned model, (2) a model finetuned with SwapMix as data augmentation, (3) a model trained with perfect sight, and (4) a model trained with perfect sight and SwapMix as data augmentation. Please download the models from here: MCAN trained models, LXMERT trained models.
We measure the model's visual bias under irrelevant object changes and under attribute changes separately.
Before benchmarking visual bias for these models, we finetune them on the GQA train set for better performance. Models are evaluated on the GQA val set.
To measure visual bias for MCAN, download the dependencies and dataset from here, then run:
cd mcan
python3 run_files/run_evaluate.py --CKPT_PATH=<path to ckpt file>
To measure context reliance after computing the object and attribute results, run:
cd scripts
python benchmark_frcnn.py --obj <SwapMix object json file> --attr <SwapMix attribute json file>
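Conceptually, context reliance is the fraction of predictions that flip when irrelevant context is swapped. A hedged sketch of that aggregation follows; the actual JSON schema consumed by benchmark_frcnn.py may differ:

```python
# Hypothetical sketch of aggregating SwapMix object and attribute results
# into a single context-reliance score.
import json

def context_reliance(obj_json, attr_json):
    flips = total = 0
    for path in (obj_json, attr_json):
        with open(path) as f:
            # Assumed schema: {question_id: {"original": answer,
            #                                "perturbed": [answer, ...]}}
            results = json.load(f)
        for entry in results.values():
            for pred in entry["perturbed"]:
                total += 1
                flips += pred != entry["original"]
    # Fraction of answers that flip under context swaps.
    return flips / total
```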
SwapMix can be used to measure visual bias on any VQA model.
Changes are needed only in the data loading and testing parts. The current code iterates over each question individually to get predictions for all SwapMix perturbations (see the sketch below).
Details for measuring visual bias on a new model can be found here
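Under those assumptions, the per-question evaluation loop might look like the following sketch; `dataset`, `load_swapmix_copies`, and `model` are hypothetical placeholders for your own data loader and model:

```python
# Hedged sketch of adapting SwapMix evaluation to a new VQA model.

results = {}
for qid, question, image_feats in dataset:  # iterate question by question
    original = model.predict(image_feats, question)
    # Build all SwapMix perturbations (object and attribute swaps) for
    # this question's image and collect the model's answer on each copy.
    copies = load_swapmix_copies(image_feats, question)
    perturbed = [model.predict(c, question) for c in copies]
    results[qid] = {"original": original, "perturbed": perturbed}
```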
If you find our work and this code useful, please consider citing:
@inproceedings{gupta2022swapmix,
title={SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering},
author={Gupta, Vipul and Li, Zhuowan and Kortylewski, Adam and Zhang, Chenyu and Li, Yingwei and Yuille, Alan},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022}
}