We build GQA-Sub, a dataset for the quantitative evaluation of reasoning consistency in compositional VQA. GQA-Sub is constructed from the GQA dataset, a large-scale dataset for real-world visual reasoning and compositional question answering. We generate sub-questions only for the train and validation splits of GQA, because ground-truth scene graphs are available for these two splits. GQA-Sub therefore contains a train-sub split and a validation-sub split, both of which can be found in the folder "questions".
We propose a dialog-like reasoning method that integrates the reasoning processes for sub-questions into the reasoning process for a compositional question, in order to maintain reasoning consistency in compositional VQA. The folder "DLR" contains the source code of the proposed method.
Please download all the question files from here and the visual features from here.
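Train the model on the train_dialog_balanced split:
cd dir/
python exp/main.py exp_id 001 dialog True TRAIN.SPLIT_VQA train_dialog_balanced
Evaluate epoch 10 of experiment 001 on the sub-questions (val_sub_balanced) and dump the predictions:
python exp/main.py train False TEST.EVAL_ID 001 TEST.EPOCH 10 TEST.DUMP_PRED True TEST.SPLIT_VQA val_sub_balanced
Evaluate the same epoch on the compositional questions (val_balanced) and dump the predictions:
python exp/main.py train False TEST.EVAL_ID 001 TEST.EPOCH 10 TEST.DUMP_PRED True TEST.SPLIT_VQA val_balanced
Compute the reasoning consistency for experiment 001 at epoch 10:
python util/compute_consistency.py --exp_name 001_DLR --epoch 10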
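To give a rough idea of what a consistency score over questions and their sub-questions can look like, here is a minimal sketch. It is not the code in util/compute_consistency.py, and the metric definition is an assumption made for illustration: among compositional questions that are answered correctly and have at least one sub-question, it reports the fraction whose sub-questions are all answered correctly as well. The function name `consistency`, the three dictionaries, and the question ids are hypothetical and do not reflect the format of the dumped prediction files.

```python
from typing import Dict, List


def consistency(main_correct: Dict[str, bool],
                sub_correct: Dict[str, bool],
                sub_ids: Dict[str, List[str]]) -> float:
    """Assumed metric: share of correctly answered compositional questions
    whose sub-questions are all answered correctly too."""
    considered = 0
    consistent = 0
    for qid, subs in sub_ids.items():
        # Skip questions without sub-questions or with a wrong main answer.
        if not subs or not main_correct.get(qid, False):
            continue
        considered += 1
        # A missing sub-question prediction counts as incorrect.
        if all(sub_correct.get(s, False) for s in subs):
            consistent += 1
    return consistent / considered if considered else 0.0


# Hypothetical toy data: correctness flags keyed by question id.
main_correct = {"q1": True, "q2": True, "q3": False}
sub_correct = {"q1_a": True, "q1_b": True, "q2_a": False}
sub_ids = {"q1": ["q1_a", "q1_b"], "q2": ["q2_a"], "q3": []}

score = consistency(main_correct, sub_correct, sub_ids)
print(f"consistency = {score:.2f}")
```

In this toy example, q1 and q2 are both answered correctly, but only q1 has all of its sub-questions answered correctly, so the score is 0.50. Please refer to the paper and to util/compute_consistency.py for the metric actually reported.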
If you find the dataset or code useful, please consider citing our paper:
@inproceedings{jing2022maintaining,
title={Maintaining Reasoning Consistency in Compositional Visual Question Answering},
author={Jing, Chenchen and Jia, Yunde and Wu, Yuwei and Liu, Xinyu and Wu, Qi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
pages={5099-5108},
year={2022}
}
The implementation of the dialog-like reasoning method is partly based on the following codebases: LCGN, MMN, and Logic-guided QA. We thank the authors for their wonderful work.