Memory, Attention and Composition (MAC) Network for CLEVR/GQA, from "Compositional Attention Networks for Machine Reasoning" (https://arxiv.org/abs/1803.03067), implemented in PyTorch.
Requirements:
To train:
For GQA
cd data
mkdir gqa && cd gqa
wget https://nlp.stanford.edu/data/gqa/data1.2.zip
unzip data1.2.zip
mkdir questions
mv balanced_train_data.json questions/gqa_train_questions.json
mv balanced_val_data.json questions/gqa_val_questions.json
mv balanced_testdev_data.json questions/gqa_testdev_questions.json
cd ..
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
wget http://nlp.stanford.edu/data/gqa/objectFeatures.zip
unzip objectFeatures.zip
cd ..
For CLEVR
a. Extract image features
python image_feature.py data/CLEVR_v1.0
b. Preprocess questions
python preprocess.py CLEVR data/CLEVR_v1.0
For GQA
a. Merge object features (this may take some time)
python merge.py --name objects
mv data/gqa_objects.hdf5 data/gqa_features.hdf5
b. Preprocess questions
python preprocess.py gqa data/gqa
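Before training, it can help to confirm that the GQA steps above left the files where the later commands expect them. The following is a minimal sketch, not part of this repository: the helper name `missing_gqa_files` is hypothetical, and the path list is derived only from the `mv` commands shown above (any additional files written by preprocess.py are not checked).

```python
from pathlib import Path

# Paths implied by the download and merge commands above,
# relative to the repository root.
EXPECTED_GQA_FILES = (
    "data/gqa/questions/gqa_train_questions.json",
    "data/gqa/questions/gqa_val_questions.json",
    "data/gqa/questions/gqa_testdev_questions.json",
    "data/gqa_features.hdf5",
)


def missing_gqa_files(repo_root="."):
    """Return the expected GQA files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_GQA_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_gqa_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected GQA files are present.")
```

Run it from the repository root; an empty result means the question files and the merged feature file are in place.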
CAUTION: the file created by image_feature.py is very large. You may enable HDF5 compression, but doing so will slow down feature extraction.
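To get a feel for the size involved, here is a rough sketch of the arithmetic. It assumes, and this is not stated above, that each image is stored as a float32 feature map of shape 1024 x 14 x 14 in a dataset named "data"; the actual layout written by image_feature.py may differ. Under that assumption, the 70,000 CLEVR v1.0 training images come to about 56 GB (roughly 52 GiB) uncompressed. The second function shows how gzip compression can be requested through h5py's `create_dataset`; both function names are hypothetical.

```python
def feature_file_bytes(n_images, feature_shape=(1024, 14, 14), bytes_per_value=4):
    """Uncompressed size in bytes of an (n_images, *feature_shape) array."""
    values_per_image = 1
    for dim in feature_shape:
        values_per_image *= dim
    return n_images * values_per_image * bytes_per_value


def create_feature_dataset(path, n_images, feature_shape=(1024, 14, 14), compress=False):
    """Create an empty HDF5 dataset, optionally gzip-compressed (requires h5py)."""
    import h5py  # third-party; imported lazily so the estimate above needs no extras

    with h5py.File(path, "w") as f:
        f.create_dataset(
            "data",  # assumed dataset name
            shape=(n_images,) + tuple(feature_shape),
            dtype="f4",
            chunks=(1,) + tuple(feature_shape),  # one chunk per image
            compression="gzip" if compress else None,
        )


if __name__ == "__main__":
    # CLEVR v1.0 has 70,000 training images.
    gib = feature_file_bytes(70_000) / 2**30
    print(f"Estimated uncompressed size: {gib:.1f} GiB")
```

Compression trades disk space for CPU time on every write, which is why extraction slows down when it is enabled.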
python train.py gqa
On CLEVR, this implementation reaches 95.75% accuracy at epoch 10 and 96.5% accuracy at epoch 20.
Parts of the code are borrowed from https://github.com/rosinality/mac-network-pytorch and https://github.com/stanfordnlp/mac-network.