This is the code for our paper titled "Iterative Scene Graph Generation".
The following packages are needed to run the code.
python == 3.8.5
PyTorch == 1.8.2
detectron2 == 0.6
h5py
imantics
easydict
opencv-python (cv2) == 4.5.5
scikit-learn
scipy
pandas
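Most of these can be installed with pip. The line below is one possible route (an untested sketch: cv2 is provided by the opencv-python package, while PyTorch 1.8.2 and detectron2 0.6 are best installed following their official instructions for your CUDA version):
pip install h5py imantics easydict scikit-learn scipy pandas "opencv-python==4.5.5.*"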
We use the Visual Genome filtered data widely used in the Scene Graph community. Please see the public repository of the paper Unbiased Scene Graph Generation for instructions on downloading this dataset. After downloading the dataset, you should have the following four items:
VG_100K (directory containing all the images)
VG-SGG-with-attri.h5
VG-SGG-dicts-with-attri.json (can be found in the same repository here)
image_data.json (can be found in the same repository here)
To enable faster model convergence, we pre-train DETR on Visual Genome. We replicate the DETR decoder weights three times and initialize our model's three decoders with them. For convenience, the pretrained weights (with the decoder replication) are made available here. To use these weights during training, simply pass the MODEL.WEIGHTS <Path to downloaded checkpoint> flag in the training command.
Our proposed iterative model can be trained using the following command:
python train_iterative_model.py --resume --num-gpus <NUM_GPUS> --config-file configs/iterative_model.yaml OUTPUT_DIR <PATH TO CHECKPOINT DIR> DATASETS.VISUAL_GENOME.IMAGES <PATH TO VG_100K IMAGES> DATASETS.VISUAL_GENOME.MAPPING_DICTIONARY <PATH TO VG-SGG-dicts-with-attri.json> DATASETS.VISUAL_GENOME.IMAGE_DATA <PATH TO image_data.json> DATASETS.VISUAL_GENOME.VG_ATTRIBUTE_H5 <PATH TO VG-SGG-with-attri.h5> MODEL.DETR.OVERSAMPLE_PARAM <Alpha Value> MODEL.DETR.UNDERSAMPLE_PARAM <Twice the Beta Value> SOLVER.CLIP_GRADIENTS.CLIP_VALUE 0.01 SOLVER.IMS_PER_BATCH 12 MODEL.DETR.NO_OBJECT_WEIGHT 0.1 MODEL.WEIGHTS <PATH TO DETR Pretrained Model>
To set the α value, use the MODEL.DETR.OVERSAMPLE_PARAM flag; to set the β value, use MODEL.DETR.UNDERSAMPLE_PARAM. Note that MODEL.DETR.UNDERSAMPLE_PARAM should be specified as twice the desired β value, so for β = 0.75 use MODEL.DETR.UNDERSAMPLE_PARAM 1.5.
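For example, the flag settings corresponding to the α = 0.07, β = 0.75 configuration of our released checkpoint would be:
MODEL.DETR.OVERSAMPLE_PARAM 0.07 MODEL.DETR.UNDERSAMPLE_PARAM 1.5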
Note: If the code fails, try running it on a single GPU first so that some preprocessed files can be generated; this is a one-time step. Once the code runs successfully on a single GPU, you can run it on multiple GPUs as well. Additionally, the code is configured by default to run on 4 GPUs with a batch size of 12. If you run out of memory, change the batch size using the SOLVER.IMS_PER_BATCH <NUM IMAGES IN BATCH> flag.
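For example, a single-GPU run with a reduced batch size (the batch size of 3 here is only an illustration, matching the default of 3 images per GPU) could look like:
python train_iterative_model.py --resume --num-gpus 1 --config-file configs/iterative_model.yaml <same dataset and model flags as above> SOLVER.IMS_PER_BATCH 3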
To evaluate the code, use the following command:
python train_iterative_model.py --resume --eval-only --num-gpus <NUM_GPUS> --config-file configs/iterative_model.yaml OUTPUT_DIR <PATH TO CHECKPOINT DIR> DATASETS.VISUAL_GENOME.IMAGES <PATH TO VG_100K IMAGES> DATASETS.VISUAL_GENOME.MAPPING_DICTIONARY <PATH TO VG-SGG-dicts-with-attri.json> DATASETS.VISUAL_GENOME.IMAGE_DATA <PATH TO image_data.json> DATASETS.VISUAL_GENOME.VG_ATTRIBUTE_H5 <PATH TO VG-SGG-with-attri.h5>
You can find our model weights for α = 0.07 and β = 0.75 here. To use these weights during evaluation, simply pass the MODEL.WEIGHTS <Path to downloaded checkpoint> flag in the evaluation command, as in the example below.
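The full evaluation invocation with the released checkpoint would then be (same placeholder paths as above):
python train_iterative_model.py --resume --eval-only --num-gpus <NUM_GPUS> --config-file configs/iterative_model.yaml OUTPUT_DIR <PATH TO CHECKPOINT DIR> DATASETS.VISUAL_GENOME.IMAGES <PATH TO VG_100K IMAGES> DATASETS.VISUAL_GENOME.MAPPING_DICTIONARY <PATH TO VG-SGG-dicts-with-attri.json> DATASETS.VISUAL_GENOME.IMAGE_DATA <PATH TO image_data.json> DATASETS.VISUAL_GENOME.VG_ATTRIBUTE_H5 <PATH TO VG-SGG-with-attri.h5> MODEL.WEIGHTS <PATH TO DOWNLOADED CHECKPOINT>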
To check whether the code is running correctly on your machine, the released checkpoint should give you the following metrics on the Visual Genome test set VG_test:
SGG eval: R @ 20: 0.2179; R @ 50: 0.2712; R @ 100: 0.2972; for mode=sgdet, type=Recall(Main).
SGG eval: ng-R @ 20: 0.2272; ng-R @ 50: 0.3052; ng-R @ 100: 0.3547; for mode=sgdet, type=No Graph Constraint Recall(Main).
SGG eval: zR @ 20: 0.0134; zR @ 50: 0.0274; zR @ 100: 0.0384; for mode=sgdet, type=Zero Shot Recall.
SGG eval: mR @ 20: 0.1115; mR @ 50: 0.1561; mR @ 100: 0.1770; for mode=sgdet, type=Mean Recall.