This is our PyTorch implementation for the paper:
Xingchen Li, Long Chen, Wenbo Ma, Yi Yang, and Jun Xiao. Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation. In MM 2022.
Check INSTALL.md for installation instructions.
Check DATASET.md for instructions on downloading the data.
Explanations of the metrics used in this toolkit are given in METRICS.md.
In this project, we primarily use a Faster R-CNN detector pretrained on the Open Images dataset. You do not need to run this detector to use this repo: you can directly download the extracted detection features by following the instructions in DATASET.md. If you are interested in the detector itself, the pretrained model can be found in the TensorFlow 1 Detection Model Zoo: faster_rcnn_inception_resnet_v2_atrous_oidv4.
For fully supervised models, you can use the detector pretrained by Scene-Graph-Benchmark. You can download this Faster R-CNN model and extract all the files to the directory checkpoints/pretrained_faster_rcnn.
The code for the grounding module can be found here. We also provide the generated grounding results.
Our pretrained SGG models can be downloaded from Google Drive. Details of these models are given in the Model Training section below. After downloading, put all the folders in the directory checkpoints/.
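Before training or evaluating, it can help to confirm that the downloads landed in the directories named above. The following is a minimal sketch under stated assumptions: the function name check_dirs is hypothetical and not part of this repo, and only the two directory names in the example call come from the instructions above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): report which of the given
# directories are missing. Returns non-zero if any directory is absent.
check_dirs() {
  local missing=0
  local dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "missing: $dir" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call, using the directories named in the instructions above:
# check_dirs checkpoints checkpoints/pretrained_faster_rcnn
```

The function only checks that the directories exist. It does not verify that the files inside them are complete.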
To train our scene graph generation models, run the script
bash train.sh MODEL_TYPE
where MODEL_TYPE specifies the training supervision, the training dataset, and the scene graph generation model. See details below.
VG caption supervised models: trained on image-text pairs from the VG dataset
- VG_Caption_Ground_*: trains an SGG model with the pseudo labels generated by our method. * is the model name and can be Motifs or Uniter.
- VG_Caption_SGNLS_*: trains an SGG model with pseudo labels generated from the detector. * is the model name and can be Motifs or Uniter.
VG unlocal supervised models: trained on unlocalized scene graph labels
- Unlocal_VG_Ground_*: trains an SGG model with the pseudo labels generated by our method.
- Unlocal_VG_SGNLS_*: trains an SGG model with pseudo labels generated from the detector.
You can set CUDA_VISIBLE_DEVICES in train.sh to specify which GPUs are used for model training (e.g., the default script uses 2 GPUs).
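Because MODEL_TYPE encodes three choices in a single string, a typo is easy to make. The sketch below checks a name against the scheme listed above before it is passed to train.sh. The function is_valid_model_type is hypothetical and not part of this repo. The suffixes allowed for the Unlocal_VG_* models are not spelled out above, so the sketch accepts any non-empty suffix for them, which is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): return 0 if the argument
# matches one of the MODEL_TYPE names described above, 1 otherwise.
is_valid_model_type() {
  case "$1" in
    VG_Caption_Ground_Motifs|VG_Caption_Ground_Uniter) return 0 ;;
    VG_Caption_SGNLS_Motifs|VG_Caption_SGNLS_Uniter) return 0 ;;
    # ASSUMPTION: any non-empty model name is accepted for the unlocal
    # models, since their valid suffixes are not listed in this README.
    Unlocal_VG_Ground_?*|Unlocal_VG_SGNLS_?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use before launching training:
# is_valid_model_type "VG_Caption_Ground_Motifs" && bash train.sh VG_Caption_Ground_Motifs
```

train.sh remains the source of truth for which names are actually supported, so a name that passes this check may still be rejected by the script.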
To evaluate a trained scene graph generation model, you can reuse the commands in train.sh by simply changing WSVL.SKIP_TRAIN to True and setting OUTPUT_DIR to the path of your trained model.
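One way to keep train.sh untouched is to derive a separate evaluation script from it. The sketch below is an illustration only: make_eval_script and eval.sh are hypothetical names, and it assumes that train.sh contains the literal text "WSVL.SKIP_TRAIN False", which should be verified against the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): write an evaluation copy of a
# training script by flipping WSVL.SKIP_TRAIN from False to True.
# ASSUMPTION: the source script contains the literal text
# "WSVL.SKIP_TRAIN False"; if it does not, the copy is left unchanged.
make_eval_script() {
  local src="$1"
  local dst="$2"
  sed 's/WSVL\.SKIP_TRAIN False/WSVL.SKIP_TRAIN True/' "$src" > "$dst"
}

# Example call (eval.sh is a hypothetical output name):
# make_eval_script train.sh eval.sh
```

OUTPUT_DIR in the generated copy still has to be edited by hand so that it points to the directory of the trained model.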
This repository builds on SGG_from_NLS and Scene-Graph-Benchmark for scene graph generation, and on UNITER for image-text representation learning.
If you find this project helpful for your research, please consider citing our project or papers in your publications.