Scene Graph Generation from Natural Language Supervision

This is our PyTorch implementation for the paper:

Xingchen Li, Long Chen, Wenbo Ma, Yi Yang, and Jun Xiao. Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation. In MM 2022.

Installation
Data
Metrics
Pretrained Object Detector
Grounding Module
Pretrained Scene Graph Generation Models
Model Training
Model Evaluation
Acknowledgement
Reference

Installation

Check INSTALL.md for installation instructions.

Data

Check DATASET.md for data download instructions.

Metrics

Explanations of the metrics used in this toolkit are given in METRICS.md.

Pretrained Object Detector

In this project, we primarily use a Faster R-CNN detector pretrained on the Open Images dataset. You do not need to run this detector to use this repo: you can directly download the extracted detection features by following the instructions in DATASET.md. If you are interested in the detector itself, the pretrained model can be found in the TensorFlow 1 Detection Model Zoo: faster_rcnn_inception_resnet_v2_atrous_oidv4.

For fully supervised models, you can use the detector pretrained by Scene-Graph-Benchmark. You can download this Faster R-CNN model and extract all the files into the directory checkpoints/pretrained_faster_rcnn.
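The extraction step can be sketched as a small shell function. This is a minimal sketch, not a script shipped with the repo: the function name `extract_detector` is hypothetical, and the archive name and format (zip or tar) are assumptions, since they depend on the file you actually download.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the repo): unpack the downloaded
# Scene-Graph-Benchmark Faster R-CNN files into the expected directory.
# The archive format is an assumption; adjust to what you downloaded.
extract_detector() {
  local archive="$1"
  local dest="${2:-checkpoints/pretrained_faster_rcnn}"
  mkdir -p "$dest"
  case "$archive" in
    *.zip)          unzip -o "$archive" -d "$dest" ;;
    *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" ;;
    *.tar)          tar -xf "$archive" -C "$dest" ;;
    *)              cp "$archive" "$dest"/ ;;  # e.g. a single checkpoint file
  esac
}

# Example (placeholder file name):
# extract_detector ~/Downloads/pretrained_faster_rcnn.zip
```

If the archive contains a top-level folder, move its contents up so that the files sit directly in checkpoints/pretrained_faster_rcnn.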

Grounding Module

The code for the grounding module can be found here. We also provide the generated grounding results.

Pretrained Scene Graph Generation Models

Our pretrained SGG models can be downloaded from Google Drive. Details of these models are given in the Model Training section below. After downloading, please put all the folders into the directory checkpoints/.

Model Training

To train our scene graph generation models, run the script

bash train.sh MODEL_TYPE

where MODEL_TYPE specifies the training supervision, the training dataset and the scene graph generation model. See details below.

VG caption supervised models: trained on image-text pairs from the VG dataset
- VG_Caption_Ground_*: trains an SGG model with the pseudo labels generated by our method. * is the model name, either Motifs or Uniter.
- VG_Caption_SGNLS_*: trains an SGG model with the pseudo labels generated from the detector. * is the model name, either Motifs or Uniter.
VG unlocalized supervised models: trained on unlocalized scene graph labels
- Unlocal_VG_Ground_*: trains an SGG model with the pseudo labels generated by our method.
- Unlocal_VG_SGNLS_*: trains an SGG model with the pseudo labels generated from the detector.
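To train several caption-supervised variants in one go, the MODEL_TYPE names can be built from the pattern above. This is a minimal sketch, assuming the wildcard expands exactly to the model names listed (Motifs, Uniter); the DRY_RUN switch is a hypothetical convenience of this snippet, not an option of train.sh.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical switch of this snippet only: DRY_RUN=1 (default) just prints
# the commands; set DRY_RUN=0 to actually launch training.
DRY_RUN="${DRY_RUN:-1}"

for method in Ground SGNLS; do      # pseudo labels from our method vs. from the detector
  for model in Motifs Uniter; do    # SGG model name that replaces the * wildcard
    model_type="VG_Caption_${method}_${model}"
    if [ "$DRY_RUN" = "1" ]; then
      echo "bash train.sh ${model_type}"
    else
      bash train.sh "${model_type}"
    fi
  done
done
```

The runs are sequential, so each one gets all the GPUs configured in train.sh.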

You can set CUDA_VISIBLE_DEVICES in train.sh to specify which GPUs are used for model training (the default script uses 2 GPUs).
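A sketch of the GPU selection: CUDA_VISIBLE_DEVICES is the standard CUDA variable, while the value `0,1` and the `NUM_GPUS` count are illustrative assumptions, not necessarily how train.sh is written. Deriving the count from the device list helps keep any per-node process count in the launch command consistent with the GPUs you expose.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Example value only: expose GPUs 0 and 1 to the training process.
export CUDA_VISIBLE_DEVICES=0,1

# Count the listed devices (NUM_GPUS is a hypothetical name, not
# necessarily a variable used by train.sh).
IFS=',' read -r -a gpu_ids <<< "$CUDA_VISIBLE_DEVICES"
NUM_GPUS=${#gpu_ids[@]}
echo "Using ${NUM_GPUS} GPU(s): ${CUDA_VISIBLE_DEVICES}"
```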

Model Evaluation

To evaluate a trained scene graph generation model, you can reuse the commands in train.sh by simply changing WSVL.SKIP_TRAIN to True and setting OUTPUT_DIR to the path of your trained model.
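One way to reuse the training commands is to write an evaluation copy of train.sh with those two settings changed. This is a hedged sketch: the helper `make_eval_script` is hypothetical, and it assumes train.sh passes yacs-style `KEY VALUE` overrides such as `WSVL.SKIP_TRAIN False` and `OUTPUT_DIR <path>` on the command line. Check the actual script before relying on it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the repo): write an evaluation copy of a
# training script. Assumes "KEY VALUE" overrides such as
# "WSVL.SKIP_TRAIN False" and "OUTPUT_DIR <path>" appear in the script.
make_eval_script() {
  local src="$1"        # e.g. train.sh
  local dst="$2"        # e.g. eval.sh
  local model_dir="$3"  # directory holding the trained model
  # Every OUTPUT_DIR occurrence is redirected to model_dir;
  # a model_dir containing '#' or '&' would need escaping for sed.
  sed -E \
    -e 's/(WSVL\.SKIP_TRAIN)[[:space:]]+False/\1 True/' \
    -e "s#(OUTPUT_DIR)[[:space:]]+[^[:space:]]+#\\1 ${model_dir}#" \
    "$src" > "$dst"
}

# Example (placeholder paths):
# make_eval_script train.sh eval.sh checkpoints/VG_Caption_Ground_Motifs
# bash eval.sh VG_Caption_Ground_Motifs
```

Writing a separate file keeps train.sh untouched, so training and evaluation settings cannot drift into each other by accident.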

Acknowledgement

This repository is built on SGG_from_NLS and Scene-Graph-Benchmark for scene graph generation, and on UNITER for image-text representation learning.

Reference

If you find this project helpful for your research, please consider citing our project or paper in your publications.
