This paper introduces Structure-CLIP, an end-to-end framework that integrates scene graph knowledge to enhance multi-modal structured representations.
2024-02: We preprint our survey "Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey" [Repo].
2023-12: Our paper "Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations" was accepted by AAAI 2024.
2022-12: We release the [Repo] for our AAAI 2023 paper "DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning".

Training datasets are available here.
There are four parts in the code.
Python 3
PyTorch >= 1.8.0
Transformers >= 4.11.3
NumPy
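A minimal environment setup, assuming pip is used; only the version floors listed above come from this repo, while the virtual-environment name is just an example:

```bash
# Sketch of an environment setup with pip; the version floors match the
# requirements above, and the virtual-env name is only illustrative.
python3 -m venv structure-clip-env
source structure-clip-env/bin/activate
pip install "torch>=1.8.0" "transformers>=4.11.3" numpy
```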
The training script:

```bash
bash script/run.sh
    [--train_path TRAIN_PATH] [--test_path TEST_PATH] [--nepoch NEPOCH] [--batch_size BATCH_SIZE] [--manualSeed MANUAL_SEED]
    [--lr LEARNING_RATE] [--weight_decay WEIGHT_DECAY] [--knowledge_weight KNOWLEDGE_WEIGHT] [--transformer_layer_num NUMBER] [--model_name MODEL_NAME] [--neg_loss_weight NEG_LOSS_WEIGHT]
```
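For illustration, a run with the flags filled in might look like the sketch below; every path and hyper-parameter value here is a placeholder rather than a setting from the paper, and, per the note that follows, the parameters can instead be edited directly in the .sh file:

```bash
# Hypothetical invocation; replace the paths and values with your own settings.
bash script/run.sh \
    --train_path data/train.json \
    --test_path data/test.json \
    --nepoch 20 \
    --batch_size 64 \
    --manualSeed 42 \
    --lr 1e-5 \
    --weight_decay 1e-4 \
    --knowledge_weight 0.2 \
    --neg_loss_weight 1.0
```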
Note: edit the .sh file to modify the training parameters.

Please consider citing this paper if you use the code or data from our work.
Thanks a lot :)
@inproceedings{DBLP:conf/aaai/StructureCLIP,
author = {Yufeng Huang and
Jiji Tang and
Zhuo Chen and
Rongsheng Zhang and
Xinfeng Zhang and
Weijie Chen and
Zeng Zhao and
Zhou Zhao and
Tangjie Lv and
Zhipeng Hu and
Wen Zhang},
title = {Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations},
booktitle = {{AAAI}},
publisher = {{AAAI} Press},
year = {2024}
}