This paper introduces Structure-CLIP, an end-to-end framework that integrates scene graph knowledge to enhance multi-modal structured representations.
2024-02: We preprint our survey "Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey" [Repo].
2023-12: Our paper "Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations" was accepted by AAAI 2024.
2022-12: We release the [Repo] for our AAAI 2023 paper "DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning".

Training datasets are available here.
There are four parts in the code.
Python 3
PyTorch >= 1.8.0
Transformers >= 4.11.3
NumPy
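A minimal environment setup, assuming pip is used; only the version floors listed above come from this repo, while the virtual-environment name is just an example:

```bash
# Sketch of an environment setup with pip; the version floors match the
# requirements above, and the virtual-env name is only illustrative.
python3 -m venv structure-clip-env
source structure-clip-env/bin/activate
pip install "torch>=1.8.0" "transformers>=4.11.3" numpy
```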
The training script:

```bash
bash script/run.sh
    [--train_path TRAIN_PATH] [--test_path TEST_PATH] [--nepoch NEPOCH] [--batch_size BATCH_SIZE] [--manualSeed MANUAL_SEED]
    [--lr LEARNING_RATE] [--weight_decay WEIGHT_DECAY] [--knowledge_weight KNOWLEDGE_WEIGHT] [--transformer_layer_num NUMBER] [--model_name MODEL_NAME] [--neg_loss_weight NEG_LOSS_WEIGHT]
```
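For illustration, a run with the flags filled in might look like the sketch below; every path and hyper-parameter value here is a placeholder rather than a setting from the paper, and, per the note that follows, the parameters can instead be edited directly in the .sh file:

```bash
# Hypothetical invocation; replace the paths and values with your own settings.
bash script/run.sh \
    --train_path data/train.json \
    --test_path data/test.json \
    --nepoch 20 \
    --batch_size 64 \
    --manualSeed 42 \
    --lr 1e-5 \
    --weight_decay 1e-4 \
    --knowledge_weight 0.2 \
    --neg_loss_weight 1.0
```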
Note: edit the .sh file to modify the training parameters.

Please consider citing this paper if you use the code or data from our work.
Thanks a lot :)
@inproceedings{DBLP:conf/aaai/StructureCLIP,
author = {Yufeng Huang and
Jiji Tang and
Zhuo Chen and
Rongsheng Zhang and
Xinfeng Zhang and
Weijie Chen and
Zeng Zhao and
Zhou Zhao and
Tangjie Lv and
Zhipeng Hu and
Wen Zhang},
title = {Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations},
booktitle = {{AAAI}},
publisher = {{AAAI} Press},
year = {2024}
}