Official implementation of MarvelOVD in ECCV 2024.
Our project is developed on Detectron2. Please follow the official installation instructions.
Download the COCO dataset and put it in the datasets/ directory.
Download the VL-PLM pre-generated pseudo-labeled data and our generated candidate pseudo-label data, and put them in the datasets/open_voc directory.
The datasets are organized as follows:
datasets/
    coco/
        annotations/
            instances_train2017.json
            instances_val2017.json
            open_voc/
                instances_eval.json
                instances_train.json
        images/
            train2017/
                000000000009.jpg
                000000000025.jpg
                ...
            val2017/
                000000000776.jpg
                000000000139.jpg
                ...
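
Before training, it can help to confirm that the expected files are in place. The following is a minimal sketch, not part of this repository: `missing_paths` and `EXPECTED` are hypothetical names, and the two paths listed are only the COCO annotation files from the listing above. Append the pseudo-label files and image directories according to your local layout.

```python
from pathlib import Path


def missing_paths(root, expected):
    """Return the relative paths in `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


# Hypothetical checklist: only the COCO annotation files are listed here.
# Extend it with the pseudo-label files and image folders you downloaded.
EXPECTED = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
]

if __name__ == "__main__":
    missing = missing_paths("datasets", EXPECTED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed files were found.")
```

Run it from the repository root so that the relative `datasets` path resolves correctly.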
MarvelOVD dynamically learns open-vocabulary knowledge from offline-generated pseudo-labels under the guidance of the online training detector.
If necessary, please refer to the pseudo-label generation instructions to generate the offline pseudo-labels.
Mask R-CNN:

| Novel AP | Base AP | Overall AP |
|---|---|---|
| 38.9 | 56.4 | 51.8 |
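
As a sanity check on how the three numbers relate, the sketch below recomputes the overall AP as a category-weighted mean of the novel and base APs. It assumes the commonly used COCO open-vocabulary split of 17 novel and 48 base categories, which this README does not state, and it assumes the overall figure is averaged over all categories. `overall_ap` is a hypothetical helper, not a function from this repository.

```python
def overall_ap(novel_ap, base_ap, num_novel=17, num_base=48):
    """Category-weighted mean of the per-split APs.

    The default category counts are an assumption (the usual COCO
    open-vocabulary split); change them if your split differs.
    """
    total = num_novel + num_base
    return (num_novel * novel_ap + num_base * base_ap) / total


# Values from the Mask R-CNN row above.
print(round(overall_ap(38.9, 56.4), 1))  # prints 51.8
```

Under these assumptions the weighted mean reproduces the reported overall AP.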
We train the model with regular data augmentations (no Large Scale Jittering) and without extra GPU memory overhead. Training runs on 4 GPUs with 24 GB of memory each.
Training command
python train_net.py --config configs/coco_ssod.yaml --num-gpus=4
The code borrows heavily from VL_PLM; many thanks to the open-source community. For questions and issues, please contact wangk229@mail2.sysu.edu.cn