This is the official PyTorch implementation of CORA (CVPR 2023).
CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching (CVPR 2023)
Xiaoshi Wu, Feng Zhu, Rui Zhao, Hongsheng Li
We propose CORA, a DETR-style framework for open-vocabulary detection (OVD) that adapts CLIP for Open-vocabulary detection by Region prompting and Anchor pre-matching. Our method demonstrates state-of-the-art results on both COCO and LVIS OVD benchmarks.
# environment
conda create -n cora python=3.9.12
conda activate cora
conda install pytorch==1.12.0 torchvision==0.13.0 cudatoolkit=11.3 -c pytorch
# cora
git clone git@github.com:tgxs002/CORA.git
cd CORA
# other dependencies
pip install -r requirements.txt
# install detectron2
Please install detectron2 as instructed in the official tutorial (https://detectron2.readthedocs.io/en/latest/tutorials/install.html). We use version 0.6 in our experiments.
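To confirm that the active environment resolves the versions pinned above (PyTorch 1.12.0, torchvision 0.13.0, detectron2 0.6), a small check along these lines may help. It is a sketch and not part of this repository. It assumes the packages expose standard Python distribution metadata, which `importlib.metadata` needs to find them, and it ignores local version suffixes such as `+cu113`.

```python
"""Sketch: sanity-check the package versions pinned in the setup steps."""
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the installation instructions (torch, torchvision, detectron2).
EXPECTED = {"torch": "1.12.0", "torchvision": "0.13.0", "detectron2": "0.6"}


def base_version(v: str) -> str:
    """Drop a local version suffix such as '+cu113' that pip wheels may carry."""
    return v.split("+", 1)[0]


def find_mismatches(expected: dict, installed: dict) -> dict:
    """Return {package: (expected, found)} for each package that differs or is missing.

    `installed` maps package name -> version string, or None if not installed.
    """
    mismatches = {}
    for name, want in expected.items():
        found = installed.get(name)
        if found is None or base_version(found) != want:
            mismatches[name] = (want, found)
    return mismatches


def installed_versions(names) -> dict:
    """Look up installed distribution versions; None marks a missing package."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, found) in problems.items():
        print(f"{name}: expected {want}, found {found}")
    print("environment OK" if not problems else f"{len(problems)} mismatch(es)")
```

Run it inside the activated `cora` environment. A mismatch is not necessarily fatal, but it is a likely first suspect when results differ from the table below.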
Check docs/dataset.md for dataset preparation.
Besides the dataset, we also provide the files necessary to reproduce our results. Please download the learned region prompts and put them under the logs folder. A guide to training the region prompts is provided in Region Prompting.
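Before launching a run, a quick pre-flight check can catch a skipped download. The helper below is hypothetical (not part of the repository). Only the `logs` folder comes from the instructions above; the exact file names expected by the configs are not listed here, so add them from docs/dataset.md to make the check stricter.

```python
"""Hypothetical pre-flight check that downloaded files are in place."""
from pathlib import Path


def missing_paths(required, root="."):
    """Return the entries of `required` (relative to `root`) that are unusable.

    A path counts as missing if it does not exist, or if it is a directory
    that exists but is empty (e.g. a logs/ folder created but never filled).
    """
    root_path = Path(root)
    missing = []
    for rel in required:
        p = root_path / rel
        if not p.exists() or (p.is_dir() and not any(p.iterdir())):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # "logs" is the folder named in the README; extend this list as needed.
    for rel in missing_paths(["logs"]):
        print(f"missing or empty: {rel}")
```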
| Method | Pretraining Model | Novel (AP50) | All (AP50) | Checkpoint |
|---|---|---|---|---|
| CORA | RN50 | 35.1 | 35.4 | Checkpoint |
| CORA | RN50x4 | 41.7 | 43.8 | Checkpoint |
Checkpoints for LVIS and for $\text{CORA}^+$ will be released soon.
Run the following command for evaluating the RN50 model:
# if you are running locally
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh test 8 local --resume /path/to/checkpoint.pth --eval
# if you are running on a cluster with slurm scheduler
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh test 8 slurm quota_type partition_name --resume /path/to/checkpoint.pth --eval
If you are using slurm, remember to replace quota_type and partition_name with your quota type and the partition you are using. To evaluate other models, change the config script and checkpoint path accordingly.
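The evaluation and training commands share one argument layout: the config script, a run name (`test` or `RN50` in the examples), a number (`8`, presumably the GPU count), the launcher (`local` or `slurm`, the latter followed by quota type and partition), and then any extra flags. The hypothetical helper below assembles such commands, which can be handy when scripting many runs. The interpretation of the run name and the number is inferred from the examples, not read from the scripts.

```python
"""Hypothetical helper that assembles the launch commands shown above."""
import shlex


def build_command(config, run_name, num_gpus, launcher="local",
                  quota_type=None, partition=None, extra_args=()):
    """Return a shell command string for a CORA config script.

    Argument order mirrors the README examples:
      bash <config> <run_name> <num_gpus> local [extra args...]
      bash <config> <run_name> <num_gpus> slurm <quota_type> <partition> [extra args...]
    """
    argv = ["bash", config, run_name, str(num_gpus), launcher]
    if launcher == "slurm":
        if not (quota_type and partition):
            raise ValueError("slurm requires quota_type and partition")
        argv += [quota_type, partition]
    elif launcher != "local":
        raise ValueError(f"unknown launcher: {launcher!r}")
    argv += list(extra_args)
    # shlex.join quotes any argument containing shell-special characters.
    return shlex.join(argv)


if __name__ == "__main__":
    CONFIG = "configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh"
    # Local evaluation, as in the first example above.
    print(build_command(CONFIG, "test", 8,
                        extra_args=["--resume", "/path/to/checkpoint.pth", "--eval"]))
    # Slurm training; replace the placeholders with your own values.
    print(build_command(CONFIG, "RN50", 8, "slurm", "quota_type", "partition_name"))
```

To actually launch a run from Python, pass `shlex.split(cmd)` to `subprocess.run`; building the list first keeps the argument order in one place.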
Before training the localizer, please make sure that the region prompts and relabeled annotations are prepared as instructed in Data Preparation.
Run the following command to train the RN50 model:
# if you are running locally
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh RN50 8 local
# if you are running on a cluster with slurm scheduler
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh RN50 8 slurm quota_type partition_name
If you are using slurm, remember to replace quota_type and partition_name with your quota type and the partition you are using. To train other models, use the corresponding config script.
We provide the pre-trained region prompts as specified in Data Preparation. Please refer to the region branch for training and exporting the region prompts:
git checkout region
The code for CLIP-Aligned Labeling will be released soon in another branch of this repository. In the meantime, we provide the pre-computed relabeled annotations as specified in Data Preparation.
If you find this repo useful, please consider citing our paper:
@article{wu2023cora,
title={CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching},
author={Xiaoshi Wu and Feng Zhu and Rui Zhao and Hongsheng Li},
journal={ArXiv},
year={2023},
volume={abs/2303.13076}
}
This repository was built on top of SAM-DETR, CLIP, RegionCLIP, and DAB-DETR. We thank the community for their efforts.