# PROB: Probabilistic Objectness for Open World Object Detection (CVPR 2023)
If you like our project, please give us a star ⭐ on GitHub for the latest updates!
## 📰 News
* **[2024.01.05]** ⏭️ [Check out my new OWOD paper](https://github.com/orrzohar/FOMO), where I attempt to integrate foundation models into the OWOD objective!
* **[2023.06.18]** 🤝 Presenting at CVPR - come check out our poster and discuss the future of OWOD.
* **[2023.02.27]** 🚀 PROB was accepted to CVPR 2023!
* **[2022.12.02]** First published on [arXiv](https://arxiv.org/abs/2212.01424).
## 🔥 Highlights
* **Open World Object Detection (OWOD):** A new computer vision task that extends traditional object detection to include both seen and unknown objects, aligning more with real-world scenarios.
* **Challenges with Standard OD:** Traditional detectors are trained to treat unknown objects as background, so they fail in OWOD settings.
* **Novel Probabilistic Framework:** Introduces a method for estimating objectness in embedded feature space, enhancing the identification of unknown objects.
* **PROB: A Transformer-Based Detector:** A new model that adapts existing OD models for OWOD, significantly improving unknown object detection.
* **Superior Performance:** PROB outperforms existing OWOD methods, roughly doubling unknown-object recall and improving known-object detection mAP by about 10%.
![prob](./docs/overview.png)
## Overview
PROB adapts the Deformable DETR model by adding the proposed 'probabilistic objectness' head. During training, we alternate between distribution estimation (top right) and objectness likelihood maximization of **matched ground-truth objects** (top left). At inference, the objectness probability multiplies the classification probabilities. For more details, see the manuscript.
![prob](./docs/Method.png)
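Below is a minimal sketch (not the authors' implementation) of the inference-time scoring described above; the tensor shapes and sigmoid activations are assumptions:

```python
# Sketch: objectness-modulated classification scores at inference.
# Shapes and activations are assumptions -- see the paper and models/ for the real code.
import torch

def score_queries(class_logits: torch.Tensor, obj_prob: torch.Tensor) -> torch.Tensor:
    """class_logits: [num_queries, num_classes]; obj_prob: [num_queries, 1] in [0, 1]."""
    class_probs = class_logits.sigmoid()  # per-class confidence per query
    return obj_prob * class_probs         # objectness gates every class score
```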
## 📊 Results
| Method  | Task 1 U-Recall | Task 1 mAP | Task 2 U-Recall | Task 2 mAP | Task 3 U-Recall | Task 3 mAP | Task 4 mAP |
|---------|:---------------:|:----------:|:---------------:|:----------:|:---------------:|:----------:|:----------:|
| OW-DETR | 7.5             | 59.2       | 6.2             | 42.9       | 5.7             | 30.8       | 27.8       |
| PROB    | 19.4            | 59.5       | 17.4            | 44.0       | 19.6            | 36.0       | 31.5       |
## 🛠️ Requirements and Installation
### Python Environment
We trained and tested our models on `Ubuntu 16.04`, `CUDA 11.1/11.3`, `GCC 5.4.0`, and `Python 3.10.4`:
```bash
conda create --name prob python==3.10.4
conda activate prob
pip install -r requirements.txt
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
```
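Optionally, run a quick sanity check that the pinned PyTorch build sees your GPU (not part of the official setup):

```python
# Verify the CUDA 11.3 PyTorch build is installed and a GPU is visible.
import torch

print(torch.__version__)          # expect 1.12.0+cu113
print(torch.cuda.is_available())  # expect True on a CUDA 11.3 machine
```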
### Backbone features
Download the self-supervised backbone from [here](https://dl.fbaipublicfiles.com/dino/dino_resnet50_pretrain/dino_resnet50_pretrain.pth) and place it in the `models` folder.
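If you prefer to script the download, a minimal sketch using only the URL above (`wget -P models <url>` works equally well):

```python
# Fetch the self-supervised DINO ResNet-50 backbone into ./models.
import urllib.request
from pathlib import Path

Path("models").mkdir(exist_ok=True)
url = "https://dl.fbaipublicfiles.com/dino/dino_resnet50_pretrain/dino_resnet50_pretrain.pth"
urllib.request.urlretrieve(url, "models/dino_resnet50_pretrain.pth")
```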
### Compiling CUDA operators
```bash
cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py
```
## Dataset Preparation
The file structure:
```
PROB/
└── data/
    └── OWOD/
        ├── JPEGImages
        ├── Annotations
        └── ImageSets
            ├── OWDETR
            ├── TOWOD
            └── VOC2007
```
The splits are present inside the `data/OWOD/ImageSets/` folder.
1. Download the COCO Images and Annotations from [coco dataset](https://cocodataset.org/#download) into the `data/` directory.
2. Unzip the train2017 and val2017 folders. The directory structure should now look like:
```
PROB/
└── data/
    └── coco/
        ├── annotations/
        ├── train2017/
        └── val2017/
```
3. Move all images from `train2017/` and `val2017/` to the `JPEGImages` folder (a sketch of this and the next step follows the list).
4. Run `coco2voc.py` to convert the JSON annotations to XML files.
5. Download the PASCAL VOC 2007 & 2012 images and annotations from the [pascal dataset](http://host.robots.ox.ac.uk/pascal/VOC/) into the `data/` directory.
6. Untar the trainval 2007, trainval 2012, and test 2007 archives.
7. Move all the images to the `JPEGImages` folder and all the annotations to the `Annotations` folder.
Currently, we follow the VOC format for data loading and evaluation.
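A minimal Python sketch of steps 3-4 (paths as in the trees above; `coco2voc.py`'s exact interface is an assumption, so check the script before running it):

```python
# Step 3: consolidate COCO train/val images into the shared JPEGImages folder.
import shutil
from pathlib import Path

dst = Path("data/OWOD/JPEGImages")
dst.mkdir(parents=True, exist_ok=True)
for split in ("train2017", "val2017"):
    for img in Path("data/coco", split).glob("*.jpg"):
        shutil.move(str(img), dst / img.name)

# Step 4: convert COCO JSON annotations to VOC-style XML.
# The invocation below is an assumption -- check coco2voc.py for its actual arguments.
# python coco2voc.py
```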
## 🤖 Training
#### Training on single node
To train PROB on a single node with 4 GPUs, run
```bash
bash ./run.sh
```
**Note:** you may need to make the `.sh` files under the `configs` and `tools` directories executable by running `chmod +x *.sh` in each directory.
By editing `run.sh`, you can run any of the configurations defined in `./configs`:
1. `EVAL_M_OWOD_BENCHMARK.sh` - evaluation of tasks 1-4 on the MOWOD Benchmark.
2. `EVAL_S_OWOD_BENCHMARK.sh` - evaluation of tasks 1-4 on the SOWOD Benchmark.
3. `M_OWOD_BENCHMARK.sh` - training for tasks 1-4 on the MOWOD Benchmark.
4. `M_OWOD_BENCHMARK_RANDOM_IL.sh` - training for tasks 1-4 on the MOWOD Benchmark with random exemplar selection.
5. `S_OWOD_BENCHMARK.sh` - training for tasks 1-4 on the SOWOD Benchmark.
#### Training on slurm cluster
To train PROB on a Slurm cluster with 2 nodes of 8 GPUs each (not tested), run
```bash
bash run_slurm.sh
```
**Note:** you may need to make the `.sh` files under the `configs` and `tools` directories executable by running `chmod +x *.sh` in each directory.
### Hyperparameters for different systems
| System | Hyperparameters | Notes | Verified By |
|--------|-----------------|-------|-------------|
| 2, 4, 8, 16 A100 (40G) | - | - | orrzohar |
| 2 A100 (80G) | `lr_drop = 30` | Lower `lr_drop` required to sustain U-Recall. | [issue #47](https://github.com/orrzohar/PROB/issues/47) |
| 4 Titan RTX (24G) | `lr_drop = 40, batch_size = 2` | `class_error` drops more slowly during training. | [issue #26](https://github.com/orrzohar/PROB/issues/26) |
| 4 3090 (24G) | `lr_drop = 35, batch_size = 2` or `lr = 1e-4, lr_drop = 35, batch_size = 3` | Performance drops to K_AP50 = 58.338, U_R50 = 19.443. | [issue #48](https://github.com/orrzohar/PROB/issues/48) |
| 1 2080Ti (11G) | `lr = 2e-5, lr_backbone = 4e-6, batch_size = 1, obj_temp = 1.3` | Performance drops to K_AP50 = 57.9826, U_R50 = 19.2624. | [issue #50](https://github.com/orrzohar/PROB/issues/50) |
## 📈 Evaluation
To reproduce any of the results above, please download our [weights](https://drive.google.com/uc?id=1TbSbpeWxRp1SGcp660n-35sd8F8xVBSq) and place them in the `exps` directory, following the structure below. Run `run_eval.sh` to utilize multiple GPUs.
**Note:** you may need to make the `.sh` files under the `configs` and `tools` directories executable by running `chmod +x *.sh` in each directory.
```
PROB/
└── exps/
    ├── MOWODB/
    │   └── PROB/ (t1.ph - t4.ph)
    └── SOWODB/
        └── PROB/ (t1.ph - t4.ph)
```
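One way to script the weights download (a sketch; `gdown` is a third-party package and the output name is an assumption -- any Google Drive downloader works):

```python
# pip install gdown
import gdown

url = "https://drive.google.com/uc?id=1TbSbpeWxRp1SGcp660n-35sd8F8xVBSq"
gdown.download(url, output="prob_weights", quiet=False)  # then arrange files per the tree above
```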
**Note:** please check the [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR) repository for more training and evaluation details.
## ✏️ Citation
If you use PROB, please consider citing:
```bibtex
@InProceedings{Zohar_2023_CVPR,
author = {Zohar, Orr and Wang, Kuan-Chieh and Yeung, Serena},
title = {PROB: Probabilistic Objectness for Open World Object Detection},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {11444-11453}
}
```
## 📧 Contact
Should you have any questions, please contact 📧 orrzohar@stanford.edu.
## 👍 Acknowledgements
PROB builds on the code bases of previous works such as [OW-DETR](https://github.com/akshitac8/OW-DETR), [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR), [DETReg](https://github.com/amirbar/DETReg), and [OWOD](https://github.com/JosephKJ/OWOD). If you found PROB useful, please consider citing these works as well.
## ✨ Star History
[![Star History Chart](https://api.star-history.com/svg?repos=orrzohar/PROB&type=Date)](https://star-history.com/#orrzohar/PROB&Date)