
English | 简体中文

Salience DETR


This repository is an official implementation of [Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement](https://arxiv.org/abs/2403.16131), accepted to CVPR 2024 (score 553). Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen.

💖 If our Salience-DETR is helpful to your research or projects, please star this repository. Thanks! 🤗

✨Highlights

1. We provide a deeper analysis of the [scale bias and query redundancy](#id_1) issues in two-stage DETR-like methods.
2. We present a hierarchical filtering mechanism that reduces computational complexity under salience supervision. The proposed salience supervision helps capture [fine-grained object contours](#id_2) even with only bounding box annotations.
3. Salience DETR achieves **+4.0%**, **+0.2%**, and **+4.4%** AP on three challenging defect detection tasks, and comparable performance (**49.2** AP) with only about **70%** FLOPs on COCO 2017.
🔎Visualization

- Queries in the two-stage selection of existing DETR-like methods are usually **redundant** and exhibit **scale bias** (left).
- **Salience supervision** helps capture **object contours** even with only bounding box annotations, for both defect detection and object detection tasks (right).

Update

Model Zoo

12 epoch setting

| Model | Backbone | mAP | AP50 | AP75 | APS | APM | APL | Download |
| ------------- | ------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ------------------- |
| Salience DETR | ResNet50 | 50.0 | 67.7 | 54.2 | 33.3 | 54.4 | 64.4 | config / checkpoint |
| Salience DETR | ConvNeXt-L | 54.2 | 72.4 | 59.1 | 38.8 | 58.3 | 69.6 | config / checkpoint |
| Salience DETR | Swin-L(IN-22K) | 56.5 | 75.0 | 61.5 | 40.2 | 61.2 | 72.8 | config / checkpoint |
| Salience DETR | FocalNet-L(IN-22K) | 57.3 | 75.5 | 62.3 | 40.9 | 61.8 | 74.5 | config / checkpoint |

24 epoch setting

| Model | Backbone | mAP | AP50 | AP75 | APS | APM | APL | Download |
| ------------- | -------- | ---- | ---- | ---- | ---- | ---- | ---- | ------------------- |
| Salience DETR | ResNet50 | 51.2 | 68.9 | 55.7 | 33.9 | 55.5 | 65.6 | config / checkpoint |

🔧Installation

  1. Clone the repository locally:

    git clone https://github.com/xiuqhou/Salience-DETR.git
    cd Salience-DETR/
  2. Create a conda environment and activate it:

    conda create -n salience_detr python=3.8
    conda activate salience_detr
  3. Install PyTorch and Torchvision following the instructions at https://pytorch.org/get-started/locally/. The code requires python>=3.8, torch>=1.11.0, torchvision>=0.12.0.

    conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
  4. Install other dependencies with:

    conda install --file requirements.txt -c conda-forge

That's all. You don't need to compile the CUDA operators manually; they are loaded automatically the first time you run the code.
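If you want to double-check that your environment meets the stated requirements, a minimal sanity check (not part of the repository) could look like this:

```python
# Quick environment sanity check (not part of the repository): confirm the
# installed versions meet the stated requirements before training.
import torch
import torchvision

print("torch:", torch.__version__)              # expected >= 1.11.0
print("torchvision:", torchvision.__version__)  # expected >= 0.12.0
print("CUDA available:", torch.cuda.is_available())
```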

📁Prepare Dataset

Please download COCO 2017 or prepare your own datasets into data/, and organize them as follows. You can use tools/visualize_datasets.py to visualize the dataset annotations and verify their correctness.

coco/
  ├── train2017/
  ├── val2017/
  └── annotations/
    ├── instances_train2017.json
    └── instances_val2017.json
Example for visualization:

```shell
python tools/visualize_datasets.py \
    --coco-img data/coco/val2017 \
    --coco-ann data/coco/annotations/instances_val2017.json \
    --show-dir visualize_dataset/
```
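As an additional sanity check, assuming pycocotools is available in your environment, you can load the annotation file directly and print a few basic statistics (the path below is the COCO val2017 file; adjust it for your own dataset):

```python
# Sanity-check sketch: load a COCO-format annotation file with pycocotools
# and print a few basic statistics to verify the layout.
from pycocotools.coco import COCO

ann_file = "data/coco/annotations/instances_val2017.json"  # adjust for your dataset
coco = COCO(ann_file)

print("images:", len(coco.getImgIds()))
print("annotations:", len(coco.getAnnIds()))
print("categories:", [cat["name"] for cat in coco.loadCats(coco.getCatIds())])
```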

📚︎Train a model

We use the accelerate package to natively handle multiple GPUs; use CUDA_VISIBLE_DEVICES to specify the GPU(s). If not specified, the script will train with all available GPUs on the node.

CUDA_VISIBLE_DEVICES=0 accelerate launch main.py    # train with 1 GPU
CUDA_VISIBLE_DEVICES=0,1 accelerate launch main.py  # train with 2 GPUs

Before starting training, modify the parameters in configs/train_config.py.

A simple example for train config:

```python
from torch import optim

from datasets.coco import CocoDetection
from transforms import presets
from optimizer import param_dict

# Commonly changed training configurations
num_epochs = 12      # train epochs
batch_size = 2       # total_batch_size = #GPU x batch_size
num_workers = 4      # workers for pytorch DataLoader
pin_memory = True    # whether pin_memory for pytorch DataLoader
print_freq = 50      # frequency to print logs
starting_epoch = 0
max_norm = 0.1       # clip gradient norm

output_dir = None    # path to save checkpoints, default for None: checkpoints/{model_name}
find_unused_parameters = False  # useful for debugging distributed training

# define dataset for train
coco_path = "data/coco"         # /PATH/TO/YOUR/COCODIR
train_transform = presets.detr  # see transforms/presets to choose a transform
train_dataset = CocoDetection(
    img_folder=f"{coco_path}/train2017",
    ann_file=f"{coco_path}/annotations/instances_train2017.json",
    transforms=train_transform,
    train=True,
)
test_dataset = CocoDetection(
    img_folder=f"{coco_path}/val2017",
    ann_file=f"{coco_path}/annotations/instances_val2017.json",
    transforms=None,  # the eval_transform is integrated in the model
)

# model config to train
model_path = "configs/salience_detr/salience_detr_resnet50_800_1333.py"

# specify a checkpoint folder to resume, or a pretrained ".pth" to finetune, for example:
# checkpoints/salience_detr_resnet50_800_1333/train/2024-03-22-09_38_50
# checkpoints/salience_detr_resnet50_800_1333/train/2024-03-22-09_38_50/best_ap.pth
resume_from_checkpoint = None

learning_rate = 1e-4  # initial learning rate
optimizer = optim.AdamW(lr=learning_rate, weight_decay=1e-4, betas=(0.9, 0.999))
lr_scheduler = optim.lr_scheduler.MultiStepLR(milestones=[10], gamma=0.1)

# This defines parameter groups with different learning rates
param_dicts = param_dict.finetune_backbone_and_linear_projection(lr=learning_rate)
```

📈Evaluation/Test

To evaluate a model with one or more GPUs, specify CUDA_VISIBLE_DEVICES, dataset, model and checkpoint.

CUDA_VISIBLE_DEVICES=<gpu_ids> accelerate launch test.py --coco-path /path/to/coco --model-config /path/to/model.py --checkpoint /path/to/checkpoint.pth

Optional parameters are illustrated in the examples below; see test.py for the full list.

An example for evaluation: to evaluate `salience_detr_resnet50_800_1333` on `coco` using 8 GPUs, save predictions to `result.json`, and visualize results to `visualization/`:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch test.py --coco-path data/coco \
    --model-config configs/salience_detr/salience_detr_resnet50_800_1333.py \
    --checkpoint https://github.com/xiuqhou/Salience-DETR/releases/download/v1.0.0/salience_detr_resnet50_800_1333_coco_1x.pth \
    --result result.json \
    --show-dir visualization/
```
Evaluate a json result file: to evaluate the json result file obtained above, specify `--result` but do not specify `--model`.

```shell
CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --coco-path /path/to/coco --result /path/to/result.json
```

Other optional parameters (see [test.py](test.py) for the full list):

- `--show-dir`: path to save detection visualization results.
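For reference, evaluating a saved json result file amounts to a standard pycocotools evaluation; a minimal standalone sketch (independent of test.py, which handles this internally) looks like this:

```python
# Standalone sketch of evaluating a COCO-format result file with pycocotools;
# shown only for illustration of what the metric computation involves.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("data/coco/annotations/instances_val2017.json")  # ground truth
coco_dt = coco_gt.loadRes("result.json")                        # predictions saved by test.py

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP/AR metrics
```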

▶︎Inference

Use inference.py to perform inference on images. You should specify the image directory using --image-dir.

python inference.py --image-dir /path/to/images --model-config /path/to/model.py --checkpoint /path/to/checkpoint.pth --show-dir /path/to/dir
An example for inference on an image folder: to perform inference for images under `images/` and save visualizations to `visualization/`:

```shell
python inference.py \
    --image-dir images/ \
    --model-config configs/salience_detr/salience_detr_resnet50_800_1333.py \
    --checkpoint checkpoint.pth \
    --show-dir visualization/
```

See inference.ipynb for inference on a single image and visualization.
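As a rough illustration only, assuming the post-processed predictions follow the common torchvision-style dict format with `boxes`, `scores`, and `labels` (the exact output format is defined by the repository code; see inference.ipynb), filtering and drawing boxes might look like the sketch below. The image path, the fake prediction, and the threshold are placeholders:

```python
# Illustration only: assumes torchvision-style detection dicts; the real output
# format is defined by the repository code (see inference.ipynb).
import torch
from torchvision.io import read_image, write_jpeg
from torchvision.utils import draw_bounding_boxes

image = read_image("images/example.jpg")  # uint8 CHW tensor, hypothetical path
prediction = {                            # stand-in for one model output
    "boxes": torch.tensor([[50.0, 40.0, 200.0, 180.0]]),
    "scores": torch.tensor([0.87]),
    "labels": torch.tensor([1]),
}

keep = prediction["scores"] > 0.3         # confidence threshold
drawn = draw_bounding_boxes(image, prediction["boxes"][keep], width=3)
write_jpeg(drawn, "example_vis.jpg")      # save the visualization
```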

🔁Benchmark a model

To test the inference speed, memory cost and parameters of a model, use tools/benchmark_model.py.

python tools/benchmark_model.py --model-config configs/salience_detr/salience_detr_resnet50_800_1333.py
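For intuition, a back-of-the-envelope benchmark of parameter count and forward latency can be done with plain PyTorch. The sketch below times a torchvision ResNet-50 as a stand-in; tools/benchmark_model.py reports the numbers for the full detector:

```python
# Back-of-the-envelope benchmark sketch: parameter count and average forward
# latency, using a torchvision ResNet-50 as a stand-in model.
import time
import torch
from torchvision.models import resnet50

model = resnet50().eval().cuda()
dummy = torch.randn(1, 3, 800, 1333, device="cuda")

print(f"parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.1f} M")

with torch.no_grad():
    for _ in range(10):            # warm-up iterations
        model(dummy)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(50):
        model(dummy)
    torch.cuda.synchronize()
print(f"latency: {(time.perf_counter() - start) / 50 * 1000:.1f} ms / image")
```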

📍Train your own datasets

To train on your own datasets, there are a few things to do before training:

  1. Prepare your datasets in COCO annotation format, and modify coco_path in configs/train_config.py accordingly.
  2. Open the model configs under configs/salience_detr and set num_classes to a number no smaller than max_category_id + 1 of your dataset (see the sketch after this list). For example, from the following annotation in instances_val2017.json, we can find that the maximum category_id is 90 for COCO, so we set num_classes = 91.

    {"supercategory": "indoor","id": 90,"name": "toothbrush"}

    If you are not sure what to set, you can simply set num_classes to a sufficiently large number. (For example, num_classes = 92 or num_classes = 365 also work for COCO.)

  3. If necessary, modify other parameters in the model configs under configs/salience_detr and in configs/train_config.py.
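A small helper sketch (the annotation path below is hypothetical) to look up the largest category_id of a COCO-format dataset and hence a safe num_classes:

```python
# Helper sketch: find the largest category id in a COCO-format annotation file
# to decide num_classes. The file path is a placeholder for your own dataset.
import json

with open("data/your_dataset/annotations/instances_train.json") as f:
    categories = json.load(f)["categories"]

max_category_id = max(cat["id"] for cat in categories)
print("max category_id:", max_category_id)
print("num_classes should be at least", max_category_id + 1)
```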

📥Export an ONNX model

For advanced users who want to deploy our model, we provide a script to export an ONNX file.

python tools/pytorch2onnx.py \
    --model-config /path/to/model.py \
    --checkpoint /path/to/checkpoint.pth \
    --save-file /path/to/save.onnx \
    --simplify \  # use onnxsim to simplify the exported onnx file
    --verify  # verify the error between onnx model and pytorch model

For inference using the ONNX file, see ONNXDetector in tools/pytorch2onnx.py.
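If you want to run the exported file outside this repository, a minimal onnxruntime sketch might look like the following. Note that the actual input/output names, shapes, and preprocessing are defined by ONNXDetector in tools/pytorch2onnx.py, so the names and the dummy input here are placeholders:

```python
# Illustration with onnxruntime; the input/output names, shape, and dummy
# input below are placeholders, not the exact interface of the exported model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("salience_detr.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name                         # query the actual input name
dummy_image = np.random.rand(1, 3, 800, 1333).astype(np.float32)  # placeholder input

outputs = session.run(None, {input_name: dummy_image})
for meta, out in zip(session.get_outputs(), outputs):
    print(meta.name, np.asarray(out).shape)
```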

Reference

If you find our work helpful for your research, please consider citing:

@InProceedings{Hou_2024_CVPR,
    author    = {Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong},
    title     = {Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {17574-17583}
}

@inproceedings{hou2024relation,
  title={Relation DETR: Exploring Explicit Position Relation Prior for Object Detection},
  author={Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong and Lan, Xuguang},
  booktitle={European conference on computer vision},
  year={2024},
  organization={Springer}
}