
English | 简体中文

Salience DETR


This repository is an official implementation of [Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement](https://arxiv.org/abs/2403.16131), accepted to CVPR 2024 (score 553). Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen.

💖 If our Salience-DETR is helpful to your research or projects, please star this repository. Thanks! 🤗

✨Highlights

1. We provide a deeper analysis of the [scale bias and query redundancy](#id_1) issues in two-stage DETR-like methods.
2. We present a hierarchical filtering mechanism that reduces computational complexity under salience supervision. The proposed salience supervision helps capture [fine-grained object contours](#id_2) even with only bounding box annotations.
3. Salience DETR achieves **+4.0%**, **+0.2%**, and **+4.4%** AP on three challenging defect detection tasks, and comparable performance (**49.2** AP) with only about **70%** FLOPs on COCO 2017.
🔎Visualization

- Queries in the two-stage selection of existing DETR-like methods are usually **redundant** and exhibit **scale bias** (left).
- **Salience supervision** helps capture **object contours** even with only bounding box annotations, for both defect detection and object detection tasks (right).

Update

Model Zoo

12 epoch setting

| Model | Backbone | mAP | AP50 | AP75 | APS | APM | APL | Download |
| ------------- | ------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ------------------- |
| Salience DETR | ResNet50 | 50.0 | 67.7 | 54.2 | 33.3 | 54.4 | 64.4 | config / checkpoint |
| Salience DETR | ConvNeXt-L | 54.2 | 72.4 | 59.1 | 38.8 | 58.3 | 69.6 | config / checkpoint |
| Salience DETR | Swin-L(IN-22K) | 56.5 | 75.0 | 61.5 | 40.2 | 61.2 | 72.8 | config / checkpoint |
| Salience DETR | FocalNet-L(IN-22K) | 57.3 | 75.5 | 62.3 | 40.9 | 61.8 | 74.5 | config / checkpoint |

24 epoch setting

| Model | Backbone | mAP | AP50 | AP75 | APS | APM | APL | Download |
| ------------- | -------- | ---- | ---- | ---- | ---- | ---- | ---- | ------------------- |
| Salience DETR | ResNet50 | 51.2 | 68.9 | 55.7 | 33.9 | 55.5 | 65.6 | config / checkpoint |

🔧Installation

  1. Clone the repository locally:

    git clone https://github.com/xiuqhou/Salience-DETR.git
    cd Salience-DETR/
  2. Create a conda environment and activate it:

    conda create -n salience_detr python=3.8
    conda activate salience_detr
  3. Install PyTorch and Torchvision following the instructions at https://pytorch.org/get-started/locally/. The code requires python>=3.8, torch>=1.11.0, torchvision>=0.12.0.

    conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
  4. Install other dependencies with:

    conda install --file requirements.txt -c conda-forge

That's all. You don't need to compile the CUDA operators manually; they are loaded automatically the first time you run the code.
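If you want to double-check that your environment meets the stated requirements, a minimal sanity check (not part of the repository) could look like this:

```python
# Quick environment sanity check (not part of the repository): confirm the
# installed versions meet the stated requirements before training.
import torch
import torchvision

print("torch:", torch.__version__)              # expected >= 1.11.0
print("torchvision:", torchvision.__version__)  # expected >= 0.12.0
print("CUDA available:", torch.cuda.is_available())
```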

📁Prepare Dataset

Please download COCO 2017 or prepare your own datasets into data/, and organize them as follows. You can use tools/visualize_datasets.py to visualize the dataset annotations and verify their correctness.

coco/
  ├── train2017/
  ├── val2017/
  └── annotations/
    ├── instances_train2017.json
    └── instances_val2017.json
Example for visualization:

```shell
python tools/visualize_datasets.py \
    --coco-img data/coco/val2017 \
    --coco-ann data/coco/annotations/instances_val2017.json \
    --show-dir visualize_dataset/
```
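As an additional sanity check, assuming pycocotools is available in your environment, you can load the annotation file directly and print a few basic statistics (the path below is the COCO val2017 file; adjust it for your own dataset):

```python
# Sanity-check sketch: load a COCO-format annotation file with pycocotools
# and print a few basic statistics to verify the layout.
from pycocotools.coco import COCO

ann_file = "data/coco/annotations/instances_val2017.json"  # adjust for your dataset
coco = COCO(ann_file)

print("images:", len(coco.getImgIds()))
print("annotations:", len(coco.getAnnIds()))
print("categories:", [cat["name"] for cat in coco.loadCats(coco.getCatIds())])
```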

📚︎Train a model

We use the accelerate package to natively handle multiple GPUs; use CUDA_VISIBLE_DEVICES to specify the GPU(s). If not specified, the script will train with all available GPUs on the node.

CUDA_VISIBLE_DEVICES=0 accelerate launch main.py    # train with 1 GPU
CUDA_VISIBLE_DEVICES=0,1 accelerate launch main.py  # train with 2 GPUs

Before starting training, modify the parameters in configs/train_config.py.

A simple example for train config:

```python
from torch import optim

from datasets.coco import CocoDetection
from transforms import presets
from optimizer import param_dict

# Commonly changed training configurations
num_epochs = 12      # train epochs
batch_size = 2       # total_batch_size = #GPU x batch_size
num_workers = 4      # workers for pytorch DataLoader
pin_memory = True    # whether pin_memory for pytorch DataLoader
print_freq = 50      # frequency to print logs
starting_epoch = 0
max_norm = 0.1       # clip gradient norm

output_dir = None    # path to save checkpoints, default for None: checkpoints/{model_name}
find_unused_parameters = False  # useful for debugging distributed training

# define dataset for train
coco_path = "data/coco"         # /PATH/TO/YOUR/COCODIR
train_transform = presets.detr  # see transforms/presets to choose a transform
train_dataset = CocoDetection(
    img_folder=f"{coco_path}/train2017",
    ann_file=f"{coco_path}/annotations/instances_train2017.json",
    transforms=train_transform,
    train=True,
)
test_dataset = CocoDetection(
    img_folder=f"{coco_path}/val2017",
    ann_file=f"{coco_path}/annotations/instances_val2017.json",
    transforms=None,  # the eval_transform is integrated in the model
)

# model config to train
model_path = "configs/salience_detr/salience_detr_resnet50_800_1333.py"

# specify a checkpoint folder to resume, or a pretrained ".pth" to finetune, for example:
# checkpoints/salience_detr_resnet50_800_1333/train/2024-03-22-09_38_50
# checkpoints/salience_detr_resnet50_800_1333/train/2024-03-22-09_38_50/best_ap.pth
resume_from_checkpoint = None

learning_rate = 1e-4  # initial learning rate
optimizer = optim.AdamW(lr=learning_rate, weight_decay=1e-4, betas=(0.9, 0.999))
lr_scheduler = optim.lr_scheduler.MultiStepLR(milestones=[10], gamma=0.1)

# This defines parameter groups with different learning rates
param_dicts = param_dict.finetune_backbone_and_linear_projection(lr=learning_rate)
```

📈Evaluation/Test

To evaluate a model with one or more GPUs, specify CUDA_VISIBLE_DEVICES, dataset, model and checkpoint.

CUDA_VISIBLE_DEVICES=<gpu_ids> accelerate launch test.py --coco-path /path/to/coco --model-config /path/to/model.py --checkpoint /path/to/checkpoint.pth

Optional parameters are illustrated in the examples below; see test.py for the full list.

An example for evaluation: to evaluate `salience_detr_resnet50_800_1333` on `coco` using 8 GPUs, save predictions to `result.json`, and visualize results to `visualization/`:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch test.py --coco-path data/coco \
    --model-config configs/salience_detr/salience_detr_resnet50_800_1333.py \
    --checkpoint https://github.com/xiuqhou/Salience-DETR/releases/download/v1.0.0/salience_detr_resnet50_800_1333_coco_1x.pth \
    --result result.json \
    --show-dir visualization/
```
Evaluate a json result file: to evaluate the json result file obtained above, specify `--result` but do not specify `--model`.

```shell
CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --coco-path /path/to/coco --result /path/to/result.json
```

Other optional parameters (see [test.py](test.py) for the full list):

- `--show-dir`: path to save detection visualization results.
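For reference, evaluating a saved json result file amounts to a standard pycocotools evaluation; a minimal standalone sketch (independent of test.py, which handles this internally) looks like this:

```python
# Standalone sketch of evaluating a COCO-format result file with pycocotools;
# shown only for illustration of what the metric computation involves.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("data/coco/annotations/instances_val2017.json")  # ground truth
coco_dt = coco_gt.loadRes("result.json")                        # predictions saved by test.py

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP/AR metrics
```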

▶︎Inference

Use inference.py to perform inference on images. You should specify the image directory using --image-dir.

python inference.py --image-dir /path/to/images --model-config /path/to/model.py --checkpoint /path/to/checkpoint.pth --show-dir /path/to/dir
An example for inference on an image folder: to perform inference for images under `images/` and save visualizations to `visualization/`:

```shell
python inference.py \
    --image-dir images/ \
    --model-config configs/salience_detr/salience_detr_resnet50_800_1333.py \
    --checkpoint checkpoint.pth \
    --show-dir visualization/
```

See inference.ipynb for inference on a single image and visualization.
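As a rough illustration only, assuming the post-processed predictions follow the common torchvision-style dict format with `boxes`, `scores`, and `labels` (the exact output format is defined by the repository code; see inference.ipynb), filtering and drawing boxes might look like the sketch below. The image path, the fake prediction, and the threshold are placeholders:

```python
# Illustration only: assumes torchvision-style detection dicts; the real output
# format is defined by the repository code (see inference.ipynb).
import torch
from torchvision.io import read_image, write_jpeg
from torchvision.utils import draw_bounding_boxes

image = read_image("images/example.jpg")  # uint8 CHW tensor, hypothetical path
prediction = {                            # stand-in for one model output
    "boxes": torch.tensor([[50.0, 40.0, 200.0, 180.0]]),
    "scores": torch.tensor([0.87]),
    "labels": torch.tensor([1]),
}

keep = prediction["scores"] > 0.3         # confidence threshold
drawn = draw_bounding_boxes(image, prediction["boxes"][keep], width=3)
write_jpeg(drawn, "example_vis.jpg")      # save the visualization
```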

🔁Benchmark a model

To test the inference speed, memory cost and parameters of a model, use tools/benchmark_model.py.

python tools/benchmark_model.py --model-config configs/salience_detr/salience_detr_resnet50_800_1333.py
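For intuition, a back-of-the-envelope benchmark of parameter count and forward latency can be done with plain PyTorch. The sketch below times a torchvision ResNet-50 as a stand-in; tools/benchmark_model.py reports the numbers for the full detector:

```python
# Back-of-the-envelope benchmark sketch: parameter count and average forward
# latency, using a torchvision ResNet-50 as a stand-in model.
import time
import torch
from torchvision.models import resnet50

model = resnet50().eval().cuda()
dummy = torch.randn(1, 3, 800, 1333, device="cuda")

print(f"parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.1f} M")

with torch.no_grad():
    for _ in range(10):            # warm-up iterations
        model(dummy)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(50):
        model(dummy)
    torch.cuda.synchronize()
print(f"latency: {(time.perf_counter() - start) / 50 * 1000:.1f} ms / image")
```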

📍Train your own datasets

To train on your own datasets, there are a few things to do before training:

  1. Prepare your datasets in COCO annotation format, and modify coco_path in configs/train_config.py accordingly.
  2. Open the model configs under configs/salience_detr and set num_classes to a number no smaller than max_category_id + 1 of your dataset (see the sketch after this list). For example, from the following annotation in instances_val2017.json, we can find that the maximum category_id is 90 for COCO, so we set num_classes = 91.

    {"supercategory": "indoor","id": 90,"name": "toothbrush"}

    If you are not sure what to set, you can simply set num_classes to a sufficiently large number. (For example, num_classes = 92 or num_classes = 365 also work for COCO.)

  3. If necessary, modify other parameters in the model configs under configs/salience_detr and in configs/train_config.py.
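A small helper sketch (the annotation path below is hypothetical) to look up the largest category_id of a COCO-format dataset and hence a safe num_classes:

```python
# Helper sketch: find the largest category id in a COCO-format annotation file
# to decide num_classes. The file path is a placeholder for your own dataset.
import json

with open("data/your_dataset/annotations/instances_train.json") as f:
    categories = json.load(f)["categories"]

max_category_id = max(cat["id"] for cat in categories)
print("max category_id:", max_category_id)
print("num_classes should be at least", max_category_id + 1)
```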

📥Export an ONNX model

For advanced users who want to deploy our model, we provide a script to export an ONNX file.

python tools/pytorch2onnx.py \
    --model-config /path/to/model.py \
    --checkpoint /path/to/checkpoint.pth \
    --save-file /path/to/save.onnx \
    --simplify \  # use onnxsim to simplify the exported onnx file
    --verify  # verify the error between onnx model and pytorch model

For inference using the ONNX file, see ONNXDetector in tools/pytorch2onnx.py.
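If you want to run the exported file outside this repository, a minimal onnxruntime sketch might look like the following. Note that the actual input/output names, shapes, and preprocessing are defined by ONNXDetector in tools/pytorch2onnx.py, so the names and the dummy input here are placeholders:

```python
# Illustration with onnxruntime; the input/output names, shape, and dummy
# input below are placeholders, not the exact interface of the exported model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("salience_detr.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name                         # query the actual input name
dummy_image = np.random.rand(1, 3, 800, 1333).astype(np.float32)  # placeholder input

outputs = session.run(None, {input_name: dummy_image})
for meta, out in zip(session.get_outputs(), outputs):
    print(meta.name, np.asarray(out).shape)
```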

Reference

If you find our work helpful for your research, please consider citing:

@InProceedings{Hou_2024_CVPR,
    author    = {Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong},
    title     = {Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {17574-17583}
}

@inproceedings{hou2024relation,
  title={Relation DETR: Exploring Explicit Position Relation Prior for Object Detection},
  author={Hou, Xiuquan and Liu, Meiqin and Zhang, Senlin and Wei, Ping and Chen, Badong and Lan, Xuguang},
  booktitle={European conference on computer vision},
  year={2024},
  organization={Springer}
}