# Focus-DETR

This is the official implementation of the ICCV 2023 paper "Less is More: Focus Attention for Efficient DETR".

Authors: Dehua Zheng, Wenhui Dong, Hailin Hu, Xinghao Chen, Yunhe Wang.

[arXiv] [BibTeX]

Focus-DETR is a model that focuses attention on more informative tokens for a better trade-off between computation efficiency and model accuracy. Compared with state-of-the-art sparse transformer-based detectors under the same setting, Focus-DETR has comparable complexity while achieving 50.4 AP (+2.2 AP) on COCO.


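For intuition only, the sketch below illustrates the token-selection idea of scoring encoder tokens and passing only the highest-scoring ones to the attention layers. The `TokenSelector` module, its single linear scoring head, and the fixed `keep_ratio` are illustrative assumptions and not the implementation shipped in this repository; the actual Focus-DETR scoring mechanism is described in the paper.

```python
# Illustrative sketch only -- hypothetical module, not the official Focus-DETR code.
# Idea: score every encoder token and keep only the top fraction, so the more
# expensive attention computation operates on the informative subset.
import torch
import torch.nn as nn


class TokenSelector(nn.Module):
    """Score encoder tokens and keep only the top-scoring fraction."""

    def __init__(self, embed_dim: int, keep_ratio: float = 0.3):
        super().__init__()
        self.score_head = nn.Linear(embed_dim, 1)  # per-token "informativeness" score
        self.keep_ratio = keep_ratio

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_tokens, embed_dim)
        scores = self.score_head(tokens).squeeze(-1)               # (batch, num_tokens)
        num_keep = max(1, int(tokens.shape[1] * self.keep_ratio))
        _, keep_idx = scores.topk(num_keep, dim=1)                 # most informative tokens
        gather_idx = keep_idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        focused = tokens.gather(1, gather_idx)                     # (batch, num_keep, embed_dim)
        return focused, keep_idx, scores


# Example: keep the 300 highest-scoring of 1000 tokens per image.
selector = TokenSelector(embed_dim=256, keep_ratio=0.3)
focused, keep_idx, scores = selector(torch.randn(2, 1000, 256))
print(focused.shape)  # torch.Size([2, 300, 256])
```
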
## Table of Contents

- [Focus-DETR](#focus-detr)
- [Table of Contents](#table-of-contents)
- [Main Results with Pretrained Models](#main-results-with-pretrained-models)
  - [Pretrained focus\_detr with ResNet Backbone](#pretrained-focus_detr-with-resnet-backbone)
  - [Pretrained focus\_detr with Swin-Transformer Backbone](#pretrained-focus_detr-with-swin-transformer-backbone)
- [Installation](#installation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Citing Focus-DETR](#citing-focus-detr)

## Main Results with Pretrained Models

Here we provide the pretrained `Focus-DETR` weights based on detrex.

#### Pretrained focus_detr with ResNet Backbone
| Name | Backbone | Pretrain | Epochs | Denoising Queries | box AP | Download |
|------|----------|----------|--------|-------------------|--------|----------|
| Focus-DETR-R50-4scale | R-50 | IN1k | 12 | 100 | 48.8 | model |
| Focus-DETR-R50-4scale | R-50 | IN1k | 24 | 100 | 50.3 | model |
| Focus-DETR-R50-4scale | R-50 | IN1k | 36 | 100 | 50.4 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 12 | 100 | 50.8 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 24 | 100 | 51.2 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 36 | 100 | 51.4 | model |
#### Pretrained focus_detr with Swin-Transformer Backbone
| Name | Backbone | Pretrain | Epochs | Denoising Queries | box AP | Download |
|------|----------|----------|--------|-------------------|--------|----------|
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 12 | 100 | 50.0 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 24 | 100 | 51.2 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 36 | 100 | 52.5 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN22k to IN1k | 36 | 100 | 53.2 | model |
| Focus-DETR-Swin-B-384-4scale | Swin-Base-384 | IN22k to IN1k | 36 | 100 | 56.2 | model |
| Focus-DETR-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 56.3 | model |
**Note:** Swin-X-384 means the backbone pretraining resolution is 384 x 384, and "IN22k to IN1k" means the backbone is pretrained on ImageNet-22k and finetuned on ImageNet-1k.

## Installation

Please refer to [Installation Instructions](https://detrex.readthedocs.io/en/latest/tutorials/Installation.html) for the details of installation.

## Training

All configs can be trained with:

```bash
cd detrex
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --num-gpus 8
```

By default, we use 8 GPUs with a total batch size of 16 for training.

## Evaluation

Model evaluation can be done as follows:

```bash
cd detrex
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --eval-only train.init_checkpoint=/path/to/model_checkpoint
```

## Citing Focus-DETR

If you find our work helpful for your research, please consider citing the following BibTeX entry.

```BibTex
@misc{zheng2023more,
      title={Less is More: Focus Attention for Efficient DETR},
      author={Dehua Zheng and Wenhui Dong and Hailin Hu and Xinghao Chen and Yunhe Wang},
      year={2023},
      eprint={2307.12612},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```