# Focus-DETR

This is the official implementation of the ICCV 2023 paper "Less is More: Focus Attention for Efficient DETR".

Authors: Dehua Zheng, Wenhui Dong, Hailin Hu, Xinghao Chen, Yunhe Wang.

[arXiv] [BibTeX]

Focus-DETR is a model that focuses attention on more informative tokens for a better trade-off between computation efficiency and model accuracy. Compared with state-of-the-art sparse transformer-based detectors under the same setting, Focus-DETR has comparable complexity while achieving 50.4 AP (+2.2 AP) on COCO.


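For intuition only, the sketch below illustrates the token-selection idea of scoring encoder tokens and passing only the highest-scoring ones to the attention layers. The `TokenSelector` module, its single linear scoring head, and the fixed `keep_ratio` are illustrative assumptions and not the implementation shipped in this repository; the actual Focus-DETR scoring mechanism is described in the paper.

```python
# Illustrative sketch only -- hypothetical module, not the official Focus-DETR code.
# Idea: score every encoder token and keep only the top fraction, so the more
# expensive attention computation operates on the informative subset.
import torch
import torch.nn as nn


class TokenSelector(nn.Module):
    """Score encoder tokens and keep only the top-scoring fraction."""

    def __init__(self, embed_dim: int, keep_ratio: float = 0.3):
        super().__init__()
        self.score_head = nn.Linear(embed_dim, 1)  # per-token "informativeness" score
        self.keep_ratio = keep_ratio

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_tokens, embed_dim)
        scores = self.score_head(tokens).squeeze(-1)               # (batch, num_tokens)
        num_keep = max(1, int(tokens.shape[1] * self.keep_ratio))
        _, keep_idx = scores.topk(num_keep, dim=1)                 # most informative tokens
        gather_idx = keep_idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        focused = tokens.gather(1, gather_idx)                     # (batch, num_keep, embed_dim)
        return focused, keep_idx, scores


# Example: keep the 300 highest-scoring of 1000 tokens per image.
selector = TokenSelector(embed_dim=256, keep_ratio=0.3)
focused, keep_idx, scores = selector(torch.randn(2, 1000, 256))
print(focused.shape)  # torch.Size([2, 300, 256])
```
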
## Table of Contents

- [Focus-DETR](#focus-detr)
- [Table of Contents](#table-of-contents)
- [Main Results with Pretrained Models](#main-results-with-pretrained-models)
  - [Pretrained focus\_detr with ResNet Backbone](#pretrained-focus_detr-with-resnet-backbone)
  - [Pretrained focus\_detr with Swin-Transformer Backbone](#pretrained-focus_detr-with-swin-transformer-backbone)
- [Installation](#installation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Citing Focus-DETR](#citing-focus-detr)

## Main Results with Pretrained Models

Here we provide the pretrained `Focus-DETR` weights based on detrex.

#### Pretrained focus_detr with ResNet Backbone
| Name | Backbone | Pretrain | Epochs | Denoising Queries | box AP | Download |
|------|----------|----------|--------|-------------------|--------|----------|
| Focus-DETR-R50-4scale | R-50 | IN1k | 12 | 100 | 48.8 | model |
| Focus-DETR-R50-4scale | R-50 | IN1k | 24 | 100 | 50.3 | model |
| Focus-DETR-R50-4scale | R-50 | IN1k | 36 | 100 | 50.4 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 12 | 100 | 50.8 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 24 | 100 | 51.2 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 36 | 100 | 51.4 | model |
#### Pretrained focus_detr with Swin-Transformer Backbone
| Name | Backbone | Pretrain | Epochs | Denoising Queries | box AP | Download |
|------|----------|----------|--------|-------------------|--------|----------|
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 12 | 100 | 50.0 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 24 | 100 | 51.2 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 36 | 100 | 52.5 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN22k to IN1k | 36 | 100 | 53.2 | model |
| Focus-DETR-Swin-B-384-4scale | Swin-Base-384 | IN22k to IN1k | 36 | 100 | 56.2 | model |
| Focus-DETR-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 56.3 | model |
**Note:** Swin-X-384 means the backbone pretraining resolution is 384 x 384, and "IN22k to IN1k" means the backbone is pretrained on ImageNet-22k and finetuned on ImageNet-1k.

## Installation

Please refer to [Installation Instructions](https://detrex.readthedocs.io/en/latest/tutorials/Installation.html) for the details of installation.

## Training

All configs can be trained with:

```bash
cd detrex
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --num-gpus 8
```

By default, we use 8 GPUs with a total batch size of 16 for training.

## Evaluation

Model evaluation can be done as follows:

```bash
cd detrex
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --eval-only train.init_checkpoint=/path/to/model_checkpoint
```

## Citing Focus-DETR

If you find our work helpful for your research, please consider citing the following BibTeX entry.

```BibTex
@misc{zheng2023more,
      title={Less is More: Focus Attention for Efficient DETR},
      author={Dehua Zheng and Wenhui Dong and Hailin Hu and Xinghao Chen and Yunhe Wang},
      year={2023},
      eprint={2307.12612},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```