VarifocalNet: An IoU-aware Dense Object Detector

This repo hosts the code for VarifocalNet, as presented in our CVPR 2021 oral paper, available at https://arxiv.org/abs/2008.13367:

@inproceedings{zhang2020varifocalnet,
  title={VarifocalNet: An IoU-aware Dense Object Detector},
  author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
  booktitle={CVPR},
  year={2021}
}

Introduction

Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. In this work, we propose to learn IoU-aware classification scores (IACS) that simultaneously represent the object presence confidence and localization accuracy, to produce a more accurate ranking of detections in dense object detectors. In particular, we design a new loss function, named Varifocal Loss (VFL), for training a dense object detector to predict the IACS, and a new efficient star-shaped bounding box feature representation (the features at nine yellow sampling points) for estimating the IACS and refining coarse bounding boxes. Combining these two new components and a bounding box refinement branch, we build a new IoU-aware dense object detector based on the FCOS+ATSS architecture, which we call VarifocalNet, or VFNet for short. Extensive experiments on the MS COCO benchmark show that our VFNet consistently surpasses the strong baseline by ~2.0 AP with different backbones. Our best model, VFNet-X-1200 with Res2Net-101-DCN, reaches a single-model single-scale AP of 55.1 on COCO test-dev, achieving state-of-the-art performance among various object detectors.
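To make the star-shaped representation more concrete, here is a minimal sketch of how the nine sampling points could be enumerated for one location. It assumes the FCOS-style box encoding, where the coarse box is given by the distances from the location to its left, top, right and bottom sides. The function name `star_sampling_points` and the ordering of the points are illustrative assumptions, not the code used in this repo.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def star_sampling_points(x: float, y: float,
                         left: float, top: float,
                         right: float, bottom: float) -> List[Point]:
    """Enumerate nine sampling points for location (x, y) and its coarse box.

    left/top/right/bottom are the predicted distances from (x, y) to the four
    sides of the box. The ordering below is an assumption for illustration.
    """
    xs = (x - left, x, x + right)   # left side, the location itself, right side
    ys = (y - top, y, y + bottom)   # top side, the location itself, bottom side
    # The 3 x 3 product gives the location, its projections onto the four
    # sides, and the four box corners.
    return [(px, py) for py in ys for px in xs]


points = star_sampling_points(10, 20, left=4, top=3, right=6, bottom=5)
```

In the detector, features would be gathered at these nine positions (for example with a deformable convolution) rather than returned as coordinates; the sketch only shows the geometry.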

Figure: Learning to predict the IoU-aware classification score.
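As a rough illustration of the idea, the sketch below computes an IACS training target and the Varifocal Loss for a single class score, following the per-element formula given in the paper. It is written in plain Python for readability and is not the implementation in this repo; the helper names `box_iou` and `varifocal_loss`, the `eps` clamp, and the default values `alpha=0.75` and `gamma=2.0` are assumptions of this sketch.

```python
import math


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-12):
    """Varifocal Loss for one predicted score p and one target q.

    q is the IACS target: the IoU between the predicted box and its
    ground-truth box for a positive sample, and 0 for a negative sample.
    """
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    if q > 0:
        # Positive sample: binary cross-entropy weighted by the target q.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative sample: down-weighted by p ** gamma, as in focal loss.
    return -alpha * (p ** gamma) * math.log(1.0 - p)


# Target for a positive sample: IoU of the predicted and ground-truth boxes.
q_pos = box_iou((0, 0, 10, 10), (0, 0, 10, 5))
loss_pos = varifocal_loss(0.5, q_pos)
loss_neg = varifocal_loss(0.1, 0.0)
```

Only negatives are scaled down by `p ** gamma`; positives are weighted by their target `q`, so well-localized positives contribute more to training.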

Updates

Installation

A Quick Demo

Once the installation is done, you can follow the steps below to run a quick demo.

Usage of MMDetection

Please see exist_data_model.md for the basic usage of MMDetection. The MMDetection team also provides a Colab tutorial for beginners.

For troubleshooting, please refer to faq.md.

Results and Models

For your convenience, we provide the following trained models. These models are trained with a mini-batch size of 16 images on 8 Nvidia V100 GPUs (2 images per GPU).

Backbone Style DCN MS train Lr schd Inf time (fps) box AP (val) box AP (test-dev) Download
R-50 pytorch N N 1x 19.4 41.6 41.6 model | log
R-50 pytorch N Y 2x 19.3 44.5 44.8 model | log
R-50 pytorch Y Y 2x 16.3 47.8 48.0 model | log
R-101 pytorch N N 1x 15.5 43.0 43.6 model | log
R-101 pytorch N N 2x 15.6 43.5 43.9 model | log
R-101 pytorch N Y 2x 15.6 46.2 46.7 model | log
R-101 pytorch Y Y 2x 12.6 49.0 49.2 model | log
X-101-32x4d pytorch N Y 2x 13.1 47.4 47.6 model | log
X-101-32x4d pytorch Y Y 2x 10.1 49.7 50.0 model | log
X-101-64x4d pytorch N Y 2x 9.2 48.2 48.5 model | log
X-101-64x4d pytorch Y Y 2x 6.7 50.4 50.8 model | log
R2-101 pytorch N Y 2x 13.0 49.2 49.3 model | log
R2-101 pytorch Y Y 2x 10.3 51.1 51.3 model | log

Notes:

We also provide the models of RetinaNet, FoveaBox, RepPoints and ATSS trained with the Focal Loss (FL) and our Varifocal Loss (VFL).

Method Backbone MS train Lr schd box AP (val) Download
RetinaNet + FL R-50 N 1x 36.5 model | log
RetinaNet + VFL R-50 N 1x 37.4 model | log
FoveaBox + FL R-50 N 1x 36.3 model | log
FoveaBox + VFL R-50 N 1x 37.2 model | log
RepPoints + FL R-50 N 1x 38.3 model | log
RepPoints + VFL R-50 N 1x 39.7 model | log
ATSS + FL R-50 N 1x 39.3 model | log
ATSS + VFL R-50 N 1x 40.2 model | log
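The gain from swapping FL for VFL can be read directly off the table above. The short snippet below does the arithmetic; the numbers are copied from the table, while the variable names are just for illustration.

```python
# box AP (val) from the table above: (Focal Loss, Varifocal Loss)
results = {
    "RetinaNet": (36.5, 37.4),
    "FoveaBox": (36.3, 37.2),
    "RepPoints": (38.3, 39.7),
    "ATSS": (39.3, 40.2),
}

# AP gained by replacing FL with VFL, rounded to one decimal like the table.
gains = {method: round(vfl - fl, 1) for method, (fl, vfl) in results.items()}
mean_gain = sum(gains.values()) / len(gains)

for method, gain in gains.items():
    print(f"{method}: +{gain} AP")
```

The gains are +0.9 AP for RetinaNet, FoveaBox and ATSS, and +1.4 AP for RepPoints, all with R-50 and the 1x schedule.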

Notes:

VFNet-X

Backbone DCN MS train Training Inf scale Inf time (fps) box AP (val) box AP (test-dev) Download
R2-101 Y Y 41e + SWA 18e 1333x800 8.0 53.4 53.7 model | config
R2-101 Y Y 41e + SWA 18e 1800x1200 4.2 54.5 55.1

Notes:

We implement several improvements to the original VFNet; the resulting version is called VFNet-X.

For more detailed information, please see the VFNet-X config file.

Inference

Assuming you have put the COCO dataset into data/coco/ and have downloaded the models into checkpoints/, you can now evaluate the models on the COCO val2017 split:

./tools/dist_test.sh configs/vfnet/vfnet_r50_fpn_1x_coco.py checkpoints/vfnet_r50_1x_41.6.pth 8 --eval bbox
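Besides the distributed evaluation script above, a single image can also be checked from Python. The sketch below assumes MMDetection is installed and uses its `init_detector` and `inference_detector` helpers from `mmdet.apis`; the wrapper name `detect` and the default device are illustrative choices, and the format of the returned result depends on the MMDetection version.

```python
def detect(config_path, checkpoint_path, image_path, device="cuda:0"):
    """Run one image through a trained detector and return the raw result."""
    # Imported lazily so this module can be loaded without MMDetection.
    from mmdet.apis import inference_detector, init_detector

    # Build the model from the config and load the trained weights.
    model = init_detector(config_path, checkpoint_path, device=device)
    # Run the test pipeline defined in the config on a single image.
    return inference_detector(model, image_path)
```

For example, `detect("configs/vfnet/vfnet_r50_fpn_1x_coco.py", "checkpoints/vfnet_r50_1x_41.6.pth", "path/to/image.jpg")` would run the R-50 model from the table, where `path/to/image.jpg` is a placeholder for your own image.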

Notes:

Training

The following command trains vfnet_r50_fpn_1x_coco on 8 GPUs:

./tools/dist_train.sh configs/vfnet/vfnet_r50_fpn_1x_coco.py 8
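If you train with a different number of GPUs or images per GPU than the 8 x 2 setup used for the released models, the learning rate usually needs adjusting. A common heuristic, described in the MMDetection documentation as the linear scaling rule, keeps the learning rate proportional to the total batch size. The sketch below shows the arithmetic; `scaled_lr` is a hypothetical helper, and the base learning rate of 0.01 is an assumed value for illustration that you should replace with the one in your config.

```python
def scaled_lr(base_lr, base_batch_size, num_gpus, imgs_per_gpu):
    """Scale a learning rate linearly with the total mini-batch size."""
    batch_size = num_gpus * imgs_per_gpu
    return base_lr * batch_size / base_batch_size


# Assumed base setting: lr 0.01 for 16 images (8 GPUs x 2 images per GPU).
BASE_LR = 0.01  # placeholder; read the real value from your config
lr_4_gpus = scaled_lr(BASE_LR, base_batch_size=16, num_gpus=4, imgs_per_gpu=2)
```

Under that assumption, training on 4 GPUs with 2 images each halves the batch size and therefore halves the learning rate.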

Notes:

Contributing

Any pull requests or issues are welcome.

Citation

Please consider citing our paper in your publications if the project helps your research. The BibTeX reference is as follows:

@inproceedings{zhang2020varifocalnet,
  title={VarifocalNet: An IoU-aware Dense Object Detector},
  author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
  booktitle={CVPR},
  year={2021}
}

Acknowledgment

We would like to thank the MMDetection team for producing this great object detection toolbox!

License

This project is released under the Apache 2.0 license.