This repo hosts the code for implementing VarifocalNet, as presented in our CVPR 2021 oral paper, which is available at https://arxiv.org/abs/2008.13367:
@inproceedings{zhang2020varifocalnet,
title={VarifocalNet: An IoU-aware Dense Object Detector},
author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
booktitle={CVPR},
year={2021}
}
Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. In this work, we propose to learn IoU-aware classification scores (IACS) that simultaneously represent the object presence confidence and the localization accuracy, to produce a more accurate ranking of detections in dense object detectors. In particular, we design a new loss function, named Varifocal Loss (VFL), for training a dense object detector to predict the IACS, and a new efficient star-shaped bounding box feature representation (the features at nine yellow sampling points) for estimating the IACS and refining coarse bounding boxes. Combining these two new components with a bounding box refinement branch, we build a new IoU-aware dense object detector based on the FCOS+ATSS architecture, which we call VarifocalNet, or VFNet for short. Extensive experiments on the MS COCO benchmark show that our VFNet consistently surpasses the strong baseline by ~2.0 AP with different backbones. Our best model, VFNet-X-1200 with Res2Net-101-DCN, reaches a single-model single-scale AP of 55.1 on COCO test-dev, achieving state-of-the-art performance among various object detectors.
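To make the Varifocal Loss concrete, here is a minimal per-score sketch in plain Python of the formula from the paper: a positive is trained with binary cross-entropy against its IoU target q and weighted by q, while a negative (q = 0) is down-weighted by alpha * p^gamma. The names `box_iou` and `varifocal_loss` are hypothetical helpers, not this repo's API, and alpha = 0.75, gamma = 2.0 are assumed defaults; a real training loss would operate on batched tensors over all locations and classes.

```python
import math


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-12):
    """Varifocal Loss for a single class score (hypothetical helper).

    p: predicted IoU-aware classification score (after sigmoid), in [0, 1].
    q: target. For a positive location and its ground-truth class, q is the IoU
       between the predicted box and the ground-truth box; otherwise q = 0.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp so neither log() sees 0
    if q > 0:
        # Positive: BCE against the soft target q, weighted by q itself,
        # so high-IoU positives contribute more to the loss.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative: focal-style down-weighting by alpha * p**gamma, applied only here.
    return -alpha * p ** gamma * math.log(1.0 - p)


# Example: a positive location whose predicted box overlaps its ground truth.
pred_box, gt_box = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
q = box_iou(pred_box, gt_box)  # 1/7
print(varifocal_loss(0.7, q))    # loss for this positive
print(varifocal_loss(0.7, 0.0))  # loss for a negative with the same score
```

The asymmetry is the design point: only negatives are down-weighted, so rare high-quality positives keep their full training signal, and the positive term is minimized when the predicted score equals the IoU.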
Learning to Predict the IoU-aware Classification Score.
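The star-shaped representation mentioned above samples features at nine points derived from a coarse box. Assuming a location (x, y) and distances (l, t, r, b) to the left, top, right and bottom box sides, the sketch below lists those nine points: the centre, the four side midpoints and the four corners. It also shows, as an assumption, how such points could be turned into offsets for a 3x3 deformable convolution whose regular taps sit at -1, 0, +1 on a feature map with a given stride; the (dy, dx) ordering per tap is illustrative, so check the layout your deformable-conv op expects. Both function names are hypothetical.

```python
def star_sampling_points(x, y, l, t, r, b):
    """Nine star-shaped sampling points for the box (l, t, r, b) around (x, y).

    Points are ordered row by row, like the taps of a 3x3 kernel.
    """
    xs = (x - l, x, x + r)  # left edge, centre column, right edge
    ys = (y - t, y, y + b)  # top edge, centre row, bottom edge
    return [(px, py) for py in ys for px in xs]


def star_dcn_offsets(l, t, r, b, stride):
    """(dy, dx) offsets moving a regular 3x3 kernel's taps onto the star points.

    Assumes (l, t, r, b) are in image pixels and the kernel runs on a feature
    map with the given stride, with regular taps at relative positions -1, 0, +1.
    """
    offsets = []
    for row, dy in zip((-1, 0, 1), (-t, 0, b)):
        for col, dx in zip((-1, 0, 1), (-l, 0, r)):
            # target position (in feature-map units) minus the regular tap position
            offsets.append((dy / stride - row, dx / stride - col))
    return offsets


print(star_sampling_points(10, 20, 3, 4, 5, 6))
print(star_dcn_offsets(16, 16, 16, 16, 8))
```

If every box side lies exactly one stride away, the offsets are all zero and the deformable kernel degenerates to an ordinary 3x3 convolution.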
old branch.
This VarifocalNet implementation is based on MMDetection, so the installation is the same as for the original MMDetection.
Please check get_started.md for installation. Note that you should change the PyTorch and CUDA versions to match yours when installing mmcv in step 3, and clone this repo instead of MMDetection in step 4.
If you run into problems with pycocotools, please install it with:
pip install "git+https://github.com/open-mmlab/cocoapi.git#subdirectory=pycocotools"
Once the installation is done, you can follow the steps below to run a quick demo.
Download the R-50 1x model (vfnet_r50_1x_41.6.pth) from the table below into checkpoints/, then run
python demo/image_demo.py demo/demo.jpg configs/vfnet/vfnet_r50_fpn_1x_coco.py checkpoints/vfnet_r50_1x_41.6.pth
and you should see an image with detections.
Please see exist_data_model.md for the basic usage of MMDetection. They also provide a Colab tutorial for beginners.
For troubleshooting, please refer to faq.md.
For your convenience, we provide the following trained models. These models are trained with a mini-batch size of 16 images on 8 Nvidia V100 GPUs (2 images per GPU).
Backbone | Style | DCN | MS train | Lr schd | Inf time (fps) | box AP (val) | box AP (test-dev) | Model | Log |
---|---|---|---|---|---|---|---|---|---|
R-50 | pytorch | N | N | 1x | 19.4 | 41.6 | 41.6 | model | log |
R-50 | pytorch | N | Y | 2x | 19.3 | 44.5 | 44.8 | model | log |
R-50 | pytorch | Y | Y | 2x | 16.3 | 47.8 | 48.0 | model | log |
R-101 | pytorch | N | N | 1x | 15.5 | 43.0 | 43.6 | model | log |
R-101 | pytorch | N | N | 2x | 15.6 | 43.5 | 43.9 | model | log |
R-101 | pytorch | N | Y | 2x | 15.6 | 46.2 | 46.7 | model | log |
R-101 | pytorch | Y | Y | 2x | 12.6 | 49.0 | 49.2 | model | log |
X-101-32x4d | pytorch | N | Y | 2x | 13.1 | 47.4 | 47.6 | model | log |
X-101-32x4d | pytorch | Y | Y | 2x | 10.1 | 49.7 | 50.0 | model | log |
X-101-64x4d | pytorch | N | Y | 2x | 9.2 | 48.2 | 48.5 | model | log |
X-101-64x4d | pytorch | Y | Y | 2x | 6.7 | 50.4 | 50.8 | model | log |
R2-101 | pytorch | N | Y | 2x | 13.0 | 49.2 | 49.3 | model | log |
R2-101 | pytorch | Y | Y | 2x | 10.3 | 51.1 | 51.3 | model | log |
Notes:
Multi-scale training (MS train) uses range mode, and the inference scale stays at 1333x800.
The DCN models use DCNv2 in both the backbone and the head.
We also provide the models of RetinaNet, FoveaBox, RepPoints and ATSS trained with the Focal Loss (FL) and with our Varifocal Loss (VFL).
Method | Backbone | MS train | Lr schd | box AP (val) | Model | Log |
---|---|---|---|---|---|---|
RetinaNet + FL | R-50 | N | 1x | 36.5 | model | log |
RetinaNet + VFL | R-50 | N | 1x | 37.4 | model | log |
FoveaBox + FL | R-50 | N | 1x | 36.3 | model | log |
FoveaBox + VFL | R-50 | N | 1x | 37.2 | model | log |
RepPoints + FL | R-50 | N | 1x | 38.3 | model | log |
RepPoints + VFL | R-50 | N | 1x | 39.7 | model | log |
ATSS + FL | R-50 | N | 1x | 39.3 | model | log |
ATSS + VFL | R-50 | N | 1x | 40.2 | model | log |
Notes:
The use_vfl flag in those config files controls whether the Varifocal Loss is used in training.
Backbone | DCN | MS train | Training | Inf scale | Inf time (fps) | box AP (val) | box AP (test-dev) | Model | Config |
---|---|---|---|---|---|---|---|---|---|
R2-101 | Y | Y | 41e + SWA 18e | 1333x800 | 8.0 | 53.4 | 53.7 | model | config |
R2-101 | Y | Y | 41e + SWA 18e | 1800x1200 | 4.2 | 54.5 | 55.1 | | |
Notes:
We implement several improvements to the original VFNet. The improved version is called VFNet-X, and these improvements include:
PAFPN. We replace the FPN with PAFPNX (a slightly modified version of the original PAFPN) and apply DCN and group normalization (GN) in it.
More and Wider Conv Layers. We stack 4 convolution layers in the detection head, instead of 3 layers in the original VFNet, and increase the original 256 feature channels to 384 channels.
RandomCrop and Cutout. We employ random crop and cutout as additional data augmentation methods.
Wider MSTrain Scale Range and Longer Training. We adopt a wider MSTrain scale range, from 750x500 to 2100x1400, and initially train the VFNet-X for 41 epochs.
SWA. We apply Stochastic Weight Averaging (SWA) when training VFNet-X (for another 18 epochs), which brings a 1.2 AP gain. Please see our work SWA Object Detection for more details.
Soft-NMS. We apply soft-NMS in inference.
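The core of SWA is simple: the weights saved during the extra training epochs are averaged element-wise into a single model. The sketch below assumes one checkpoint per extra epoch and represents each checkpoint as a dict mapping parameter names to flat lists of floats, a stand-in for real tensors; `swa_average` is a hypothetical helper, not this repo's code. In PyTorch, `torch.optim.swa_utils.AveragedModel` maintains such an average during training instead.

```python
def swa_average(state_dicts):
    """Element-wise mean of the parameters stored in several checkpoints.

    Each state dict maps a parameter name to a flat list of floats; all
    checkpoints are assumed to share the same names and shapes.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint to average")
    n = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        # zip(*) groups the i-th value of this parameter across all checkpoints
        per_element = zip(*(sd[name] for sd in state_dicts))
        averaged[name] = [sum(vals) / n for vals in per_element]
    return averaged


epoch_ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
print(swa_average(epoch_ckpts))
```

Averaging weights, rather than ensembling predictions, keeps inference cost identical to a single model.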
For more detailed information, please see the VFNet-X config file.
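Soft-NMS, the last improvement in the list above, decays the scores of boxes that overlap a selected box instead of discarding them outright. Below is a minimal pure-Python sketch of the two decay rules from the original Soft-NMS paper (Gaussian and linear). The function names and the defaults sigma=0.5, iou_thr=0.3 and score_thr=0.001 are illustrative assumptions; the variant and thresholds VFNet-X actually uses are set in its config file.

```python
import math


def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, method="gaussian", sigma=0.5, iou_thr=0.3, score_thr=0.001):
    """Return (kept indices in selection order, their scores at selection time)."""
    if method not in ("gaussian", "linear"):
        raise ValueError(f"unknown method: {method}")
    scores = list(scores)  # work on a copy; scores get decayed in place
    remaining = list(range(len(boxes)))
    keep, kept_scores = [], []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_thr:
            break  # best is the maximum, so every remaining score is lower too
        keep.append(best)
        kept_scores.append(scores[best])
        for i in remaining:
            o = iou(boxes[best], boxes[i])
            if method == "gaussian":
                scores[i] *= math.exp(-(o * o) / sigma)  # smooth decay with overlap
            elif o > iou_thr:
                scores[i] *= 1.0 - o  # linear decay above the IoU threshold
    return keep, kept_scores


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
print(soft_nms(boxes, [0.9, 0.8, 0.7]))
```

With hard NMS at an IoU threshold of 0.5, the second box would be removed; here it survives with a reduced score and simply ranks last.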
Assuming you have put the COCO dataset into data/coco/ and downloaded the models into checkpoints/, you can now evaluate the models on the COCO val2017 split:
./tools/dist_test.sh configs/vfnet/vfnet_r50_fpn_1x_coco.py checkpoints/vfnet_r50_1x_41.6.pth 8 --eval bbox
The following command trains vfnet_r50_fpn_1x_coco on 8 GPUs:
./tools/dist_train.sh configs/vfnet/vfnet_r50_fpn_1x_coco.py 8
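The released models use a mini-batch of 16 images (8 GPUs x 2 images per GPU). When training on a different number of GPUs, the per-GPU batch can be derived as in this small sketch; `per_gpu_settings` is a hypothetical helper, and setting workers_per_gpu equal to samples_per_gpu follows the rule of thumb given in the training notes.

```python
TOTAL_BATCH = 16  # mini-batch size used for the released models (8 GPUs x 2 images)


def per_gpu_settings(num_gpus, total_batch=TOTAL_BATCH):
    """Return (samples_per_gpu, workers_per_gpu) with samples_per_gpu * num_gpus == total_batch."""
    if num_gpus <= 0 or total_batch % num_gpus:
        raise ValueError(f"{total_batch} images cannot be split evenly over {num_gpus} GPUs")
    samples_per_gpu = total_batch // num_gpus
    workers_per_gpu = samples_per_gpu  # rule of thumb from the notes
    return samples_per_gpu, workers_per_gpu


print(per_gpu_settings(4))  # settings for a 4-GPU run
```

The resulting values would go into the config's `data = dict(samples_per_gpu=..., workers_per_gpu=..., ...)` entry, and the GPU count passed to dist_train.sh changes from 8 to the number of GPUs you use.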
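Notes:
The trained models and logs are saved into work_dirs/vfnet_r50_fpn_1x_coco.
To keep the mini-batch size at 16 with a different number of GPUs, adjust samples_per_gpu (and workers_per_gpu) so that samples_per_gpu x number_of_gpus = 16. In general, workers_per_gpu = samples_per_gpu.
Any pull requests or issues are welcome.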
Please consider citing our paper in your publications if the project helps your research. The BibTeX reference is as follows:
@inproceedings{zhang2020varifocalnet,
title={VarifocalNet: An IoU-aware Dense Object Detector},
author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
booktitle={CVPR},
year={2021}
}
We would like to thank the MMDetection team for producing this great object detection toolbox!
This project is released under the Apache 2.0 license.