Faster R-CNN in MXNet
Set up environment
- Requires the latest MXNet. Run export MXNET_CUDNN_AUTOTUNE_DEFAULT=0 to set the required environment variable.
- Install the Python package mxnet (CPU inference only) or mxnet-cu90 (GPU training), then cython, and then opencv-python, matplotlib, pycocotools and tqdm.
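The setup above can be checked before running anything. The following is a minimal sketch and not part of the repository: the function names `missing_modules` and `autotune_disabled` are hypothetical, and it assumes the usual import names for the packages listed (`cv2` for opencv-python, `Cython` for cython, `mxnet` for either MXNet build).

```python
import importlib.util
import os

# Import names for the packages listed above (assumed: opencv-python is
# imported as cv2, cython as Cython, and mxnet-cu90 as mxnet).
REQUIRED_MODULES = ["mxnet", "Cython", "cv2", "matplotlib", "pycocotools", "tqdm"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this interpreter."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def autotune_disabled(environ=os.environ):
    """Return True if MXNET_CUDNN_AUTOTUNE_DEFAULT is set to 0."""
    return environ.get("MXNET_CUDNN_AUTOTUNE_DEFAULT") == "0"


if __name__ == "__main__":
    print("missing modules:", missing_modules() or "none")
    print("cuDNN autotune disabled:", autotune_disabled())
```

The check only looks up whether each module can be located; it does not import MXNet or verify its version.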
Out-of-box inference models
Download any of the following models to the current directory and run python3 demo.py --dataset $Dataset$ --network $Network$ --params $MODEL_FILE$ --image $YOUR_IMAGE$ to run inference on a single image.
For example, run python3 demo.py --dataset voc --network vgg16 --params vgg16_voc0712.params --image myimage.jpg and optionally add --gpu 0 to use the GPU.
Each network has its own configuration and each dataset has its own object class names, so you must pass both explicitly as command-line arguments.
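Because the dataset and network must always be passed together with the matching parameter file, it can be convenient to assemble the command in one place. The sketch below is an illustration only: `build_demo_command` is a hypothetical helper, and it uses only the demo.py flags shown above.

```python
import shlex


def build_demo_command(dataset, network, params, image, gpu=None):
    """Assemble the demo.py argument list; gpu=None leaves out --gpu."""
    cmd = ["python3", "demo.py",
           "--dataset", dataset,
           "--network", network,
           "--params", params,
           "--image", image]
    if gpu is not None:
        cmd += ["--gpu", str(gpu)]
    return cmd


cmd = build_demo_command("voc", "vgg16", "vgg16_voc0712.params", "myimage.jpg", gpu=0)
print(shlex.join(cmd))
```

This prints the example command from above with --gpu 0 appended. The list can be handed to subprocess.run(cmd, check=True) to launch the demo.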
Network | Dataset | Imageset | Reference | Result | Link |
vgg16 | voc | 07/07 | 69.9 | 70.23 | Dropbox |
vgg16 | voc | 07++12/07 | 73.2 | 75.97 | Dropbox |
resnet101 | voc | 07++12/07 | 76.4 | 79.35 | Dropbox |
vgg16 | coco | train2017/val2017 | 21.2 | 22.8 | Dropbox |
resnet101 | coco | train2017/val2017 | 27.2 | 26.1 | Dropbox |
Download data and label
Make a directory named data and follow the py-faster-rcnn data preparation instructions.
- Pascal VOC should be in data/VOCdevkit, containing VOC2007, VOC2012 and annotations.
- MSCOCO should be in data/coco, containing train2017, val2017, annotations/instances_train2017.json and annotations/instances_val2017.json.
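The expected layout can be verified with a few lines of Python before training. This is a minimal sketch under stated assumptions: `missing_entries` and `EXPECTED_LAYOUT` are hypothetical names, and placing annotations directly under data/VOCdevkit is this sketch's reading of the list above.

```python
from pathlib import Path

# Expected entries relative to the data directory, taken from the list above.
# "VOCdevkit/annotations" is an assumption about where that entry lives.
EXPECTED_LAYOUT = {
    "voc": [
        "VOCdevkit/VOC2007",
        "VOCdevkit/VOC2012",
        "VOCdevkit/annotations",
    ],
    "coco": [
        "coco/train2017",
        "coco/val2017",
        "coco/annotations/instances_train2017.json",
        "coco/annotations/instances_val2017.json",
    ],
}


def missing_entries(dataset, data_root="data"):
    """Return the expected paths for `dataset` that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_LAYOUT[dataset] if not (root / rel).exists()]


if __name__ == "__main__":
    for name in EXPECTED_LAYOUT:
        print(name, "missing:", missing_entries(name) or "none")
```

An empty list for a dataset means every listed directory and annotation file was found; the check does not inspect their contents.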
Download pretrained ImageNet models
Training and evaluation
To train, use python3 train.py --dataset $Dataset$ --network $Network$ --pretrained $IMAGENET_MODEL_FILE$ --gpus $GPUS$, for example python3 train.py --dataset voc --network vgg16 --pretrained model/vgg16-0000.params --gpus 0,1
To evaluate, use python3 test.py --dataset $Dataset$ --network $Network$ --params $MODEL_FILE$ --gpu $GPU$, for example python3 test.py --dataset voc --network vgg16 --params model/vgg16-0010.params --gpu 0
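Note that training takes a comma-separated device list via --gpus, while evaluation takes a single device via --gpu. The sketch below builds both commands from the examples above; `build_train_command` and `build_test_command` are hypothetical helpers that use only the flags shown in this section.

```python
import shlex


def build_train_command(dataset, network, pretrained, gpus):
    """Assemble the train.py argument list; gpus is an iterable of device ids."""
    return ["python3", "train.py",
            "--dataset", dataset,
            "--network", network,
            "--pretrained", pretrained,
            "--gpus", ",".join(str(g) for g in gpus)]


def build_test_command(dataset, network, params, gpu):
    """Assemble the test.py argument list for a single GPU."""
    return ["python3", "test.py",
            "--dataset", dataset,
            "--network", network,
            "--params", params,
            "--gpu", str(gpu)]


train_cmd = build_train_command("voc", "vgg16", "model/vgg16-0000.params", [0, 1])
test_cmd = build_test_command("voc", "vgg16", "model/vgg16-0010.params", 0)
print(shlex.join(train_cmd))
print(shlex.join(test_cmd))
```

The two printed lines match the example commands above. Keeping the dataset and network arguments identical across both calls avoids evaluating a checkpoint with a mismatched configuration.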
History
- May 25, 2016: We released Fast R-CNN implementation.
- July 6, 2016: We released Faster R-CNN implementation.
- July 23, 2016: We updated to MXNet module solver.
- Oct 10, 2016: tornadomeet released approximate end-to-end training.
- Oct 30, 2016: We updated to MXNet module inference.
- Jan 19, 2017: We accelerated our pipeline and supported ResNet training.
- Jun 22, 2018: We simplified code.
Disclaimer
This repository uses code from MXNet, Fast R-CNN, Faster R-CNN, caffe, tornadomeet/mx-rcnn and the MS COCO API.
Thanks to tornadomeet for end-to-end experiments and to MXNet contributors for helpful discussions.
References
- Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems." In Neural Information Processing Systems, Workshop on Machine Learning Systems, 2015.
- Ross Girshick. "Fast R-CNN." In Proceedings of the IEEE International Conference on Computer Vision, 2015.
- Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
- Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. "Caffe: Convolutional architecture for fast feature embedding." In Proceedings of the ACM International Conference on Multimedia, 2014.
- Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. "The pascal visual object classes (voc) challenge." International journal of computer vision 88, no. 2 (2010): 303-338.
- Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. "ImageNet: A large-scale hierarchical image database." In Computer Vision and Pattern Recognition, IEEE Conference on, 2009.
- Karen Simonyan, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." In Computer Vision and Pattern Recognition, IEEE Conference on, 2016.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. "Microsoft COCO: Common Objects in Context." In European Conference on Computer Vision, pp. 740-755. Springer International Publishing, 2014.