Created by Ross Girshick, Jeff Donahue, Trevor Darrell and Jitendra Malik at UC Berkeley EECS.
Acknowledgements: a huge thanks to Yangqing Jia for creating Caffe and the BVLC team, with a special shoutout to Evan Shelhamer, for maintaining Caffe and helping to merge the R-CNN fine-tuning code into Caffe.
R-CNN is a state-of-the-art visual object detection system that combines bottom-up region proposals with rich features computed by a convolutional neural network. At the time of its release, R-CNN improved the previous best detection performance on PASCAL VOC 2012 by 30% relative, going from 40.9% to 53.3% mean average precision. Unlike the previous best results, R-CNN achieves this performance without using contextual rescoring or an ensemble of feature types.
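The 30% figure is a relative improvement. Checking the arithmetic with the two mAP values quoted above:

```latex
\frac{53.3 - 40.9}{40.9} = \frac{12.4}{40.9} \approx 0.303 \quad (\text{about } 30\% \text{ relative, or } 12.4 \text{ mAP points absolute})
```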
R-CNN was initially described in an arXiv tech report and will appear in a forthcoming CVPR 2014 paper.
If you find R-CNN useful in your research, please consider citing:
@inproceedings{girshick14CVPR,
Author = {Girshick, Ross and Donahue, Jeff and Darrell, Trevor and Malik, Jitendra},
Title = {Rich feature hierarchies for accurate object detection and semantic segmentation},
Booktitle = {Computer Vision and Pattern Recognition},
Year = {2014}
}
R-CNN is released under the Simplified BSD License (refer to the LICENSE file for details).
Method | VOC 2007 mAP | VOC 2010 mAP | VOC 2012 mAP |
---|---|---|---|
R-CNN | 54.2% | 50.2% | 49.6% |
R-CNN bbox reg | 58.5% | 53.7% | 53.3% |
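Subtracting the two rows gives the absolute gain, in mAP points, from adding bounding-box regression on each VOC test set:

```latex
\begin{aligned}
\text{VOC 2007:}&\quad 58.5 - 54.2 = 4.3 \\
\text{VOC 2010:}&\quad 53.7 - 50.2 = 3.5 \\
\text{VOC 2012:}&\quad 53.3 - 49.6 = 3.7
\end{aligned}
```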
Method | ILSVRC2013 test mAP |
---|---|
R-CNN bbox reg | 31.4% |
Note: the ilsvrc branch still needs some cleanup before it is merged into master.
1. Let's call the directory where you installed Caffe $CAFFE_ROOT (you can run export CAFFE_ROOT=$(pwd) from that directory).
2. Build Caffe's MATLAB wrapper: make matcaffe
3. Download the ImageNet image mean: cd $CAFFE_ROOT/data/ilsvrc12 && ./get_ilsvrc_aux.sh
4. Get the R-CNN source code and change into its directory:
   git clone https://github.com/rbgirshick/rcnn.git
   cd rcnn
5. R-CNN expects to find Caffe in external/caffe, so create a symlink: ln -sf $CAFFE_ROOT external/caffe
6. Start MATLAB (make sure you are still in the rcnn directory): matlab
   You should see the message R-CNN startup done followed by the MATLAB prompt >>.
7. Run the build script, which builds liblinear and Selective Search: >> rcnn_build()
   Don't worry if you see compiler warnings while building liblinear; this is normal on my system.
8. Check that Caffe and its MATLAB wrapper are set up correctly: >> key = caffe('get_init_key');
   The expected output is key = -2.

Common issues: you may need to set LD_LIBRARY_PATH before you start MATLAB. If you see a message like "Invalid MEX-file '/path/to/rcnn/external/caffe/matlab/caffe/caffe.mexa64': libmkl_rt.so: cannot open shared object file: No such file or directory", then make sure that the CUDA and MKL library directories are in your LD_LIBRARY_PATH. On my system, I use:
export LD_LIBRARY_PATH=/opt/intel/mkl/lib/intel64:/usr/local/cuda/lib64
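Before starting MATLAB it can help to confirm that the library directories really are on LD_LIBRARY_PATH. The function below is a hypothetical helper, not part of R-CNN; the MKL and CUDA paths are just the example paths above and will likely differ on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of R-CNN): print each given directory that is
# missing from LD_LIBRARY_PATH, and return non-zero if any are missing.
# Usage: check_ld_library_path <dir>...
check_ld_library_path() {
  local missing=0 dir
  for dir in "$@"; do
    # Wrap both sides in ':' so only whole path entries match
    # (e.g. /usr/local/cuda/lib64x does not count as /usr/local/cuda/lib64).
    case ":${LD_LIBRARY_PATH:-}:" in
      *":$dir:"*) ;;
      *) echo "missing from LD_LIBRARY_PATH: $dir"; missing=1 ;;
    esac
  done
  return "$missing"
}
```

For example, source the file and run check_ld_library_path /opt/intel/mkl/lib/intel64 /usr/local/cuda/lib64 in the same shell you will launch MATLAB from; silence and a zero exit status mean both entries are present.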
The quickest way to get started is to download pre-computed R-CNN detectors. Currently we have detectors trained on PASCAL VOC 2007 train+val, 2012 train, and ILSVRC13 train+val. Unfortunately the download is large (1.5GB), so brew some coffee or take a walk while waiting.
From the rcnn folder, run the model fetch script: ./data/fetch_models.sh
This will populate the rcnn/data folder with caffe_nets and rcnn_models. See rcnn/data/README.md for details.
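Because the download is large, a quick check that the fetch actually produced the expected folders can save a confusing error later. This is a hypothetical helper, not something shipped with R-CNN; it only tests that directories exist, not that their contents are complete.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of R-CNN): after a fetch script finishes,
# check that the expected subfolders exist under <rcnn root>/data.
# Usage: check_data_dirs <rcnn root> <subfolder>...
check_data_dirs() {
  local root="$1" status=0 sub
  shift
  for sub in "$@"; do
    if [ -d "$root/data/$sub" ]; then
      echo "found:   $root/data/$sub"
    else
      echo "MISSING: $root/data/$sub"
      status=1
    fi
  done
  return "$status"
}
```

From the rcnn folder, check_data_dirs . caffe_nets rcnn_models should report both folders as found.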
Pre-computed selective search boxes can also be downloaded for VOC2007, VOC2012, and ILSVRC13.
From the rcnn folder, run the selective search data fetch script: ./data/fetch_selective_search_data.sh
This will populate the rcnn/data folder with selective_search_data.
Caffe compatibility note: R-CNN has been updated to use the new Caffe proto messages that were rolled out in Caffe v0.999. The model package contains models in the up-to-date proto format. If, for some reason, you need the old (Caffe proto v0) models, they can still be downloaded: VOC models, ILSVRC13 model.
Let's assume that you've downloaded the precomputed detectors. Now:
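1. Change to where you installed R-CNN: cd rcnn
2. Start MATLAB: matlab
   Important: if you don't see the message R-CNN startup done when MATLAB starts, then you probably didn't start MATLAB in the rcnn directory.
3. Run the demo: >> rcnn_demo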
Let's use PASCAL VOC 2007 as an example. The basic pipeline is:
extract features to disk -> train SVMs -> test
You'll need about 200GB of free disk space for the feature cache, which is stored in rcnn/feat_cache by default (symlink rcnn/feat_cache elsewhere if needed). It's best if the feature cache is on a fast, local disk. Before running the pipeline, we first need to install the PASCAL VOC 2007 dataset.
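Since running out of space partway through feature extraction wastes hours, a free-space check on the disk that will hold the feature cache may be worth doing first. The function below is a hypothetical sketch built on POSIX df -P -k; it is not part of R-CNN, and it treats 1 GB as 1024 × 1024 KB.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of R-CNN): succeed only if the
# filesystem holding <dir> has at least <gigabytes> GB available.
# Usage: has_free_space_gb <dir> <gigabytes>
has_free_space_gb() {
  local dir="$1" need_gb="$2" avail_kb
  # With -P (portable format) and -k (1024-byte blocks), df prints a header
  # line, then one line for the filesystem; column 4 is available space in KB.
  avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
  # Empty output means df failed (for example, the directory does not exist).
  [ -n "$avail_kb" ] || return 1
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}
```

For example, from the rcnn folder: has_free_space_gb . 200 || echo "less than 200GB free here". If you symlink feat_cache elsewhere, point the check at the directory on that other disk instead.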
Download the training, validation, test data and VOCdevkit:
wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
Extract all of these tars into one directory; it will be called VOCdevkit.
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar
It should have this basic structure:
VOCdevkit/                 % development kit
VOCdevkit/VOCcode/         % VOC utility code
VOCdevkit/VOC2007          % image sets, annotations, etc.
... and several other directories ...
I use a symlink to hook the R-CNN codebase to the PASCAL VOC dataset:
ln -sf /your/path/to/voc2007/VOCdevkit /path/to/rcnn/datasets/VOCdevkit2007
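One thing to watch when re-running ln -sf: if the link already exists and points to a directory, GNU ln follows it and creates a second link inside the target directory rather than replacing the link. Adding -n (treat an existing symlink as a plain file) avoids that. The function below is a hypothetical sketch, not part of R-CNN, that (re)creates the link this way and then checks that the expected VOC subfolders can be reached through it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of R-CNN): point <link> at a VOCdevkit
# directory and check that VOCcode/ and VOC2007/ are reachable through it.
# Usage: link_voc2007 <path to VOCdevkit> <path to rcnn/datasets/VOCdevkit2007>
link_voc2007() {
  local devkit="$1" link="$2"
  # -n: if $link is already a symlink to a directory, replace the link itself
  # instead of creating a new link inside the directory it points to.
  ln -sfn "$devkit" "$link" || return 1
  [ -d "$link/VOCcode" ] && [ -d "$link/VOC2007" ]
}
```

Running it twice with the same arguments leaves a single symlink, which is the point of the -n flag.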
Then extract features to disk from MATLAB, in four chunks:
>> rcnn_exp_cache_features('train');   % chunk1
>> rcnn_exp_cache_features('val');     % chunk2
>> rcnn_exp_cache_features('test_1');  % chunk3
>> rcnn_exp_cache_features('test_2');  % chunk4
Pro tip: on a machine with one hefty GPU (e.g., K20, K40, Titan) and a six-core processor, I start two MATLAB sessions, each with a three-worker matlabpool, and run chunk1 and chunk2 in parallel on that machine. In this setup, completing chunk1 and chunk2 takes about 8-9 hours on a single machine (depending on your CPU/GPU combination and disk). If you have more machines, you can hack rcnn_exp_cache_features to split the workload further.
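One simple way to divide a list of work items (say, one image id per line) across n machines is round robin on line number. The sketch below only shows that selection step; it is hypothetical, and wiring its output into rcnn_exp_cache_features would still require modifying that MATLAB function, which is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not R-CNN code): print the lines of <list> that
# machine <k> (0-based) out of <n> machines should process, round robin.
# Usage: chunk_for_machine <n> <k> <list file>
chunk_for_machine() {
  local n="$1" k="$2" list="$3"
  # Line 1 goes to machine 0, line 2 to machine 1, ..., line n+1 to machine 0.
  awk -v n="$n" -v k="$k" '(NR - 1) % n == k' "$list"
}
```

Round robin keeps the shares within one item of each other in size; a contiguous split (like the train/val/test chunks above) is an equally valid alternative if locality on disk matters more.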
Now to run the training and testing code, use the following experiments script:
>> test_results = rcnn_exp_train_and_test()
Note: The training and testing procedures save models and results under rcnn/cachedir by default. You can customize this by creating a local config file named rcnn_config_local.m and defining the experiment directory variable EXP_DIR. Look at rcnn_config_local.example.m for an example.
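As a sketch of that customization, the hypothetical shell function below writes a one-line rcnn_config_local.m that sets EXP_DIR as a MATLAB string. It assumes a plain assignment such as EXP_DIR = '/some/path'; is what the config file needs; rcnn_config_local.example.m remains the authoritative reference for its contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of R-CNN): create <rcnn root>/rcnn_config_local.m
# containing a single EXP_DIR assignment, and create the directory itself.
# Usage: write_local_config <rcnn root> <experiment dir>
write_local_config() {
  local rcnn_root="$1" exp_dir="$2" cfg="$1/rcnn_config_local.m"
  # A single quote would terminate the MATLAB string literal early.
  case "$exp_dir" in
    *"'"*) echo "EXP_DIR must not contain a single quote" >&2; return 1 ;;
  esac
  if [ -e "$cfg" ]; then
    echo "refusing to overwrite existing $cfg" >&2
    return 1
  fi
  mkdir -p "$exp_dir" || return 1
  printf "EXP_DIR = '%s';\n" "$exp_dir" > "$cfg"
}
```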
It should be easy to train an R-CNN detector using another detection dataset as long as that dataset has complete bounding box annotations (i.e., all instances of all classes are labeled).
To support a new dataset, you define three functions: (1) one that returns a structure that describes the class labels and list of images; (2) one that returns a region of interest (roi) structure that describes the bounding box annotations; and (3) one that provides a test evaluation function.
You can follow the PASCAL VOC implementation as your guide:
imdb/imdb_from_voc.m (list of images and classes)
imdb/roidb_from_voc.m (region of interest database)
imdb/imdb_eval_voc.m (evaluation)
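Following the VOC files as a guide can start with literally copying them. The hypothetical helper below (not part of R-CNN) copies the three VOC files to new names and renames each file's own function name inside it, since MATLAB expects a function file's name to match its function. The bodies still have to be rewritten by hand for your dataset's layout and evaluation protocol.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of R-CNN): create imdb_from_<name>.m,
# roidb_from_<name>.m and imdb_eval_<name>.m from the VOC versions.
# Usage: scaffold_dataset <rcnn root> <name>
scaffold_dataset() {
  local rcnn_root="$1" name="$2" kind src dst
  # Keep the name to letters, digits and underscores so it forms a valid
  # MATLAB identifier when appended to the prefixes below.
  case "$name" in
    ''|*[!A-Za-z0-9_]*) echo "invalid dataset name: $name" >&2; return 1 ;;
  esac
  for kind in imdb_from roidb_from imdb_eval; do
    src="$rcnn_root/imdb/${kind}_voc.m"
    dst="$rcnn_root/imdb/${kind}_${name}.m"
    [ -f "$src" ] || { echo "missing: $src" >&2; return 1; }
    if [ -e "$dst" ]; then
      echo "exists, skipping: $dst"
      continue
    fi
    # Copy, replacing every occurrence of the VOC function name.
    sed "s/${kind}_voc/${kind}_${name}/g" "$src" > "$dst" || return 1
    echo "created: $dst"
  done
}
```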
As an example, let's see how you would fine-tune a CNN for detection on PASCAL VOC 2012.
Start MATLAB in the rcnn directory and create window files for VOC 2012 train and val:
>> imdb_train = imdb_from_voc('datasets/VOCdevkit2012', 'train', '2012');
>> imdb_val = imdb_from_voc('datasets/VOCdevkit2012', 'val', '2012');
>> rcnn_make_window_file(imdb_train, 'external/caffe/examples/pascal-finetuning');
>> rcnn_make_window_file(imdb_val, 'external/caffe/examples/pascal-finetuning');
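The window files are plain text read by Caffe's window data layer during fine-tuning. As we understand that layer's expected format (check window_data_layer.cpp in your Caffe checkout to be sure), each image contributes a block of lines: "# index", the image path, channels, height, width, the number of windows, then one "class_index overlap x1 y1 x2 y2" line per window. The hypothetical function below just prints one such block from values you pass in, to illustrate the layout; it is not how rcnn_make_window_file computes boxes or overlaps.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of one image's entry in a Caffe window file.
# Usage: print_window_entry <index> <image path> <channels> <height> <width> \
#          ["class overlap x1 y1 x2 y2" ...]
# Each window is passed as one quoted string.
print_window_entry() {
  local idx="$1" path="$2" channels="$3" height="$4" width="$5" win
  shift 5
  # Header lines; "$#" is the number of window strings left after the shift.
  printf '# %s\n%s\n%s\n%s\n%s\n%s\n' \
    "$idx" "$path" "$channels" "$height" "$width" "$#"
  for win in "$@"; do
    printf '%s\n' "$win"
  done
}
```

For example, print_window_entry 0 /path/to/image.jpg 3 375 500 "1 1.0 10 20 30 40" prints an 7-line block whose sixth line is 1, the window count.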
Run fine-tuning with Caffe
From the rcnn directory, copy the fine-tuning prototxt files into the Caffe example directory:
cp finetuning/voc_2012_prototxt/pascal_finetune_* external/caffe/examples/pascal-finetuning/
Then change into external/caffe/examples/pascal-finetuning and run fine-tuning (replace /path/to/rcnn with the actual path to where R-CNN is installed):
GLOG_logtostderr=1 ../../build/tools/finetune_net.bin \
  pascal_finetune_solver.prototxt \
  /path/to/rcnn/data/caffe_nets/ilsvrc_2012_train_iter_310k 2>&1 | tee log.txt
Note: In my experiments, I've let fine-tuning run for 70k iterations, although with hindsight it appears that improvement in mAP saturates at around 40k iterations.