This is an implementation of the Single Shot MultiBox Detector (SSD) paper for pedestrian detection.
This is still very much a work in progress! Presently the model is strongly overfitting.
After seeing many implementations that require you to specify very long command-line arguments, I decided that running everything from a configuration file was a better way to go.
Here all meta-data and hyper-parameters are stored in a YAML configuration file. On every run a new output directory is created containing the new model and the configuration file that was used to create it.
Thus you can automate hyper-parameter search and/or tweak values by hand at any time, and you can always go back and find the exact hyper-parameter values used for a particular model. In other words, you can focus on experimenting and forget about keeping track of long command lines.
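As a rough illustration, the per-run bookkeeping described above could look something like this (a minimal sketch assuming one YAML config file per experiment directory; the function and file names here are illustrative, not the project's actual API):

# Minimal sketch of the config-per-run idea -- names are illustrative only.
import os
import shutil
import time

import yaml


def start_run(experiment_dir, config_name="ssd_config.yaml"):
    """Load the experiment's YAML config and create a timestamped output
    directory containing a copy of it, so the exact hyper-parameters are
    stored alongside the model they produced."""
    config_path = os.path.join(experiment_dir, config_name)
    with open(config_path) as f:
        config = yaml.safe_load(f)

    run_dir = os.path.join(experiment_dir, time.strftime("%b_%d_%H%M%S"))
    os.makedirs(run_dir, exist_ok=True)
    shutil.copy(config_path, run_dir)
    return config, run_dir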
The code supports the VOC2012 dataset and the Caltech Pedestrian dataset. It has been written so you can add support for your own datasets as well and mix & match as necessary.
Just implement a class with two callbacks that return a train/test split in a particular form and the string-label to numeric-label mapping.
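A minimal sketch of what such a class could look like (class and method names here are hypothetical, not the project's actual interface):

# Hypothetical dataset adapter -- method names are illustrative only.
class MyDataset:
    def get_train_test_split(self):
        """Return (train_samples, test_samples), where each sample pairs an
        image path with its ground-truth boxes and string labels."""
        raise NotImplementedError

    def get_label_map(self):
        """Return a mapping from string labels to numeric labels,
        e.g. {"background": 0, "person": 1}."""
        raise NotImplementedError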
$ mkdir voc_vgg16_1000
Create a yaml file containing all hyper-parameters and meta-data for the experiment you want to run. Use the template yaml file given in the home directory. You can specify nets, datasets and hyper-parameters (size of default boxes, feature map sizes, etc.).
$ cp ssd_config.tmpl.yaml voc_vgg16_1000/
And then edit the file as necessary.
The core hyper-parameter of SSD is the size of the feature maps used. To figure out the feature map sizes, do the following:
python train.py <directory-containing-configuration-yaml-file>
for example:
$ python train.py /Users/vivek/work/ssd-code/tiny_voc
Make sure you update the feature map sizes in the configuration file. (TODO: automate this!) Once this is done, all other hyper-parameters derived from them are calculated automatically.
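For intuition, here is a rough sketch of how a derived quantity such as the total number of default boxes follows from the feature map sizes (using the values from the sample configuration below and assuming one default box per default_box_scales entry per cell; this mirrors the idea, not the project's exact code):

# Rough sketch: total default boxes derived from feature map sizes.
feature_maps = [(5, 4), (10, 8), (20, 15), (40, 30)]  # (width, height) per map
boxes_per_cell = 2  # assumption: one box per default_box_scales entry

total_default_boxes = sum(w * h * boxes_per_cell for w, h in feature_maps)
print(total_default_boxes)  # 2 * (5*4 + 10*8 + 20*15 + 40*30) = 3200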
You can choose from the Pascal VOC2012 dataset or the Stanford Pedestrian detection dataset. Change the yaml file accordingly and run:
$ python pre_process.py
Usage:
pre_process.py <directory-containing-configuration-yaml-file> <dataset-name> <dataset-directory> <number-of-images>
Example:
pre_process.py ./tiny_voc voc2012 voc-data/VOC2012 10
This will create a sample dataset of whatever size you choose, pre-process all the images, and put them into the project directory (e.g. voc_vgg16_1000).
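As an illustration of what this step amounts to, a pre-processing pass might sample a subset of images and resize them to the configured input size (a hedged sketch only, not the actual pre_process.py logic):

# Illustrative sketch only -- the real pre_process.py may differ.
import os
import random

from PIL import Image


def preprocess_sample(image_paths, out_dir, n_images, size=(640, 480)):
    """Pick a random subset of images, resize them to the configured
    input size and write them into the project directory."""
    os.makedirs(out_dir, exist_ok=True)
    for path in random.sample(image_paths, n_images):
        img = Image.open(path).convert("RGB").resize(size)
        img.save(os.path.join(out_dir, os.path.basename(path)))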
After configuring the correct directory name, run python train.py <dirname>
Usage:
train.py <directory-containing-configuration-yaml-file>
Example:
train.py /Users/vivek/work/ssd-code/tiny_voc
This command will use all the meta-data and hyper-parameters in the configuration file found in that directory.
During training a new directory is created inside your main directory containing a copy of the configuration file. A timestamped model checkpoint is saved every 5 epochs.
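The checkpointing cadence could be expressed roughly as follows (a sketch assuming a TensorFlow 1.x style training loop; run_training_epoch and run_dir are hypothetical placeholders, not names from this repo):

# Sketch of saving a checkpoint every 5 epochs (names are placeholders).
import os

import tensorflow as tf


def train_with_checkpoints(sess, run_dir, num_epochs, run_training_epoch):
    saver = tf.train.Saver(max_to_keep=None)
    for epoch in range(1, num_epochs + 1):
        run_training_epoch(sess)  # one pass over the training set
        if epoch % 5 == 0:        # checkpoint every 5 epochs
            saver.save(sess, os.path.join(run_dir, "model"), global_step=epoch)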
To run inference using any of the models saved do the following:
Usage:
inference.py <directory-containing-configuration-yaml-file> <model-name-relative-to-directory>
Example:
inference.py /Users/vivek/work/ssd-code/tiny_voc Jul_05_161614_O3K2T/final-model
The inference class will pull an image from your test set and show you a prediction.
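The visualization step amounts to drawing the boxes whose confidence exceeds pred_conf_threshold on the sampled image, roughly along these lines (an illustrative sketch, not the project's actual inference code):

# Illustrative sketch of showing a prediction on a test image.
import matplotlib.pyplot as plt
import matplotlib.patches as patches


def show_prediction(image, boxes, scores, threshold=0.8):
    """Draw predicted boxes (x, y, width, height in pixels) whose confidence
    is above the configured pred_conf_threshold."""
    fig, ax = plt.subplots(1)
    ax.imshow(image)
    for (x, y, w, h), score in zip(boxes, scores):
        if score >= threshold:
            ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor="red"))
    plt.show()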
I've trained the system with VGG16 using 3000 images from the Caltech Pedestrian Detection dataset. This took 2 days of running on an AWS gpu.large instance. There are still a lot of false positives being produced by the system.
After adding batch normalization and a multiplication factor on the localization loss (as it is much smaller than the confidence loss), the weights seem to be much smaller. Waiting for a pause in my job search to rebuild the model.
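For reference, the weighting described above has the shape of the standard SSD multibox loss (a sketch; the multiplier alpha is a tunable hyper-parameter, not a value taken from this repo):

# Sketch of the weighted loss: (conf_loss + alpha * loc_loss) / N, where N is
# the number of matched default boxes and alpha up-weights the localization term.
import tensorflow as tf


def ssd_loss(conf_loss, loc_loss, num_matched_boxes, alpha=1.0):
    n = tf.maximum(tf.cast(num_matched_boxes, tf.float32), 1.0)
    return (conf_loss + alpha * loc_loss) / n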
$ bash shells/download.sh
$ python scripts/convert_annotations.py
$ python scripts/convert_seqs.py
Each .seq movie is separated into .png images. Each image's filename consists of {set**}_{V***}_{frame_num}.png. According to the official site, set06~set10 are for the test dataset, while the rest are for the training dataset.
(Number of objects: 346621)
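Following that naming convention, the train/test split can be recovered from the filename alone, roughly like this (an illustrative sketch, not part of the conversion scripts):

# Sketch: set06-set10 go to the test split, the rest to training.
import os
import re


def is_test_image(filename):
    match = re.match(r"set(\d+)_V\d+_\d+\.png", os.path.basename(filename))
    return match is not None and int(match.group(1)) >= 6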
$ python tests/test_plot_annotations.py
The following gives you an idea of how to configure and run the system. The nice thing about doing it this way is that you don't have to keep passing parameters through command-line arguments, and you have a record of your hyper-parameters together with your data and model, all in one place.
# Dataset vars
dataset_name: "stanford"
image_width: 640
image_height: 480
num_classes: 2
n_channels: 3
images_path: "/home/ubuntu/tensorflow_ssd/data/images/"
# SSD config vars
net: "vgg16"
default_box_scales:
  - [0.0, 0.0, 0.9, 1.5]
  - [0.2, -0.2, 0.9, 0.8]
feature_maps:
  - [5, 4]
  - [10, 8]
  - [20, 15]
  - [40, 30]
neg_pos_ratio: 4
pred_conf_threshold: 0.8
num_epochs: 110
batch_size: 16
adam_learning_rate: 0.001