From the notebook I had on this:

Object detection

Algorithms

YOLO
- You only look once
- One of the fastest, the other option is SSD (single shot multibox detector)
  
  TODO: this has been put in its own page
R-CNN (R is for "region")
Fast R-CNN

Jargon and basic concepts

bb: bounding box
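anchors: reference boxes that predicted boxes are expressed as offsets to (TODO: expand)
IoU: intersection over union, the overlap between two boxes (area of their intersection divided by area of their union)
feature map: the output of a convolutional layer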
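IoU compares a predicted box with a ground truth box: the area of their intersection divided by the area of their union, so 1 means a perfect match and 0 means no overlap. A minimal Python sketch, assuming boxes are given as corner coordinates (x_min, y_min, x_max, y_max); libraries use different box conventions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```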

This is all from reference 1.

Object detection: a usual CNN ending with a dense layer won't work, because the length of the output is not known: the number of objects to detect is not fixed. Naive approach with CNNs: split the image into regions and use a CNN to classify each of them; but objects may overlap, have different aspect ratios and appear at different locations. This means a huge number of regions to check, which is computationally expensive.
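To get a feel for how quickly the naive approach blows up, here is a sketch that counts the crops a sliding-window detector would have to classify. The window scales, aspect ratios and stride are hypothetical parameters chosen for illustration, not taken from any specific paper.

```python
import math


def count_windows(img_w, img_h, scales, aspect_ratios, stride):
    """Number of crops a naive sliding-window detector would classify.

    Hypothetical setup: for each scale s and aspect ratio ar (width / height),
    a window of roughly s*s pixels is slid over the image with a fixed stride.
    """
    total = 0
    for s in scales:
        for ar in aspect_ratios:
            w = int(round(s * math.sqrt(ar)))
            h = int(round(s / math.sqrt(ar)))
            if w > img_w or h > img_h:
                continue  # window does not fit in the image
            positions_x = (img_w - w) // stride + 1
            positions_y = (img_h - h) // stride + 1
            total += positions_x * positions_y
    return total


print(count_windows(640, 480, scales=[32, 64, 128, 256],
                    aspect_ratios=[0.5, 1.0, 2.0], stride=8))
```

Every extra scale or aspect ratio multiplies the work, and each crop would need its own CNN forward pass; for the call above the count is already in the tens of thousands.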

So instead of plain classification, different methods are used, like YOLO, SSD, R-CNN.

the pre-history: OverFeat

though I think it's roughly contemporary to R-CNN

R-CNN

Use selective search to extract 2000 regions (region proposals): generate candidate regions initially, then greedily combine similar ones into larger ones
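A toy sketch of the greedy grouping step, to make "greedily combine" concrete. It assumes the initial regions are given (in the selective search paper they come from a graph-based segmentation), represents each region only by its bounding box and pixel count, uses only the size and fill similarity measures, and treats touching boxes as neighbours. These are simplifications: the paper also uses colour and texture similarity and real pixel adjacency.

```python
from itertools import combinations


def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def union_box(a, b):
    # Smallest box (x_min, y_min, x_max, y_max) containing both boxes
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def touching(a, b):
    # Simplification: regions count as neighbours if their boxes touch or overlap
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def similarity(ra, rb, img_size):
    # Size similarity favours merging small regions first;
    # fill similarity favours pairs whose union box has little empty space
    s_size = 1 - (ra["size"] + rb["size"]) / img_size
    gap = box_area(union_box(ra["box"], rb["box"])) - ra["size"] - rb["size"]
    s_fill = 1 - gap / img_size
    return s_size + s_fill


def greedy_grouping(regions, img_size):
    """Merge the most similar pair of neighbouring regions until one is left;
    every region seen along the way is returned as a proposal box."""
    regions = list(regions)
    proposals = [r["box"] for r in regions]
    while len(regions) > 1:
        pairs = [(i, j) for i, j in combinations(range(len(regions)), 2)
                 if touching(regions[i]["box"], regions[j]["box"])]
        if not pairs:
            break
        i, j = max(pairs, key=lambda p: similarity(regions[p[0]], regions[p[1]], img_size))
        merged = {"box": union_box(regions[i]["box"], regions[j]["box"]),
                  "size": regions[i]["size"] + regions[j]["size"]}
        regions.pop(j)  # j > i, so popping j first keeps index i valid
        regions.pop(i)
        regions.append(merged)
        proposals.append(merged["box"])
    return proposals


# Hypothetical initial regions of a 10x10 image ("size" = pixel count)
initial = [
    {"box": (0, 0, 5, 5), "size": 25},
    {"box": (5, 0, 10, 5), "size": 25},
    {"box": (0, 5, 10, 10), "size": 50},
]
print(greedy_grouping(initial, img_size=100))
# [(0, 0, 5, 5), (5, 0, 10, 5), (0, 5, 10, 10), (0, 0, 10, 5), (0, 0, 10, 10)]
```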

TODO have to read original paper
These 2000 regions are warped to a fixed square size (227×227 pixels in the paper) and fed into a CNN (used as a feature extractor) to get a 4096-dimensional feature vector
This vector is passed to an SVM classifier for the presence of the object in that candidate region; the features are also used to predict 4 offset values for the bb (helpful if an object has been detected but is only half inside the bb)
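The 4 offset values follow the parametrisation defined in the R-CNN paper: the centre shift normalised by the proposal's width and height, plus the log of the width and height ratios. A minimal sketch of encoding (training targets) and decoding (applying predictions), assuming boxes as (center_x, center_y, width, height); the example boxes are made up.

```python
import math


def encode(proposal, gt):
    """Regression targets (t_x, t_y, t_w, t_h) that turn `proposal` into `gt`.
    Boxes are (center_x, center_y, width, height)."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode(proposal, offsets):
    """Apply predicted offsets to a proposal; this inverts `encode`."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = offsets
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))


proposal = (50.0, 50.0, 40.0, 40.0)  # hypothetical proposal covering half the object
gt = (60.0, 50.0, 80.0, 40.0)        # hypothetical ground truth box
t = encode(proposal, gt)
print(t)                    # about (0.25, 0.0, 0.693, 0.0)
print(decode(proposal, t))  # recovers gt, up to float rounding
```

The log-space width and height keep the predicted sizes positive and make the targets independent of the absolute box size.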

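Drawbacks

slow: the CNN has to be run separately on each of the 2000 region proposals of every image
selective search is a fixed algorithm used as is (nothing is learned), so there is no control on the quality of the region proposals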

Fast R-CNN

Same author as R-CNN (Ross Girshick), meant to improve on speed
Instead of feeding each region to a CNN, the whole input image is fed to the CNN to generate a convolutional feature map
Region proposals still come from selective search; they are located on the conv feature map instead of being cropped from the image
An RoI (region of interest) pooling layer reshapes each region into a fixed-size square grid, so it can be fed into dense layers
A softmax layer predicts the class, and a parallel regression layer predicts the bb offset values
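RoI pooling turns regions of different sizes into the same fixed-size grid by splitting each region into a fixed number of bins and max-pooling each bin. A NumPy sketch for a single region, assuming a (channels, height, width) feature map and a region already expressed in feature-map cells; the integer rounding of the bin edges is a simple choice here and may differ from actual implementations.

```python
import numpy as np


def roi_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of a (C, H, W) feature map into a (C, out_h, out_w) grid.

    `roi` is (x0, y0, x1, y1) in feature-map cells, end-exclusive (assumption:
    the projection from image pixels to feature-map cells is done beforehand).
    """
    x0, y0, x1, y1 = roi
    region = feature_map[:, y0:y1, x0:x1]
    c, h, w = region.shape
    out_h, out_w = output_size
    # Bin edges: split the region into out_h x out_w roughly equal sub-windows
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    out = np.empty((c, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Every bin covers at least one cell, even for very small regions
            y_start, y_end = ys[i], max(ys[i + 1], ys[i] + 1)
            x_start, x_end = xs[j], max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = region[:, y_start:y_end, x_start:x_end].max(axis=(1, 2))
    return out


fmap = np.arange(16, dtype=float).reshape(1, 4, 4)  # one channel, values 0..15
print(roi_pool(fmap, (0, 0, 4, 4)))  # [[[ 5.  7.] [13. 15.]]]
```

Whatever the size of the region, the output always has the same shape, which is what lets the dense layers that follow have a fixed input size.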

What it solves, and remaining drawbacks

No need to feed 2000 regions to the CNN: the convolution is done only once per image
Still uses selective search, which is slow and fixed (not learned): this makes it impractical for real-world applications

Faster R-CNN

By a partly different group of authors (Ren, He, Girshick and Sun)
Same as Fast R-CNN, passes the full image to the CNN to get the conv feature map
Does not use selective search to get the regions: it uses a region proposal network (RPN) to predict them
The predicted regions are reshaped via an RoI pooling layer
The pooled features are then used to classify the content of each proposed region and to predict the offset values for the bbs
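The RPN predicts objectness scores and box offsets relative to anchors, reference boxes placed at every position of the conv feature map. In the Faster R-CNN paper there are 9 anchors per position: 3 scales (box areas of 128², 256² and 512² pixels) times 3 aspect ratios (1:1, 1:2, 2:1). A NumPy sketch of the anchor grid, assuming a feature-map stride of 16 pixels (as for VGG-16), anchors centred on each cell, and ratio defined as height / width; the exact centring convention varies between implementations.

```python
import numpy as np


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x_min, y_min, x_max, y_max) in image pixels:
    len(scales) * len(ratios) anchors for every feature-map cell."""
    base = []
    for s in scales:
        for r in ratios:
            # r = height / width; the area stays s * s whatever the ratio
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            base.append((-w / 2, -h / 2, w / 2, h / 2))
    base = np.array(base)  # (k, 4), centred at the origin
    # Centre of every feature-map cell, mapped back to image coordinates
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)  # (H*W, 1, 4)
    return (shifts + base[None, :, :]).reshape(-1, 4)              # (H*W*k, 4)


anchors = generate_anchors(2, 3)
print(anchors.shape)  # (54, 4): 2 * 3 cells, 9 anchors each
```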

Solves

Not using selective search makes it fast enough to be usable for real applications

This is from reference 9.

SSD

Single-shot Multibox detector.

Nov 2016
record speed when it came out
Single-shot: object localisation and classification are done in one pass of the network
Multibox (see below) is the name of a bounding box regression method, by some of the same authors
The architecture is built on VGG-16, removing its dense layers and replacing them with conv layers (to extract features at multiple scales)
The VGG-16 phase is the most time-consuming one
In SSD, every feature map cell is associated with a set of default bounding boxes of different dimensions and aspect ratios, unlike Multibox > TODO ???
Priors (default boxes) are chosen manually, with no pre-training phase for them
Uses Smooth L1 loss as localisation loss
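The SSD paper spaces the scales of the default boxes linearly across the m feature maps used for prediction, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1) with s_min = 0.2 and s_max = 0.9, and gives each box width s_k·sqrt(a_r) and height s_k/sqrt(a_r) for aspect ratios a_r in {1, 2, 3, 1/2, 1/3}, plus an extra box of scale sqrt(s_k·s_(k+1)) for ratio 1. A sketch of that recipe, with sizes relative to the image; how s_(m+1) is chosen for the last map is an assumption here (the same linear formula, extrapolated).

```python
import math


def ssd_scales(m, s_min=0.2, s_max=0.9):
    """Scale s_k of the default boxes for each of the m feature maps (k = 1..m)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_boxes(fmap_size, k, m, aspect_ratios=(1.0, 2.0, 3.0, 1 / 2, 1 / 3),
                  s_min=0.2, s_max=0.9):
    """Default boxes (cx, cy, w, h), relative to image size, for the k-th
    (1-based) of m square feature maps with side fmap_size."""
    step = (s_max - s_min) / (m - 1)
    s_k = s_min + step * (k - 1)
    # Assumption: s_(k+1) extrapolated linearly, also for the last map
    s_next = s_min + step * k
    shapes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios]
    shapes.append((math.sqrt(s_k * s_next),) * 2)  # extra box for ratio 1
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Box centres sit in the middle of each feature-map cell
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            boxes.extend((cx, cy, w, h) for w, h in shapes)
    return boxes


print(ssd_scales(6))               # about [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
print(len(default_boxes(3, 1, 6)))  # 54: 3 * 3 cells, 6 boxes each
```

Lower (larger) feature maps get small boxes and higher (coarser) ones get large boxes, which is how SSD handles objects of different sizes in a single pass.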

Multibox

A fast method for proposing candidate bbs
Uses an Inception-style CNN
Uses categorical cross-entropy as confidence loss, for how confident the model is that a detected box contains an object
Plus an L2 loss as localisation loss, for how close the detected boxes are to the ground truth ones
The two losses are combined as a weighted sum, confidence loss + α × localisation loss
It starts from priors used as anchors, and uses the IoU metric to select the predicted boxes that overlap enough with the ground truth
note that Multibox does not do object classification
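A sketch of how the pieces above could fit together: priors are matched to ground truth boxes via IoU, the confidence loss is a cross-entropy on "object / not object" (with only two classes, categorical cross-entropy reduces to this form), and the L2 localisation loss counts only for matched priors. The 0.5 IoU threshold, the corner-difference targets and the function name `multibox_style_loss` are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np


def pairwise_iou(a, b):
    """IoU between every box in a (N, 4) and every box in b (M, 4), corner format."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def multibox_style_loss(priors, pred_offsets, pred_objectness, gt_boxes,
                        iou_threshold=0.5, alpha=1.0):
    """Hypothetical MultiBox-style loss: confidence loss + alpha * localisation loss.

    priors: (N, 4) and gt_boxes: (M, 4) corner boxes; pred_offsets: (N, 4)
    corrections to the priors; pred_objectness: (N,) probabilities of "object".
    """
    ious = pairwise_iou(priors, gt_boxes)          # (N, M)
    best_gt = ious.argmax(axis=1)
    matched = ious.max(axis=1) >= iou_threshold     # priors overlapping a ground truth enough
    # Confidence loss: cross-entropy between predicted objectness and the 0/1 match label
    p = np.clip(pred_objectness, 1e-7, 1 - 1e-7)
    conf_loss = -np.sum(np.where(matched, np.log(p), np.log(1 - p)))
    # Localisation loss: squared L2 distance, only for matched priors
    targets = gt_boxes[best_gt] - priors            # assumption: plain corner differences
    loc_loss = np.sum((pred_offsets - targets)[matched] ** 2)
    return conf_loss + alpha * loc_loss


priors = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
gt = np.array([[0, 0, 10, 10]], dtype=float)
print(multibox_style_loss(priors, np.zeros((2, 4)), np.array([0.9, 0.1]), gt))
```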

TODO: it isn't clear whether SSD and Multibox are two different algorithms. And if they are, why does SSD have Multibox in its name?

Hard negative mining

Most detections at training time won't be good (low IoU with the ground truth) and are treated as negative training samples. They're needed to teach the model what a bad detection is, but there are a lot of them, so only the hardest ones (highest confidence loss) are kept, with the ratio of negatives to positives capped at 3:1.
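A NumPy sketch of the selection step, assuming the per-box confidence losses and the positive/negative labels have already been computed; the name `hard_negative_mask` is hypothetical.

```python
import numpy as np


def hard_negative_mask(conf_loss, is_positive, neg_pos_ratio=3):
    """Mark which boxes contribute to the confidence loss.

    Keeps all positives, plus the negatives with the highest confidence loss
    (the "hardest" ones), at most neg_pos_ratio negatives per positive.
    """
    conf_loss = np.asarray(conf_loss, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    num_neg = min(neg_pos_ratio * int(is_positive.sum()), int((~is_positive).sum()))
    # Rank negatives by loss; positives get -inf so they end up last
    neg_loss = np.where(is_positive, -np.inf, conf_loss)
    hardest = np.argsort(-neg_loss)[:num_neg]
    mask = is_positive.copy()
    mask[hardest] = True
    return mask


losses = [0.1, 0.9, 0.5, 0.3, 0.8, 0.2, 0.7, 0.05]
positives = [True, False, False, False, False, False, False, False]
print(hard_negative_mask(losses, positives))
# [ True  True False False  True False  True False]
```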

Non-maximum suppression

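technique to prune the many boxes generated at inference time, to reduce noise
boxes with confidence score below a threshold x are discarded; among the remaining ones, a box whose IoU with a higher-scoring box is above a threshold y is pruned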
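A minimal greedy NMS sketch in Python, class-agnostic for simplicity (detectors usually run it per class). Boxes are assumed to be (x_min, y_min, x_max, y_max); the default thresholds 0.01 and 0.45 are the values the SSD paper reports, used here only as illustrative defaults.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.01, iou_threshold=0.45):
    """Greedy non-maximum suppression; returns the indices of the boxes kept."""
    # Drop low-confidence boxes, then visit the rest from highest score down
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it doesn't overlap too much with any box already kept
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.005]
print(nms(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because it overlaps box 0 with IoU 0.81, and box 3 is dropped by the confidence threshold before any overlap check.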

TODOs

YOLO

TODO see what to put here and how to separate in the other notebook

R-FCN

YOLO not implemented in Tensorflow

References

1. Towards Data Science post on R-CNN, Fast R-CNN, Faster R-CNN and YOLO: https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e
2. R-CNN original paper: https://arxiv.org/pdf/1311.2524.pdf
3. Selective search paper, Uijlings et al. (IJCV 2013): https://ivi.fnwi.uva.nl/isis/publications/2013/UijlingsIJCV2013/UijlingsIJCV2013.pdf
4. Faster R-CNN original paper: https://arxiv.org/pdf/1506.01497.pdf
5. Karpathy on trying R-CNN back in the day :D https://cs.stanford.edu/people/karpathy/rcnn/
6. OverFeat original paper: https://arxiv.org/pdf/1312.6229v4.pdf
7. R-FCN original paper: https://arxiv.org/abs/1605.06409
8. SSD original paper: https://arxiv.org/abs/1512.02325
9. Towards Data Science post on SSD and Multibox: https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab
10. Multibox original paper: https://arxiv.org/abs/1412.1441
References for YOLO

The official website https://pjreddie.com/darknet/yolo/ which refers to the C open source package (darknet)
The BBOX tool for training on your own stuff https://github.com/martinapugliese/BBox-Label-Tool
Blogs (by creator of the above?) about training your own https://medium.com/@manivannan_data/how-to-train-yolov2-to-detect-custom-objects-9010df784f36 and https://medium.com/@manivannan_data/how-to-train-multiple-objects-in-yolov2-using-your-own-dataset-2b4fee898f17
A tutorial about using it from opencv https://www.arunponnusamy.com/yolo-object-detection-opencv-python.html
The SSD paper https://arxiv.org/abs/1512.02325
This other package that ports darknet to TensorFlow (darkflow)
This guy uses a camera on a Raspberry Pi to detect birds https://www.makeartwithpython.com/blog/poor-mans-deep-learning-camera/

martinapugliese / tales-science-data

Additions to networks for object detection #212