hollance / YOLO-CoreML-MPSNNGraph

Tiny YOLO for iOS implemented using CoreML but also using the new MPS graph API.
MIT License

Question: Anchors array #27

Closed izdi closed 6 years ago

izdi commented 6 years ago

Hey, first of all, thanks for the amazing explanation and source code — really helpful for understanding the workflow of such applications!

I could not find any explanation in your related blog post or in code to this variable: let anchors: [Float] = [1.08, 1.19, 3.42, 4.41, 6.63, 11.38, 9.42, 5.11, 16.62, 10.52]

Where do these numbers come from? Thanks in advance!

hollance commented 6 years ago

These are the widths and heights of 5 "anchor" boxes (also known as prior boxes or default boxes), stored as pairs: (1.08, 1.19), (3.42, 4.41), and so on. Recall that for every cell in the 13 x 13 grid we predict 5 bounding boxes. What we actually predict are widths and heights relative to these anchor boxes.

YOLO uses these anchor boxes as a hint to the neural network that most of the predictions will have one of these shapes. These particular anchors were chosen by the authors of YOLO by running clustering on the Pascal VOC dataset to find the most common object shapes. Using anchor boxes like this is common in many object detection models.
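To make "relative to these anchor boxes" concrete, here is a minimal sketch of how a YOLO-style decoder could turn the raw width/height outputs into pixel sizes. The function name and the exact decode step are illustrative assumptions, not copied from this repo; the `blockSize = 32` comes from the 416 / 13 grid geometry.

```swift
import Foundation

// Anchor widths/heights in grid-cell units, stored as (w, h) pairs.
let anchors: [Float] = [1.08, 1.19, 3.42, 4.41, 6.63, 11.38, 9.42, 5.11, 16.62, 10.52]

// Hypothetical decode step: given the raw (tw, th) the network predicts for
// box b in a grid cell, the anchor scales exp(tw) and exp(th) into a width
// and height in pixels (blockSize = 416 / 13 = 32 pixels per grid cell).
func decodedSize(tw: Float, th: Float, box b: Int, blockSize: Float = 32) -> (width: Float, height: Float) {
    let width  = exp(tw) * anchors[2*b]     * blockSize
    let height = exp(th) * anchors[2*b + 1] * blockSize
    return (width, height)
}

// With tw = th = 0, exp(0) = 1, so the prediction falls back to the
// anchor shape itself: 1.08 * 32 by 1.19 * 32 pixels for the first anchor.
let size = decodedSize(tw: 0, th: 0, box: 0)
```

So the network only has to predict a small correction to an anchor that already roughly matches the object's shape, which is easier to learn than predicting absolute sizes from scratch.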

izdi commented 6 years ago

Is this a constant value or is it somehow calculated? If the latter, are you familiar with approaches for doing so?

hollance commented 6 years ago

It's constant for the training dataset used, in this case Pascal VOC. These anchors were calculated just once, before YOLO was trained. (The YOLO9000 paper explains how they did this in more detail.)

If you're training YOLO on a very different dataset with different shaped objects, you may need to calculate different anchors.
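For a rough idea of how such anchors can be computed, here is a sketch of the YOLO9000-style approach: k-means over the (width, height) of the training set's ground-truth boxes, using 1 − IoU as the distance, with IoU computed as if both boxes were centered at the same point. The struct, function names, and naive initialization are my own assumptions; a real run would use every ground-truth box in the dataset, in grid-cell units.

```swift
import Foundation

// A ground-truth box shape; position is ignored for anchor clustering.
struct Box { var w: Float; var h: Float }

// IoU of two boxes assumed to share the same center.
func iou(_ a: Box, _ b: Box) -> Float {
    let intersection = min(a.w, b.w) * min(a.h, b.h)
    let union = a.w * a.h + b.w * b.h - intersection
    return intersection / union
}

// K-means with distance d = 1 - IoU: assign each box to the most similar
// centroid, then move each centroid to the mean shape of its cluster.
func kMeansAnchors(boxes: [Box], k: Int, iterations: Int = 50) -> [Box] {
    var centroids = Array(boxes.prefix(k))   // naive init: first k boxes
    for _ in 0..<iterations {
        var clusters = [[Box]](repeating: [], count: k)
        for box in boxes {
            let best = (0..<k).max { iou(box, centroids[$0]) < iou(box, centroids[$1]) }!
            clusters[best].append(box)
        }
        for i in 0..<k where !clusters[i].isEmpty {
            let n = Float(clusters[i].count)
            centroids[i] = Box(w: clusters[i].reduce(0) { $0 + $1.w } / n,
                               h: clusters[i].reduce(0) { $0 + $1.h } / n)
        }
    }
    return centroids
}
```

The resulting k centroid shapes are the anchors; using IoU instead of Euclidean distance keeps large boxes from dominating the clustering.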

Turi Create, for example, also uses YOLO for object detection but with more (and different) anchors, presumably to handle a larger variety of object shapes.

izdi commented 6 years ago

Thanks, this is very useful information!