qqwweee / keras-yolo3

A Keras implementation of YOLOv3 (Tensorflow backend)
MIT License

How to change number of anchors? Is that possible? #428

Open canyilmaz90 opened 5 years ago

canyilmaz90 commented 5 years ago

Hello everyone, I want to change the number of anchors for the model, e.g. 10 anchor boxes for tiny-yolo and 15 anchor boxes for yolo. Is that possible, and how can I do it? I've tried to modify some lines of the code, but I always get a shape error.

abhay8051 commented 5 years ago

Not possible, as the number of anchor boxes is fixed by the architecture of the CNN: 9 for YOLOv3. In YOLOv3, detection happens at 3 stages of increasing scale / size, and at each scale 3 boxes are predicted, hence 3x3 = 9 boxes. That's why you get a size mismatch error when you change the number of anchors.

Just curious, why do you want to change the number of anchor boxes?

lcltopismine commented 5 years ago

What if I want to design a network for more than 3 resolutions? As far as I understand, YOLOv3 uses 3 scales; can we use more?

canyilmaz90 commented 5 years ago

Hi @abhay8051, thanks for your reply. I know that YOLOv3 has 3 stages, but can't we redesign it with 5 boxes for each stage (5x3) instead of 3 (3x3)?

To answer your question: I'm trying to use tiny-yolo for the sake of its speed, and it has 6 anchor boxes, unlike YOLOv3. That did not look like enough to cover all the object scales in my dataset, so I wanted to add some more anchor boxes.

lcltopismine commented 5 years ago

This is possible. For scales, you need to add the extra resolutions in yolo_body and merge them: yolo_body currently has 3 outputs, so add one more output for each additional scale.

For anchors, increase the number of anchors accordingly. However, you may need to change a lot of code along the way (at least the number of clusters in kmeans.py; see the YOLOv2 paper for this part).
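To make the clustering step concrete, here is a minimal sketch of anchor clustering in the style described in the YOLOv2 paper, using 1 - IoU between (width, height) pairs as the distance. It is not the repository's kmeans.py: the function names, the deterministic initialisation, the median update, and the sample boxes are all assumptions made for illustration.

```python
from statistics import median

def iou_wh(box, centroid):
    """IoU of two (width, height) boxes that share the same top-left corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, max_iter=100):
    """Cluster (w, h) pairs into k anchors; highest IoU means closest centroid."""
    # Assumption for this sketch: deterministic start, taking k boxes
    # evenly spaced through the area-sorted list instead of random picks.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centroids = [ordered[int(i * step + step / 2)] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            max(range(k), key=lambda c: iou_wh(b, centroids[c])) for b in boxes
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so the clusters are stable
        assignment = new_assignment
        for c in range(k):
            members = [b for b, a in zip(boxes, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (median(m[0] for m in members),
                                median(m[1] for m in members))
    # Return anchors from smallest to largest area.
    return sorted(centroids, key=lambda a: a[0] * a[1])

# Hypothetical (width, height) pairs in pixels, standing in for dataset labels.
boxes = [(10, 12), (12, 10), (11, 11), (50, 60), (55, 58), (60, 50),
         (200, 180), (190, 210), (210, 200)]
anchors = kmeans_anchors(boxes, k=3)
```

Changing k here only changes how many anchors come out; the network heads and the anchor masks still have to be adjusted to match.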

abhay8051 commented 5 years ago

@canyilmaz90 @lcltopismine

Yes, it's possible, but not with this implementation as it stands. You will have to redesign the architecture of the CNN and then the YOLO filtering code. Once you do that, the pretrained weights here will be of no help, and you will also have to rewrite or modify other parts of the code.

When you convert the model from Darknet to Keras, you will see a summary of the CNN layers. That's where you will have to change the filters / kernels to change the output of the respective layer.
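As a rough illustration of why the pretrained weights stop fitting, the channel count of the last convolution at each detection scale depends directly on the number of anchors per scale. The helper name below is hypothetical; the formula is the usual anchors x (classes + 5) layout.

```python
def head_filters(anchors_per_scale, num_classes):
    """Output channels of the final 1x1 convolution at one detection scale."""
    # Each anchor predicts 4 box values, 1 objectness score,
    # and one score per class.
    return anchors_per_scale * (num_classes + 5)

coco_default = head_filters(3, 80)  # 3 anchors per scale, 80 COCO classes
coco_five = head_filters(5, 80)     # 5 anchors per scale, same classes
```

With 3 anchors the head has 255 channels; with 5 it needs 425, so the final layers have a different shape and would have to be retrained.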

Think of this as a mass-produced two-wheeled bicycle: if you want to add two more wheels, you will have to build a new custom frame or modify an existing one. And you cannot just add the wheels; you will also have to add a chain and pedals and figure out other things. qqwweee is kind enough to let us use this for free.

Regarding the need to identify smaller objects or objects at different scales: increasing the number of anchor boxes will not help, and here is why. In YOLOv3 the scales are 13x13 (1st), 26x26 (2nd), and 52x52 (3rd), so the algorithm looks at 169 cells at the 1st scale and at 676 and 2704 cells at the 2nd and 3rd respectively, which improves the resolution at each scale.

Anchor boxes are just starting positions, or template bounding boxes, to begin with. Their purpose is to facilitate detection of objects with different aspect ratios and sizes (e.g. a car and a standing person: the car is horizontal and the person is vertical).

Each cell at each of the 3 scales predicts objects for n anchor boxes (3 in this case) over x classes. Each prediction consists of 4 bounding box parameters, an object confidence, and the class scores: (x, y, w, h, object confidence, classes). So for the COCO dataset the output at the first scale is 13x13x(3x(5+80)), where 3 = bounding boxes per cell, 5 = (x, y, w, h, object confidence), and 80 = classes.
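The cell counts above can be checked with a few lines of arithmetic. This sketch assumes a 416x416 input and strides of 32, 16, and 8, which is what produces the 13, 26, and 52 grids; the function name is made up for the example.

```python
def grid_summary(input_size=416, strides=(32, 16, 8), anchors_per_scale=3):
    """Grid size, cell count, and predicted box count for each detection scale."""
    rows = []
    for stride in strides:
        grid = input_size // stride  # cells along one side at this scale
        cells = grid * grid
        rows.append({"grid": grid, "cells": cells,
                     "boxes": cells * anchors_per_scale})
    return rows

summary = grid_summary()
total_boxes = sum(row["boxes"] for row in summary)

# Same grids with 5 anchors per cell: more boxes, but the same spatial resolution.
total_boxes_five = sum(row["boxes"] for row in grid_summary(anchors_per_scale=5))
```

Adding anchors multiplies the number of boxes per cell, while the number of cells, and therefore the spatial resolution, stays the same.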

canyilmaz90 commented 5 years ago

@lcltopismine and @abhay8051, thank you both for your detailed explanations. They improved my understanding of the YOLO architecture.

canyilmaz90 commented 5 years ago

Btw, I have just noticed this line in the yolo_loss function in model.py:

anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]

Is the final [1,2,3] correct, or should it be [0,1,2] ?

lcltopismine commented 5 years ago

@canyilmaz90 I agree with you; I would go with [0,1,2].
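For anyone comparing the two masks, a small check like the one below shows how each one covers the six tiny-yolo anchors. The helper is hypothetical and independent of model.py; it only counts which anchor indices a mask uses.

```python
from collections import Counter

def check_anchor_mask(mask, num_anchors):
    """Return (unused anchor indices, indices assigned to more than one scale)."""
    used = Counter(index for group in mask for index in group)
    unused = [i for i in range(num_anchors) if i not in used]
    duplicated = sorted(i for i, count in used.items() if count > 1)
    return unused, duplicated

as_written = check_anchor_mask([[3, 4, 5], [1, 2, 3]], num_anchors=6)
proposed = check_anchor_mask([[3, 4, 5], [0, 1, 2]], num_anchors=6)
```

With [1,2,3], anchor 0 is never used and anchor 3 is assigned to both scales; with [0,1,2], every anchor is used exactly once.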