ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

How to add another detection head? #1418

Closed Edwardmark closed 3 years ago

Edwardmark commented 3 years ago

❔Question

In yolov5s.yaml there are only three detection layers [P3/8, P4/16, P5/32]. How can I add another layer with stride 64 to detect really big objects?

Additional context

Could you please kindly give me some guidance? Thanks. @glenn-jocher

glenn-jocher commented 3 years ago

@Edwardmark design modifications are up to you. Start from the existing yamls and modify as you see fit. https://github.com/ultralytics/yolov5/tree/master/models

JoJoliking commented 3 years ago

@Edwardmark Have you solved the problem? I also want to add and modify the detection head, but I can't find the location of the detection head's code.

Edwardmark commented 3 years ago

@glenn-jocher Could you please explain the parameters in yolov5l.yaml a little? Say we want to add a head that aims to detect large objects (e.g. 640x640 objects): what should be added to the anchors, backbone, and head sections in yolov5l.yaml? Thanks. The model definition is hard for me to understand, please help me out, thanks in advance.

glenn-jocher commented 3 years ago

@Edwardmark @JoJoliking sure no problem. The current models output P3-P5 layers supporting strides 8-32. You want to export a P6 layer with stride 64.

You can export from any layer of the model you want simply by adding it to the input list of Detect(). This is one of the major advancements we made in YOLOv5 above and beyond the previous cfg architectures: https://github.com/ultralytics/yolov5/blob/199c9c787427ece5723d5309e1c7c524a99bc59d/models/yolov5s.yaml#L47

So all you need to do is build the additional structure you want and then add the output layer you want to this list. You could then add another set of P6/64 anchors manually to the model, or you could simply delete the manual anchors and put a number instead, like anchors: 3, to tell the model to compute 3 of its own anchors at each output. https://github.com/ultralytics/yolov5/blob/199c9c787427ece5723d5309e1c7c524a99bc59d/models/yolov5s.yaml#L7-L11
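For instance, the anchors section might look like either of the following (the P3-P5 rows are the yolov5s.yaml defaults; the P6 row is purely illustrative, not from the repo):

```yaml
# Option A: manual anchors, one row per output layer, same count per row
anchors:
  - [10,13, 16,30, 33,23]        # P3/8
  - [30,61, 62,45, 59,119]       # P4/16
  - [116,90, 156,198, 373,326]   # P5/32
  - [372,234, 386,482, 781,632]  # P6/64 (illustrative values only)

# Option B: delete the manual list and let AutoAnchor compute 3 anchors per output
# anchors: 3
```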

To build the additional structure, you can simply repeat the steps from P4 to P5: https://github.com/ultralytics/yolov5/blob/199c9c787427ece5723d5309e1c7c524a99bc59d/models/yolov5s.yaml#L38-L47

In terms of P6, there are no 64-stride layers earlier in the model to concat with, so you could simply do something like this for the easiest P6/64 output. If you wanted to get fancier, you could have the backbone travel down to P6/64 and then concat that layer in the head (the same way P5 is handled).

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

   [-1, 1, Conv, [1024, 3, 2]],
   [-1, 3, BottleneckCSP, [2048, False]],  # 25 (P6/64-xlarge)

   [[17, 20, 23, 25], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6)

glenn-jocher commented 3 years ago

By the way, you should be aware that P6 outputs will mainly benefit larger image sizes. So you are travelling down a road of larger models applied to larger images (i.e. longer training, more CUDA memory usage, etc.).

If you wanted to go the other way and create models that work better on smaller images you might output a P2/4 (stride 4) layer instead. P2 output layers incur minimal size increases, but many more FLOPS as the convolutions are applied over larger denser grids, slowing inference significantly.
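As a rough sketch of such a P2/4 path (layer indices are illustrative and would shift once extra layers are inserted), it continues the same upsample-and-concat pattern, here concatenating backbone layer 2, the P2/4 feature in yolov5s.yaml:

```yaml
   [-1, 1, Conv, [128, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 2], 1, Concat, [1]],  # cat backbone P2
   [-1, 3, BottleneckCSP, [128, False]],  # 21 (P2/4-xsmall)

   [[21, 17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P2, P3, P4, P5)
```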

JoJoliking commented 3 years ago

@glenn-jocher Excellent answer. Regarding the different yamls: 5s, 5l, and 5x all use the YOLOv5 head, while 5-fpn and 5-panet correspond to FPN and PANet heads. These three head structures should be different, yet there is no definition of FPN or PANet in common.py, and it seems FPN and PANet are not generated in the YOLOv5 head either (I don't know if I missed it), right?

glenn-jocher commented 3 years ago

@JoJoliking the four YOLOv5 models s/m/l/x are all built from yolov5-panet.yaml with different compound scaling constants. I experimented to find the best constant ratios, starting from the EfficientDet scaling equations, and these are now used for the four sizes.

FPN heads (like in YOLOv3) perform worse and are no longer used, though yolov5-fpn.yaml is archived for historical reasons (and to show how to modify the head structure from FPN to PANet).

glenn-jocher commented 3 years ago

Also, common.py and experimental.py define the low-level modules used to create FPN or PANet heads. The heads themselves are created and defined only in the yamls.

JoJoliking commented 3 years ago

@glenn-jocher All right, I now understand the relationship between the network structures. By the way, if I want to increase the network's output dimension, for example by 4 offsets of the bounding box, which places should I modify? How should I add a convolutional-layer branch to each of the three detection heads to achieve this? I tried to modify the yaml and the Detect function, but failed. Forgive me for not having a deep understanding of the code. Sorry.

glenn-jocher commented 3 years ago

@JoJoliking I don't understand what you are asking.

JoJoliking commented 3 years ago

@glenn-jocher Sorry, I should describe my problem more clearly. The current detection output has 85 channels, containing the prediction probabilities of the 80 categories, the xywh of the bounding box, and the objectness score, right? My idea: do not change the existing output, but add a four-dimensional output alongside it, the corresponding offsets of xywh (implemented with a fully connected or convolutional layer).

glenn-jocher commented 3 years ago

The three n-to-255 convolutions are contained inside the Detect() layer, you can apply any modifications you want there.

Though applying offsets/gains to the existing offsets and gains may overdetermine some of the parameters, i.e. fitting two offsets for one value is not typical in parameter estimation, as there is only 1 degree of freedom there.
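As a rough sketch of the idea (a hypothetical class, not the repo's actual Detect implementation): a second ModuleList of 1x1 convolutions can run in parallel with the existing output convolutions, producing 4 extra channels per anchor:

```python
# Hypothetical sketch, NOT yolov5's real Detect: adds a parallel 4-channel-per-anchor
# branch next to the usual (5 + nc) * na output convolutions.
import torch
import torch.nn as nn

class DetectWithOffsets(nn.Module):
    def __init__(self, nc=80, na=3, ch=(128, 256, 512)):
        super().__init__()
        self.no = nc + 5  # outputs per anchor: xywh, objectness, class scores
        self.m = nn.ModuleList(nn.Conv2d(c, self.no * na, 1) for c in ch)  # existing branch
        self.m2 = nn.ModuleList(nn.Conv2d(c, 4 * na, 1) for c in ch)       # extra offset branch

    def forward(self, x):
        # one (main, offsets) tuple per input feature map
        return [(m(xi), m2(xi)) for m, m2, xi in zip(self.m, self.m2, x)]

d = DetectWithOffsets()
feats = [torch.zeros(1, c, s, s) for c, s in zip((128, 256, 512), (80, 40, 20))]
out = d(feats)
print(out[0][0].shape, out[0][1].shape)  # torch.Size([1, 255, 80, 80]) torch.Size([1, 12, 80, 80])
```

The extra branch shares each input feature map with the main branch, so training both jointly is straightforward; whether this overdetermines the box parameters is the concern raised above.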

Edwardmark commented 3 years ago

@glenn-jocher Thanks for your kind reply. It helps me a lot. Best, Edward.

mary-0830 commented 3 years ago

Hello, I would like to ask: I added some anchor box parameters after the anchors attribute in yolov5s.yaml, but an overflow error is displayed.

Is this not allowed? Or is there something I haven't changed correctly?

The parameters I added are like this. `anchors:

glenn-jocher commented 3 years ago

@mary-0830 you're free to modify anchors as you see fit. The only constraint is that each output layer requires the same number of anchors.

If AutoAnchor doesn't like your new anchors, it will create new ones on its own, based on the number you supplied initially. You can disable AutoAnchor with python train.py --noautoanchor.

You can also simply specify a number here instead of anchor vectors: anchors: 3

Edwardmark commented 3 years ago

@glenn-jocher If I add a head, what should I modify in the compute_loss function? How should balance be set in compute_loss? Thanks.

glenn-jocher commented 3 years ago

@Edwardmark modifications are up to you.

JoJoliking commented 3 years ago

@glenn-jocher Hello, if I want to load multiple datasets for training at the same time (they will be placed in the same subfolder), how should I modify the LoadImagesAndLabels function?

glenn-jocher commented 3 years ago

@JoJoliking coco128.yaml already explains how to load multiple datasets. Do not modify the code. https://github.com/ultralytics/yolov5/blob/97a5227a1a13f59dce4b896e40d411c13fbdb7b3/data/coco128.yaml#L12-L15
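For reference, a data yaml along these lines (paths and names below are placeholders, not from the repo) lists several training sources at once:

```yaml
# hypothetical data yaml: 'train' accepts a list of image directories
train:
  - ../datasets/datasetA/images/train
  - ../datasets/datasetB/images/train
val: ../datasets/datasetA/images/val

nc: 2
names: ['class0', 'class1']  # your class names
```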

JoJoliking commented 3 years ago

OK, I will have a try. Thank you for your previous reply to my question; it worked for me.

JoJoliking commented 3 years ago

@glenn-jocher Hello, dear YOLOv5 author! If I only want YOLOv5 to recognize humans, how should the anchor sizes and anchor_t (default=4.0) be modified? Can you give me some advice?

glenn-jocher commented 3 years ago

@JoJoliking I would recommend training with all default settings (no modification). To start see: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data

JoJoliking commented 3 years ago

@glenn-jocher OK, thanks. I will try my ideas.

glenn-jocher commented 3 years ago

@JoJoliking ok! Also remember COCO models already offer human detection. You can also filter detections by class to only show human detections like this, so in reality I would not even train a new model if all you want is human detection:

python detect.py --classes 0

JoJoliking commented 3 years ago

@glenn-jocher Yes, dear glenn-jocher. In fact, I will use other human datasets for training. These datasets only have one class (zero for human). At the same time, I notice that the cls loss is always zero. I think this is normal, because the network only needs to distinguish background and person, right?

glenn-jocher commented 3 years ago

@JoJoliking yes, this is normal. Single-class datasets do not have any classification loss as there is no classification task, only objectness loss.
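An illustrative pure-Python sketch of why this happens (simplified names, not the repo's exact code): compute_loss only adds a classification term when there is more than one class, so cls loss stays at zero for a single-class dataset.

```python
# Simplified stand-in for yolov5's classification-loss guard (hypothetical names).
def cls_loss_term(nc, bce_value=0.5):
    lcls = 0.0
    if nc > 1:             # with one class there is nothing to classify
        lcls += bce_value  # stand-in for the real BCE classification loss
    return lcls

print(cls_loss_term(1))   # 0.0 -> single-class: no classification loss
print(cls_loss_term(80))  # 0.5 -> multi-class: BCE term contributes
```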

WANGCHAO1996 commented 3 years ago

> By the way, you should be aware that P6 outputs will mainly benefit larger image sizes. So you are travelling down a road of a larger models applied to larger images (i.e. longer training, more CUDA memory usage, etc.).
>
> If you wanted to go the other way and create models that work better on smaller images you might output a P2/4 (stride 4) layer instead. P2 output layers incur minimal size increases, but many more FLOPS as the convolutions are applied over larger denser grids, slowing inference significantly.

Hello author, I want to add a detection layer for detecting small targets. I have modified the latest code as below; how should I modify it? @glenn-jocher

   # anchors
   anchors:

   # YOLOv5 backbone
   backbone:
     # [from, number, module, args]
     [[-1, 1, Focus, [64, 3]],  # 0-P1/2
      [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
      [-1, 3, BottleneckCSP, [128]],
      [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
      [-1, 9, BottleneckCSP, [256]],
      [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
      [-1, 9, BottleneckCSP, [512]],
      [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
      [-1, 1, SPP, [1024, [5, 9, 13]]],
      [-1, 3, BottleneckCSP, [1024, False]],  # 9
     ]

   # YOLOv5 head
   head:
     [[-1, 1, Conv, [512, 1, 1]],
      [-1, 1, nn.Upsample, [None, 2, 'nearest']],
      [[-1, 6], 1, Concat, [1]],  # cat backbone P4
      [-1, 3, BottleneckCSP, [512, False]],  # 13

      [-1, 1, Conv, [256, 1, 1]],
      [-1, 1, nn.Upsample, [None, 2, 'nearest']],
      [[-1, 4], 1, Concat, [1]],  # cat backbone P3
      [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)

      [-1, 1, Conv, [256, 3, 2]],
      [[-1, 14], 1, Concat, [1]],  # cat head P4
      [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

      [-1, 1, Conv, [512, 3, 2]],
      [[-1, 10], 1, Concat, [1]],  # cat head P5
      [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

      [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
     ]

glenn-jocher commented 3 years ago

@WANGCHAO1996 YOLOv5-p2 adds an extra small detection head (P2, stride 4): https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5-p2.yaml

YukunXia commented 3 years ago

> @WANGCHAO1996 YOLOv5-p2 adds an extra small detection head (P2, stride 4): https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5-p2.yaml

@glenn-jocher

Should this line

https://github.com/ultralytics/yolov5/blob/b894e69dfc341fcbfe4a307a15d6af90d90367df/models/hub/yolov5-p2.yaml#L53

add layer 21 to the Detect input list?

Besides, maybe the anchors should have one more row of size definitions?

glenn-jocher commented 3 years ago

@YukunXia actually yes, the Detect input list should probably include 21 as well. You can train either way: the current setup uses the same output strides, but the PANet head dips down into P2-stride convolutions to help add accuracy to the P3 output.

It's been a while since I made this, so I can't remember whether the omission of 21 was intentional.

Can you submit a PR with the 21 addition to this yaml? Thanks!
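With that change, the Detect line of yolov5-p2.yaml would read as follows (layer indices as in that file at the time; check the current yaml before relying on them):

```yaml
   [[21, 24, 27, 30], 1, Detect, [nc, anchors]],  # Detect(P2, P3, P4, P5)
```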

YukunXia commented 3 years ago

OK, the PR is submitted.

https://github.com/ultralytics/yolov5/pull/4608/commits/1db6554cf2f85b6a364916a51716a836fb4d1a78

JoJoliking commented 3 years ago

@glenn-jocher Dear author, by the way, have you tried using BiFPN in YOLOv5 instead of PANet? My experiments show that a BiFPN stacking 3 layers can reach a better mAP on other datasets!

xengst commented 2 years ago

@glenn-jocher If I want to add another model e.g VGG16 to the backbone, what is the right way to do that?

glenn-jocher commented 2 years ago

@xengst you could try to create backbone modifications in a yaml file, though be aware that the head and backbone are connected at many different places by shortcut connections and not just at the end of the backbone.

xengst commented 2 years ago

@glenn-jocher Thank you for your reply.

So I should define the VGG16 layers as classes in common.py first, then add them to the model yaml?

glenn-jocher commented 2 years ago

@xengst yes. Remember the head needs skip connections from P3, P4, P5 (layers 6, 4 and 10 here):

https://github.com/ultralytics/yolov5/blob/5185981993737861575adb07f2817a74fa4b2baa/models/yolov5s.yaml#L27-L48
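As a sketch, a custom module (the name VGGBlock is hypothetical, not part of the repo) could be defined in common.py and then referenced from the yaml, following the (ch_in, ch_out) argument convention the other modules use:

```python
# Hypothetical VGG-style module for common.py (not in the yolov5 repo):
# n 3x3 conv+ReLU layers followed by a 2x2 max-pool that halves resolution.
import torch
import torch.nn as nn

class VGGBlock(nn.Module):
    def __init__(self, c1, c2, n=2):  # c1/c2 mimic common.py's (ch_in, ch_out) convention
        super().__init__()
        layers = []
        for i in range(n):
            layers += [nn.Conv2d(c1 if i == 0 else c2, c2, 3, 1, 1), nn.ReLU(inplace=True)]
        layers.append(nn.MaxPool2d(2, 2))  # /2 stride, so stacked blocks yield P1..P5 scales
        self.m = nn.Sequential(*layers)

    def forward(self, x):
        return self.m(x)

x = torch.zeros(1, 3, 64, 64)
print(VGGBlock(3, 64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

In the yaml, the blocks producing the P3/P4/P5 scales would then need to sit at the layer indices the head's Concat entries reference.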

xengst commented 2 years ago

@glenn-jocher So what I am trying to do wouldn't work? Is it not possible to test different heads and backbones?

glenn-jocher commented 2 years ago

@xengst how would I know if your experiment will 'work' or not?

myasser63 commented 2 years ago

@glenn-jocher Can I directly add a layer to Detect from the backbone?

glenn-jocher commented 2 years ago

@myasser63 yes Detect can accept inputs from any part of the model. If you update Detect inputs you probably also want to set anchors: 3. This tells AutoAnchor to evolve 3 anchors for each Detect input.
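As an illustrative sketch (layer index 9 here is the last backbone layer of yolov5s.yaml; whether it is a useful Detect input is a separate design question), the yaml changes could look like:

```yaml
# let AutoAnchor evolve 3 anchors per Detect input
anchors: 3

# ...backbone and head unchanged, except the final Detect line gains the new input:
#    [[17, 20, 23, 9], 1, Detect, [nc, anchors]]  # Detect(P3, P4, P5, + backbone layer 9)
```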

myasser63 commented 2 years ago

Thanks @glenn-jocher for your explanation

myasser63 commented 2 years ago

@glenn-jocher I want to understand the concept behind choosing ch_out for the head Conv layers. Is it through testing, or is there a relation with the concatenated layers?


head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]