Closed Edwardmark closed 3 years ago
@Edwardmark design modifications are up to you. Start from the existing yamls and modify as you see fit. https://github.com/ultralytics/yolov5/tree/master/models
@Edwardmark Have you solved the problem? I also want to add and modify the detection head, but I can't find the location of the detection head's code.
@glenn-jocher Could you please explain the parameters in yolov5l.yaml a little? Say we want to add a head aimed at detecting large objects (e.g. 640x640 objects): what should be added to the anchors, backbone, and head sections of yolov5l.yaml? The model definition is hard for me to understand; please help me out, thanks in advance.
@Edwardmark @JoJoliking sure no problem. The current models output P3-P5 layers supporting strides 8-32. You want to export a P6 layer with stride 64.
You can export from any layer of the model you want simply by adding it to the input list of Detect(). This is one of the major advancements we made in YOLOv5 above and beyond the previous cfg architectures: https://github.com/ultralytics/yolov5/blob/199c9c787427ece5723d5309e1c7c524a99bc59d/models/yolov5s.yaml#L47
So all you need to do is build the additional structure you want and then add its output layer to this list. You could then add another set of P6/64 anchors manually to the model, or simply delete the manual anchors and put a number instead, like anchors: 3, to tell the model to compute 3 of its own anchors at each output.
https://github.com/ultralytics/yolov5/blob/199c9c787427ece5723d5309e1c7c524a99bc59d/models/yolov5s.yaml#L7-L11
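For reference, a four-output anchors section might look like the sketch below. The P6 row here is an illustrative placeholder, not a tuned value; in practice you would let AutoAnchor compute sizes for your dataset:

```yaml
anchors:
  - [10,13, 16,30, 33,23]         # P3/8
  - [30,61, 62,45, 59,119]        # P4/16
  - [116,90, 156,198, 373,326]    # P5/32
  - [436,615, 739,380, 925,792]   # P6/64 (placeholder sizes)
```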
To build the additional structure, you can simply repeat the steps from P4 to P5: https://github.com/ultralytics/yolov5/blob/199c9c787427ece5723d5309e1c7c524a99bc59d/models/yolov5s.yaml#L38-L47
In terms of P6, there are no 64-stride layers earlier to concat, so you could simply do something like this for the easiest P6/64 output. If you wanted to get fancier, you could have the backbone travel down to P6/64 and then concat that layer within the head (the same way P5 is handled).
[-1, 1, Conv, [256, 3, 2]],
[[-1, 14], 1, Concat, [1]], # cat head P4
[-1, 3, BottleneckCSP, [512, False]], # 20 (P4/16-medium)
[-1, 1, Conv, [512, 3, 2]],
[[-1, 10], 1, Concat, [1]], # cat head P5
[-1, 3, BottleneckCSP, [1024, False]], # 23 (P5/32-large)
[-1, 1, Conv, [1024, 3, 2]],
[-1, 3, BottleneckCSP, [2048, False]], # 25 (P6/64-xlarge)
[[17, 20, 23, 25], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5, P6)
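As a quick sanity check on why P6 targets very large objects: each head sees a grid of image size divided by its stride, so a P6 cell covers a 64-pixel patch. A tiny illustration in plain Python (no YOLOv5 code assumed):

```python
img_size = 640

# Grid cells per image side for each output stride
for name, stride in [("P3", 8), ("P4", 16), ("P5", 32), ("P6", 64)]:
    cells = img_size // stride
    print(f"{name}/{stride}: {cells}x{cells} grid, each cell spans {stride} px")
```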
By the way, you should be aware that P6 outputs will mainly benefit larger image sizes. So you are travelling down a road of larger models applied to larger images (i.e. longer training, more CUDA memory usage, etc.).
If you wanted to go the other way and create models that work better on smaller images you might output a P2/4 (stride 4) layer instead. P2 output layers incur minimal size increases, but many more FLOPS as the convolutions are applied over larger denser grids, slowing inference significantly.
@glenn-jocher Excellent answer. A question about the different yamls: 5s, 5l and 5x all use the YOLOv5 head, while 5-fpn and 5-panet correspond to an FPN head and a PANet head. These head structures should be different, yet there is no definition of FPN or PANet in common.py, and it seems FPN and PANet are not constructed in the YOLOv5 head code either (I don't know if I missed it). Is that right?
@JoJoliking the four YOLOv5 models s/m/l/x are all built from yolov5-panet.yaml with different compound scaling constants. I experimented to find the best constants ratio, starting from the EfficientDet scaling equations, and these are used now for the four sizes.
FPN heads (like in YOLOv3) perform worse and are no longer used, though yolov5-fpn.yaml is archived for historical reasons (and to show how to modify the head structure from FPN to PANet).
Also common.py and experimental.py define low level modules that are used to create FPN or PANet heads. The heads themselves are only created and defined in the yamls.
@glenn-jocher All right, I now understand the relationship between the network structures. By the way, if I want to increase the dimensionality of the network output, for example adding 4 offsets for the bounding box, which places should I modify? How should I add a convolutional-layer branch to each of the three detection heads to achieve this? I tried to modify the yaml and the Detect function, but failed. Forgive me for not having a deep understanding of the code. Sorry.
@JoJoliking I don't understand what you are asking.
@glenn-jocher Sorry, I should describe my problem more clearly. The current detection output has 85 channels per anchor, containing the xywh of the bounding box, the objectness score, and the prediction probabilities of the 80 classes, right? My idea: keep the existing output unchanged while adding a four-dimensional output, the corresponding offsets of xywh (implemented with a fully connected or convolutional layer).
The three n-to-255 convolutions are contained inside the Detect() layer, you can apply any modifications you want there.
Though applying additional offsets/gains on top of the existing offsets and gains may overdetermine some of the parameters, i.e. fitting two offsets for one value is not typical in parameter estimation, as there is only one degree of freedom there.
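To make the shape arithmetic concrete, here is a minimal, hypothetical sketch (not the actual YOLOv5 Detect class) of widening the per-anchor output from nc + 5 to nc + 5 + 4, so the same 1x1 convolutions also emit four extra offset channels:

```python
import torch
import torch.nn as nn

class DetectSketch(nn.Module):
    """Hypothetical simplification of Detect(): one 1x1 conv per input level."""

    def __init__(self, nc=80, na=3, extra=4, ch=(128, 256, 512)):
        super().__init__()
        self.no = nc + 5 + extra  # xywh(4) + obj(1) + classes(nc) + extra offsets
        self.na = na              # anchors per output level
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * na, 1) for x in ch)

    def forward(self, xs):
        # Returns raw maps of shape (batch, na * no, h, w) for each level
        return [m(x) for m, x in zip(self.m, xs)]

d = DetectSketch()
outs = d([torch.zeros(1, c, s, s) for c, s in [(128, 80), (256, 40), (512, 20)]])
print([o.shape[1] for o in outs])  # 3 * (80 + 5 + 4) = 267 channels per level
```

The loss function would then need matching changes to decode and supervise the extra four channels.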
@glenn-jocher Thanks for your kind reply. It helps me a lot. Best, Edward.
Hello, I would like to ask: I added some anchor box parameters after the anchors attribute in yolov5s.yaml, but an overflow error is displayed. Is this not allowed, or is there something I haven't changed? The parameters I added are like this. `anchors:
@mary-0830 you're free to modify anchors as you see fit. The only constraint is each output layer requires the same number of anchors.
If autoanchor doesn't like your new anchors, it will create new ones on its own, based on the number you supplied initially. You can disable autoanchor with python train.py --noautoanchor.
You can also simply specify a number here instead of anchor vectors:
anchors: 3
@glenn-jocher If I add a head, what should I modify in the compute_loss function? How should balance be set in compute_loss? Thanks.
@Edwardmark modifications are up to you.
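For context on the balance question above: in YOLOv5's loss the per-output balancing is just a list of objectness-loss weights, one per Detect output, so adding a head means that list needs one more entry. A hedged sketch of the selection pattern (values illustrative and may differ by version; only the first nl entries are used):

```python
def get_balance(nl):
    """Objectness-loss weight per Detect output, selected by layer count (sketch)."""
    # P3 is weighted highest; 4- and 5-output models fall back to a longer list
    return {3: [4.0, 1.0, 0.4]}.get(nl, [4.0, 1.0, 0.25, 0.06, 0.02])

print(get_balance(3))  # [4.0, 1.0, 0.4]
print(get_balance(4))  # longer fallback list; the first 4 entries apply
```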
@glenn-jocher Hello, if I want to load multiple datasets for training at the same time (they will be placed in the same subfolder), how should I modify the LoadImagesAndLabels function?
@JoJoliking coco128.yaml already explains how to load multiple datasets. Do not modify the code. https://github.com/ultralytics/yolov5/blob/97a5227a1a13f59dce4b896e40d411c13fbdb7b3/data/coco128.yaml#L12-L15
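In that yaml, the train/val keys simply accept a list of paths, so multiple datasets can be combined without code changes. A minimal sketch (paths are hypothetical examples):

```yaml
# dataset yaml sketch: lists combine multiple image sources
train:
  - ../datasets/setA/images/train
  - ../datasets/setB/images/train
val:
  - ../datasets/setA/images/val
```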
OK, I will have a try. Thank you for your previous reply to my question; I had success with it.
@glenn-jocher Hello, dear YOLOv5 author! If I only want YOLOv5 to recognize humans, how should the anchor sizes and anchor_t (default=4.0) be modified? Can you give me some advice?
@JoJoliking I would recommend training with all default settings (no modification). To start see: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
@glenn-jocher OK, thanks. I will try my ideas.
@JoJoliking ok! Also remember COCO models already offer human detection. You can also filter detections by class to only show human detections like this, so in reality I would not even train a new model if all you want is human detection:
python detect.py --classes 0
@glenn-jocher Yes. In fact, I will use other human datasets for training. These datasets only have one class (zero for human). At the same time, I notice that the cls loss is always zero. I think this is normal, because the network only needs to distinguish background from person. Right?
@JoJoliking yes, this is normal. Single-class datasets do not have any classification loss as there is no classification task, only objectness loss.
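The behaviour follows from a simple gate in the loss: classification loss only applies when there is more than one class. A schematic sketch (not the verbatim compute_loss code):

```python
def losses_active(nc):
    """Which loss terms contribute for an nc-class detector (schematic)."""
    return {
        "box": True,     # always: localisation
        "obj": True,     # always: objectness (background vs. object)
        "cls": nc > 1,   # only with >1 class is there a classification task
    }

print(losses_active(1))   # single-class: cls stays False, so cls loss is 0
print(losses_active(80))  # COCO: all three terms active
```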
Hello author, I want to add a detection layer for detecting small targets. I have already modified the latest code as below; how should I modify it? @glenn-jocher
anchors:
backbone:
  [[-1, 1, Focus, [64, 3]], # 0-P1/2
  [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
  [-1, 3, BottleneckCSP, [128]],
  [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
  [-1, 9, BottleneckCSP, [256]],
  [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
  [-1, 9, BottleneckCSP, [512]],
  [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
  [-1, 1, SPP, [1024, [5, 9, 13]]],
  [-1, 3, BottleneckCSP, [1024, False]], # 9
  ]
head:
  [[-1, 1, Conv, [512, 1, 1]],
  [-1, 1, nn.Upsample, [None, 2, 'nearest']],
  [[-1, 6], 1, Concat, [1]], # cat backbone P4
  [-1, 3, BottleneckCSP, [512, False]], # 13
  [-1, 1, Conv, [256, 1, 1]],
  [-1, 1, nn.Upsample, [None, 2, 'nearest']],
  [[-1, 4], 1, Concat, [1]], # cat backbone P3
  [-1, 3, BottleneckCSP, [256, False]], # 17 (P3/8-small)
  [-1, 1, Conv, [256, 3, 2]],
  [[-1, 14], 1, Concat, [1]], # cat head P4
  [-1, 3, BottleneckCSP, [512, False]], # 20 (P4/16-medium)
  [-1, 1, Conv, [512, 3, 2]],
  [[-1, 10], 1, Concat, [1]], # cat head P5
  [-1, 3, BottleneckCSP, [1024, False]], # 23 (P5/32-large)
  [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]
@WANGCHAO1996 YOLOv5-p2 adds an extra small detection head (P2, stride 4): https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5-p2.yaml
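Schematically, the P2 head in that file extends the existing upsampling path one more level toward the high-resolution backbone features. A sketch of the pattern only (layer indices illustrative; see the linked yaml for the real definition):

```yaml
# Schematic P2 extension inside the head (indices illustrative)
  [-1, 1, Conv, [128, 1, 1]],
  [-1, 1, nn.Upsample, [None, 2, 'nearest']],
  [[-1, 2], 1, Concat, [1]],   # cat backbone P2
  [-1, 3, C3, [128, False]],   # (P2/4-xsmall output)
```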
@glenn-jocher Should this line add 21 to the Detect input list? Besides, shouldn't the anchors include one more row of size definitions?
@YukunXia actually yes, the Detect input list should probably include 21 as well. You can train either way: the current setup uses the same output strides, but the PANet head dips down into P2-stride convolutions to help add accuracy to the P3 output.
It's been a while since I made this so I can't remember if the 21 omission is intentional or not.
Can you submit a PR with the 21 addition to this yaml? Thanks!
OK, the PR is submitted.
https://github.com/ultralytics/yolov5/pull/4608/commits/1db6554cf2f85b6a364916a51716a836fb4d1a78
@glenn-jocher Dear author, by the way, have you tried using BiFPN in YOLOv5 instead of PANet? My experiments show that a BiFPN with 3 stacked layers can reach a better mAP on other datasets!
@glenn-jocher If I want to add another model, e.g. VGG16, to the backbone, what is the right way to do that?
@xengst you could try to create backbone modifications in a yaml file, though be aware that the head and backbone are connected at many different places by shortcut connections and not just at the end of the backbone.
@glenn-jocher Thank you for your reply. So I should first define the VGG16 layers as a class in common.py, then add it to the model yaml?
@xengst yes. Remember the head needs skip connections from the backbone at P3, P4 and P5 (layers 6, 4 and 10 in the default yaml).
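As an illustration of the first step (the class below is hypothetical, not part of YOLOv5), a backbone module registered in common.py just needs to be an nn.Module whose constructor follows the (ch_in, ch_out, ...) pattern the yaml parser passes:

```python
import torch
import torch.nn as nn

class VGGBlock(nn.Module):
    """Hypothetical VGG-style block: two 3x3 convs + ReLU, optional downsample."""

    def __init__(self, c1, c2, downsample=False):  # c1: ch_in, c2: ch_out
        super().__init__()
        stride = 2 if downsample else 1
        self.block = nn.Sequential(
            nn.Conv2d(c1, c2, 3, stride, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c2, c2, 3, 1, 1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

y = VGGBlock(64, 128, downsample=True)(torch.zeros(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 128, 16, 16])
```

The yaml entry would then reference it like any other module, e.g. `[-1, 1, VGGBlock, [128, True]]`, with the downsampling blocks landing on the P3/P4/P5 strides the head expects.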
@glenn-jocher So what I am trying to do wouldn't work? Is it not possible to test a different head and backbone?
@xengst how would I know if your experiment will 'work' or not?
@glenn-jocher Can I directly add a layer from the backbone to Detect?
@myasser63 yes, Detect can accept inputs from any part of the model. If you update the Detect inputs, you probably also want to set anchors: 3. This tells AutoAnchor to evolve 3 anchors for each Detect input.
Thanks @glenn-jocher for your explanation.
@glenn-jocher I want to understand the concept behind choosing ch_out for the head Conv layers. Is it found through testing, or is there a relation with the concatenated layers?
head:
[[-1, 1, Conv, [512, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 6], 1, Concat, [1]], # cat backbone P4
[-1, 3, C3, [512, False]], # 13
[-1, 1, Conv, [256, 1, 1]],
[-1, 1, nn.Upsample, [None, 2, 'nearest']],
[[-1, 4], 1, Concat, [1]], # cat backbone P3
[-1, 3, C3, [256, False]], # 17 (P3/8-small)
[-1, 1, Conv, [256, 3, 2]],
[[-1, 14], 1, Concat, [1]], # cat head P4
[-1, 3, C3, [512, False]], # 20 (P4/16-medium)
[-1, 1, Conv, [512, 3, 2]],
[[-1, 10], 1, Concat, [1]], # cat head P5
[-1, 3, C3, [1024, False]], # 23 (P5/32-large)
[[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
]
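One factual constraint behind those numbers: Concat along dim 1 sums channel counts, so a Conv's ch_out determines how many channels it contributes to the following Concat and hence what the next C3 receives. A minimal check (channel counts taken from the yaml above, before any width multiplier):

```python
import torch

a = torch.zeros(1, 256, 40, 40)  # e.g. output of the Conv [256, 1, 1] head layer
b = torch.zeros(1, 256, 40, 40)  # e.g. the backbone P3 feature it is concatenated with
c = torch.cat((a, b), 1)         # Concat [1]: channels add up
print(c.shape[1])                # 512 channels feed the next C3 block
```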
❔Question
In yolov5s.yaml there are only three detection layers [P3/8, P4/16, P5/32]; how do I add another layer with stride 64 to detect really big objects?
Additional context
Could you please kindly give me some guide? Thanks. @glenn-jocher