ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

YOLOv5 (6.0/6.1) brief summary #6998

Open WZMIAOMIAO opened 2 years ago

WZMIAOMIAO commented 2 years ago

Content

1. Model Structure

YOLOv5 (v6.0/6.1) consists of:

- Backbone: New CSP-Darknet53
- Neck: SPPF, New CSP-PAN
- Head: YOLOv3 Head

Model structure (yolov5l.yaml):

(figure: YOLOv5l model architecture diagram)

Some minor changes compared to previous versions:

  1. Replace the Focus structure with a 6x6 Conv2d layer (more efficient, refer to #4825)
  2. Replace the SPP structure with SPPF (more than double the speed); test code and timings below
test code:

```python
import time

import torch
import torch.nn as nn


class SPP(nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool1 = nn.MaxPool2d(5, 1, padding=2)
        self.maxpool2 = nn.MaxPool2d(9, 1, padding=4)
        self.maxpool3 = nn.MaxPool2d(13, 1, padding=6)

    def forward(self, x):
        o1 = self.maxpool1(x)
        o2 = self.maxpool2(x)
        o3 = self.maxpool3(x)
        return torch.cat([x, o1, o2, o3], dim=1)


class SPPF(nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool = nn.MaxPool2d(5, 1, padding=2)

    def forward(self, x):
        # three cascaded 5x5 maxpools are equivalent to parallel 5x5/9x9/13x13 maxpools
        o1 = self.maxpool(x)
        o2 = self.maxpool(o1)
        o3 = self.maxpool(o2)
        return torch.cat([x, o1, o2, o3], dim=1)


def main():
    input_tensor = torch.rand(8, 32, 16, 16)
    spp = SPP()
    sppf = SPPF()
    output1 = spp(input_tensor)
    output2 = sppf(input_tensor)

    print(torch.equal(output1, output2))  # identical outputs

    t_start = time.time()
    for _ in range(100):
        spp(input_tensor)
    print(f"spp time: {time.time() - t_start}")

    t_start = time.time()
    for _ in range(100):
        sppf(input_tensor)
    print(f"sppf time: {time.time() - t_start}")


if __name__ == '__main__':
    main()
```

result:

```
True
spp time: 0.5373051166534424
sppf time: 0.20780706405639648
```

2. Data Augmentation

3. Training Strategies

4. Others

4.1 Compute Losses

The YOLOv5 loss consists of three parts:

- Classes loss (BCE loss)
- Objectness loss (BCE loss)
- Location loss (CIoU loss)

$$loss = \lambda_1 L_{cls} + \lambda_2 L_{obj} + \lambda_3 L_{loc}$$
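
A minimal sketch of how these components are combined (paraphrasing the gain hyperparameters in hyp.scratch-low.yaml; the loss values below are placeholders, not real model outputs):

```python
import torch

# Hedged sketch (not the exact utils/loss.py code): each component is scaled by its
# hyperparameter gain, summed, and scaled by batch size.
hyp = {'box': 0.05, 'obj': 1.0, 'cls': 0.5}   # default gains from hyp.scratch-low.yaml
lbox = torch.tensor(0.08)   # CIoU location loss (placeholder value)
lobj = torch.tensor(0.03)   # objectness BCE loss (placeholder value)
lcls = torch.tensor(0.02)   # classification BCE loss (placeholder value)
bs = 16                     # batch size

loss = (lbox * hyp['box'] + lobj * hyp['obj'] + lcls * hyp['cls']) * bs
print(loss)
```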

4.2 Balance Losses

The objectness losses of the three prediction layers (P3, P4, P5) are weighted differently. The balance weights are [4.0, 1.0, 0.4], respectively.

$$L_{obj} = 4.0 \cdot L_{obj}^{small} + 1.0 \cdot L_{obj}^{medium} + 0.4 \cdot L_{obj}^{large}$$
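
A minimal sketch of how these balance weights enter the objectness loss (paraphrasing utils/loss.py; the tensors below are placeholders, not real model outputs):

```python
import torch
import torch.nn as nn

# Minimal sketch of per-layer objectness balancing (placeholder tensors).
BCEobj = nn.BCEWithLogitsLoss()
balance = [4.0, 1.0, 0.4]                                        # P3, P4, P5 weights
obj_logits = [torch.randn(2, 3, s, s) for s in (80, 40, 20)]     # per-layer predictions
obj_targets = [torch.zeros_like(p) for p in obj_logits]          # placeholder targets

lobj = torch.zeros(())
for i, (pi, ti) in enumerate(zip(obj_logits, obj_targets)):
    lobj += BCEobj(pi, ti) * balance[i]   # P3 weighted 4.0, P4 1.0, P5 0.4
print(lobj)
```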

4.3 Eliminate Grid Sensitivity

In YOLOv2 and YOLOv3, the formula for calculating the predicted target information is:

$$b_x = \sigma(t_x) + c_x$$
$$b_y = \sigma(t_y) + c_y$$
$$b_w = p_w \cdot e^{t_w}$$
$$b_h = p_h \cdot e^{t_h}$$

In YOLOv5, the formula is:

$$b_x = (2 \cdot \sigma(t_x) - 0.5) + c_x$$
$$b_y = (2 \cdot \sigma(t_y) - 0.5) + c_y$$
$$b_w = p_w \cdot (2 \cdot \sigma(t_w))^2$$
$$b_h = p_h \cdot (2 \cdot \sigma(t_h))^2$$

Compare the center point offset before and after scaling. The center point offset range is adjusted from (0, 1) to (-0.5, 1.5), so an offset of exactly 0 or 1 can now be reached.

Compare the height and width scaling ratios (relative to the anchor) before and after adjustment. The original YOLO/darknet box equations have a serious flaw: width and height are completely unbounded, as they are simply out = exp(in). This is dangerous, as it can lead to runaway gradients, instabilities, NaN losses and ultimately a complete loss of training. Refer to this issue.
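
An illustrative sketch (not the repository code) of both changes, comparing the old and new decodes:

```python
import torch

# Illustrative only: compares the YOLOv2/v3-style decode with the YOLOv5-style decode.
t = torch.linspace(-6, 6, 7)                  # raw network outputs

# center offset: range (0, 1) before, (-0.5, 1.5) after, so 0 and 1 are reachable
offset_old = torch.sigmoid(t)                 # in (0, 1), cannot hit the cell borders
offset_new = 2 * torch.sigmoid(t) - 0.5       # in (-0.5, 1.5)

# width/height: unbounded exp before, bounded (0, 4) * anchor after
p_w = 1.0                                     # anchor width (placeholder)
wh_old = p_w * torch.exp(t)                   # explodes for large t
wh_new = p_w * (2 * torch.sigmoid(t)) ** 2    # capped below 4 * p_w
print(offset_new.min(), offset_new.max(), wh_old.max(), wh_new.max())
```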

4.4 Build Targets

Match positive samples:

$$r_w = w_{gt} / w_{at}$$
$$r_h = h_{gt} / h_{at}$$
$$r_w^{max} = \max(r_w, 1/r_w)$$
$$r_h^{max} = \max(r_h, 1/r_h)$$
$$r^{max} = \max(r_w^{max}, r_h^{max})$$
$$r^{max} < {anchor\_t} \Rightarrow \text{positive sample (match)}$$

In addition, because the center offset range is (-0.5, 1.5), each matched ground-truth box is assigned not only to the grid cell containing its center but also to the two nearest neighboring cells.
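
A minimal sketch of this matching rule (the box and anchor values below are made up; anchor_t is the hyperparameter of the same name in the hyp yaml files):

```python
import torch

# Minimal sketch of the wh-ratio matching rule described above.
anchor_t = 4.0
gt_wh = torch.tensor([[30.0, 60.0]])                                   # ground-truth w, h
anchor_wh = torch.tensor([[10.0, 13.0], [33.0, 23.0], [62.0, 45.0]])   # anchor templates

r = gt_wh[:, None, :] / anchor_wh[None]             # r_w, r_h for every gt-anchor pair
r_max = torch.max(r, 1 / r).max(dim=2).values       # max(r_w, 1/r_w, r_h, 1/r_h)
matched = r_max < anchor_t                          # positive sample if within threshold
print(matched)                                      # tensor([[False,  True,  True]])
```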

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

WZMIAOMIAO commented 2 years ago

@glenn-jocher hi, today I briefly summarized yolov5 (v6.0). Please take a look and let me know if there are any problems or better suggestions. Some schematic diagrams and contents will be added later. Thank you for your great work.

zlj-ky commented 2 years ago

hi, regarding 'prediction layers (P3, P4, P5) are weighted differently' — how do I find this in the code and, further, how do I modify it?

WZMIAOMIAO commented 2 years ago

hi, regarding 'prediction layers (P3, P4, P5) are weighted differently' — how do I find this in the code and, further, how do I modify it?

https://github.com/ultralytics/yolov5/blob/c09fb2aa95b6ca86c460aa106e2308805649feb9/utils/loss.py#L111 and

https://github.com/ultralytics/yolov5/blob/c09fb2aa95b6ca86c460aa106e2308805649feb9/utils/loss.py#L156

zlj-ky commented 2 years ago

@WZMIAOMIAO thx!

glenn-jocher commented 2 years ago

@WZMIAOMIAO awesome summary, nice work!

@zlj-ky yes the balancing parameters are there, we tuned these manually on COCO. The idea is to balance losses from each layer (just like we balance losses across loss components (box, obj, class)). The reason I didn't turn these into learnable weights is that as absolute values the gradient would always want to drag them to zero to minimize the loss. I suppose we could constantly normalize them so they all sum to 1 to avoid this effect. Might be an interesting experiment, and this might help the balancing adapt better to different datasets and image sizes etc.
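
A hedged sketch of what that normalization experiment could look like (this is not implemented in YOLOv5; names and values are illustrative):

```python
import torch
import torch.nn as nn

# Hedged sketch of the experiment described above (NOT implemented in yolov5):
# learnable per-layer obj-loss weights, renormalized so they sum to 1 and cannot
# all be dragged toward zero by the optimizer.
raw = nn.Parameter(torch.tensor([4.0, 1.0, 0.4]))        # initialized from the manual values
optimizer = torch.optim.SGD([raw], lr=0.01)

layer_obj_losses = [torch.rand(()) for _ in range(3)]    # placeholder per-layer obj losses
w = raw.clamp(min=1e-6)
w = w / w.sum()                                          # constrain weights to sum to 1
lobj = sum(wi * li for wi, li in zip(w, layer_obj_losses))
lobj.backward()
optimizer.step()
```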

WZMIAOMIAO commented 2 years ago

@glenn-jocher Could we add this brief summary to the document?

glenn-jocher commented 2 years ago

@WZMIAOMIAO yes maybe it's a good idea to document this somewhere. Which document do you mean though?

WZMIAOMIAO commented 2 years ago

@glenn-jocher I think it could be added to the Tutorials. What do you think?

glenn-jocher commented 2 years ago

@WZMIAOMIAO all done in #7146! Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐

glenn-jocher commented 2 years ago

@HERIUN build_targets() implements an anchor-label assignment strategy so we can calculate the losses between assigned anchor-label pairs.

xinxin342 commented 2 years ago

@glenn-jocher What's the adjustment strategy for the balancing parameters? How can I change them to learnable weights?

@WZMIAOMIAO awesome summary, nice work!

@zlj-ky yes the balancing parameters are there, we tuned these manually on COCO. The idea is to balance losses from each layer (just like we balance losses across loss components (box, obj, class)). The reason I didn't turn these into learnable weights is that as absolute values the gradient would always want to drag them to zero to minimize the loss. I suppose we could constantly normalize them so they all sum to 1 to avoid this effect. Might be an interesting experiment, and this might help the balancing adapt better to different datasets and image sizes etc.

@glenn-jocher What's the adjustment strategy for the balancing parameters? How can I change them to learnable weights?

glenn-jocher commented 2 years ago

@xinxin342 the balance params are here, you'd have to convert them to nn.Parameter types assigned to an existing class and set their compute grad to True:

https://github.com/ultralytics/yolov5/blob/c9a3b14a749edf77e2faf7ad41f5cd779bd106fd/utils/loss.py#L112
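
A minimal sketch of what that conversion could look like (illustrative only; in the current repo self.balance is a plain Python list and Detect has no such attribute):

```python
import torch
import torch.nn as nn

# Illustrative only: registering the balance weights as a learnable nn.Parameter.
class Detect(nn.Module):          # stand-in for models.yolo.Detect
    def __init__(self):
        super().__init__()
        # registered as a Parameter so it shows up in model.parameters() and the optimizer
        self.w = nn.Parameter(torch.tensor([4.0, 1.0, 0.4]), requires_grad=True)

m = Detect()
print([p.shape for p in m.parameters()])   # the balance weights are now trainable
```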

zlj-ky commented 2 years ago

@xinxin342 the balance params are here, you'd have to convert them to nn.Parameter types assigned to an existing class and set their compute grad to True:

https://github.com/ultralytics/yolov5/blob/c9a3b14a749edf77e2faf7ad41f5cd779bd106fd/utils/loss.py#L112

@glenn-jocher I tried to convert the weight to a learnable parameter like this (limited by my limited experience) [image]. However, this parameter was not updated during training, and I don't know why or how to revise my method. Can you teach me, even though it's a very simple question?

glenn-jocher commented 2 years ago

@zlj-ky that seems like a good approach, but you might need to place self.w inside the model so it's affected by model.train(), model.eval(), etc. You can just place it inside models.yolo.Detect and then access it like this. (Note your code is out of date):

class ComputeLoss:
    sort_obj_iou = False

    def __init__(self, model, autobalance=False):
        device = next(model.parameters()).device  # get model device
        h = model.hyp  # hyperparameters

        # Define criteria
        BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))
        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))

        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets

        # Focal loss
        g = h['fl_gamma']  # focal loss gamma
        if g > 0:
            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)

        m = de_parallel(model).model[-1]  # Detect() module
        self.balance = {3: [4.0, 1.0, 0.4]}.get(m.nl, [4.0, 1.0, 0.25, 0.06, 0.02])  # P3-P7
        self.ssi = list(m.stride).index(16) if autobalance else 0  # stride 16 index
        self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, 1.0, h, autobalance
        self.na = m.na  # number of anchors
        self.nc = m.nc  # number of classes
        self.nl = m.nl  # number of layers
        self.anchors = m.anchors
        self.w = m.w  # <------------------------ NEW CODE 
        self.device = device

This might or might not work as I don't know if this will create a copy or access the Detect parameter.

Even if you get this to work, though, it's not clear that these are learnable parameters, as I'm not sure they can be correlated to the gradient directly, i.e. the optimizer seeks to reduce loss, so the rebalance may just weigh the lower loss components higher to reduce loss, which may not have the desired effect.

The same concept applies to anchors, which don't seem learnable either during training.

zlj-ky commented 2 years ago

@zlj-ky that seems like a good approach, but you might need to place self.w inside the model so it's affected by model.train(), model.eval(), etc. You can just place it inside models.yolo.Detect and then access it like this. (Note your code is out of date):

class ComputeLoss:
    sort_obj_iou = False

    def __init__(self, model, autobalance=False):
        device = next(model.parameters()).device  # get model device
        h = model.hyp  # hyperparameters

        # Define criteria
        BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))
        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))

        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets

        # Focal loss
        g = h['fl_gamma']  # focal loss gamma
        if g > 0:
            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)

        m = de_parallel(model).model[-1]  # Detect() module
        self.balance = {3: [4.0, 1.0, 0.4]}.get(m.nl, [4.0, 1.0, 0.25, 0.06, 0.02])  # P3-P7
        self.ssi = list(m.stride).index(16) if autobalance else 0  # stride 16 index
        self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, 1.0, h, autobalance
        self.na = m.na  # number of anchors
        self.nc = m.nc  # number of classes
        self.nl = m.nl  # number of layers
        self.anchors = m.anchors
        self.w = m.w  # <------------------------ NEW CODE 
        self.device = device

This might or might not work as I don't know if this will create a copy or access the Detect parameter.

Even if you get this to work, though, it's not clear that these are learnable parameters, as I'm not sure they can be correlated to the gradient directly, i.e. the optimizer seeks to reduce loss, so the rebalance may just weigh the lower loss components higher to reduce loss, which may not have the desired effect.

The same concept applies to anchors, which don't seem learnable either during training.

@glenn-jocher Thank you for sharing your views on this matter and for your patient guidance. I will try it later.

HERIUN commented 2 years ago

@HERIUN build_targets() implements an anchor-label assignment strategy so we can calculate the losses between assigned anchor-label pairs.

I can't match the code to the explanatory figure... Where are c_x and c_y in the code? And when calculating pwh in the code, why is anchors[i] used as p_w, p_h?

WZMIAOMIAO commented 2 years ago

@HERIUN build_targets() implements an anchor-label assignment strategy so we can calculate the losses between assigned anchor-label pairs.

I can't match the code to the explanatory figure... Where are c_x and c_y in the code? And when calculating pwh in the code, why is anchors[i] used as p_w, p_h?

This figure shows the coordinate calculation formulas of YOLOv2 and v3, not v5. For the coordinate calculation, please refer to the following code: https://github.com/ultralytics/yolov5/blob/7926afccde1a95a4c8dbeb9d2b8a901d9f220ca7/models/yolo.py#L66-L72

If there is anything unclear, I suggest you check each variable by stepping through the code in a debugger.

isJunCheng commented 2 years ago

Regarding the doubts about 'grid - 0.5', I see many such questions, e.g. #6252, #471... Compared with the previous code (y[..., 0:2] * 2 - 0.5 + grid), I found that the step of subtracting 0.5 was moved into the calculation of the grid; I don't quite understand why. Doesn't the mesh grid (i, j) exactly represent the top-left corner vertex of the cell in row i and column j? After subtracting 0.5, the center will move to the center of the upper-left grid cell (i-1, j-1). We look forward to your reply.

glenn-jocher commented 2 years ago

@isJunCheng grid computation now embeds offsets (after https://github.com/ultralytics/yolov5/pull/7262) to reduce FLOPs in detect.py and simplify export models. The change has no mathematical implications, the result is exactly the same as before.
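
A small numerical check of that equivalence (illustrative, not the repository code):

```python
import torch

# Minimal sketch: both forms decode the x/y center identically; the only
# difference is whether the -0.5 offset is applied at decode time or baked into the grid.
ny, nx = 4, 4
yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx), indexing='ij')
grid = torch.stack((xv, yv), 2).float()          # cell indices (top-left corners)
sig_xy = torch.rand(ny, nx, 2)                   # sigmoid(tx, ty) predictions (placeholder)

old = sig_xy * 2 - 0.5 + grid                    # offset applied at decode time
new = sig_xy * 2 + (grid - 0.5)                  # offset embedded in the grid
print(torch.allclose(old, new))                  # True -> mathematically identical
```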

isJunCheng commented 2 years ago

@isJunCheng grid computation now embeds offsets (after #7262) to reduce FLOPs in detect.py and simplify export models. The change has no mathematical implications, the result is exactly the same as before.

Thank you for your reply. I haven't found an article that makes this clear to me. Can you explain it? After subtracting 0.5, where is the center of the anchor: the upper-left corner of the (i, j) grid cell or the center of the (i-1, j-1) grid cell? I want to know where the anchor center is.

AnkushMalaker commented 2 years ago

@zlj-ky that seems like a good approach, but you might need to place self.w inside the model so it's affected by model.train(), model.eval(), etc. You can just place it inside models.yolo.Detect and then access it like this. (Note your code is out of date):

class ComputeLoss:
    sort_obj_iou = False

    def __init__(self, model, autobalance=False):
        device = next(model.parameters()).device  # get model device
        h = model.hyp  # hyperparameters

        # Define criteria
        BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))
        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))

        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets

        # Focal loss
        g = h['fl_gamma']  # focal loss gamma
        if g > 0:
            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)

        m = de_parallel(model).model[-1]  # Detect() module
        self.balance = {3: [4.0, 1.0, 0.4]}.get(m.nl, [4.0, 1.0, 0.25, 0.06, 0.02])  # P3-P7
        self.ssi = list(m.stride).index(16) if autobalance else 0  # stride 16 index
        self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, 1.0, h, autobalance
        self.na = m.na  # number of anchors
        self.nc = m.nc  # number of classes
        self.nl = m.nl  # number of layers
        self.anchors = m.anchors
        self.w = m.w  # <------------------------ NEW CODE 
        self.device = device

This might or might not work as I don't know if this will create a copy or access the Detect parameter.

Even if you get this to work, though, it's not clear that these are learnable parameters, as I'm not sure they can be correlated to the gradient directly, i.e. the optimizer seeks to reduce loss, so the rebalance may just weigh the lower loss components higher to reduce loss, which may not have the desired effect.

The same concept applies to anchors, which don't seem learnable either during training.

Hey @glenn-jocher, I've been dealing with the issue of balancing losses in another project of mine. I feel that adding multiple losses and passing the summed loss to the Adam (or AdamW, etc.) optimizer will not be able to optimize well. (Since the learning rate is adjusted for each parameter, Adam can't figure out which loss component has the bigger effect.) For example: `loss1 = BCEWithLogitsLoss(pred[0:2], target[0:2])`; `loss2 = MSE(pred[2:4], target[2:4])`; `loss = loss1 + loss2`; `loss.backward()`; `optimizer.step()`. More references for the same: https://discuss.pytorch.org/t/how-are-optimizer-step-and-loss-backward-related/7350/14 The stackoverflow page the above post mentions: https://stackoverflow.com/questions/46774641/what-does-the-parameter-retain-graph-mean-in-the-variables-backward-method

There's something called MTAdam for the same. Are these considerations needed if I'm training on a dataset with just one tiny object per image and only one class in the dataset [without any pretraining]? (Assuming that the difference in losses would be massive, no-object loss would dominate in this case since we only have one object per image and the rest of the cells should predict no-object).

glenn-jocher commented 2 years ago

@AnkushMalaker you can find the objectness loss hyps here: https://github.com/ultralytics/yolov5/blob/d059d1da03aee9a3c0059895aa4c7c14b7f25a9e/data/hyps/hyp.scratch-low.yaml#L16-L17

In terms of balancing losses, this has nothing to do with the number of labels an image has; this balancing is across output layers P3-P6.

carlossantos-iffar commented 2 years ago

@glenn-jocher Dear, I still don't quite understand what criteria are taken into account to define these weights: P3 (4.0), P4 (1.0) and P5 (0.4)? That is, how were these weights arrived at and what is the influence of these weights on the detection, for example, of small objects?

carlossantos-iffar commented 2 years ago

@glenn-jocher Another question I have is about the number of neurons and hidden layers in the network. How do I get this information?

glenn-jocher commented 2 years ago

@carlossantos-iffar the purpose is to balance the loss contributions from the different outputs.

carlossantos-iffar commented 2 years ago

@carlossantos-iffar the purpose is to balance the loss contributions from the different outputs.

Perfect! But my question is how did you arrive at these weight values? 4.0, 1.0 and 0.4?

glenn-jocher commented 2 years ago

@carlossantos-iffar from empirical observations of actual losses on default COCO trainings

carlossantos-iffar commented 2 years ago

@carlossantos-iffar from empirical observations of actual losses on default COCO trainings

Thanks!

suzijun commented 2 years ago

I would like to ask how I can change this function if my model has four output layers.

aa484 commented 2 years ago

Is the balance loss the objectness loss? Can you elaborate on the loss function? Thank you.

AnkushMalaker commented 2 years ago

@glenn-jocher Sorry to ping you again on this thread, but since there are comments discussing the summary/loss, this seemed like the appropriate place. I saw in this comment that you switched to BCE loss for class classification instead of CE loss due to some experiments in YOLOv3. I tried to look for issues explaining the change in the YOLOv3 repository but couldn't find a lead. Could you elaborate or point me to where I could understand the reasoning?

In my understanding, we currently treat class classification as a multi-label problem. In a situation where we only have two mutually exclusive classes (say, class 1: fluffy cat, class 2: slim cat), where we can never have both of them active at the same time, I should instead use CE loss, right?

ckyrkou commented 2 years ago

@AnkushMalaker you can find the objectness loss hyps here:

https://github.com/ultralytics/yolov5/blob/d059d1da03aee9a3c0059895aa4c7c14b7f25a9e/data/hyps/hyp.scratch-low.yaml#L16-L17

In terms of balancing losses, this has nothing to do with the number of labels an image has; this balancing is across output layers P3-P6.

I do not understand why the positive and negative objectness values have the same weight. When I try it in my custom implementations, the non-object values overwhelm the object values, and it only works when I weight them separately and reduce the impact of the non-objectness score, as in the original YOLO paper, which had separate objectness and non-objectness scores.

Is there something that I am missing? Are you balancing them in another way?

ckyrkou commented 2 years ago

@carlossantos-iffar the purpose is to balance the loss contributions from the different outputs.

Perfect! But my question is how did you arrive at these weight values? 4.0, 1.0 and 0.4?

The way I understand it is that the last detection layer has fewer output neurons than the higher-resolution maps. Since the higher-resolution map is divided by a larger number when we average, its influence is reduced; hence, multiplying it by a larger number balances this. I usually use the resolution factor as the balancing weight, so I use 1 for the lowest-resolution map, 4 for the medium, and 8 for the highest. The reasoning is that the medium map has 4 times as many output neurons and the high-resolution map 8 times as many as the lowest detection layer. Hope this helps.

glenn-jocher commented 2 years ago

@ckyrkou P3-P6 layer output balancing is performed here: https://github.com/ultralytics/yolov5/blob/898332433a71b8846b15daa276a8ac45c9efa98b/utils/loss.py#L112

https://github.com/ultralytics/yolov5/blob/898332433a71b8846b15daa276a8ac45c9efa98b/utils/loss.py#L163-L164

There is no positive/negative balancing. You can choose to apply this yourself using the positive weight (pw) hyps: https://github.com/ultralytics/yolov5/blob/898332433a71b8846b15daa276a8ac45c9efa98b/data/hyps/hyp.scratch-low.yaml#L14-L17
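
For illustration, a minimal sketch of up-weighting positive objectness targets with pos_weight, which is what the obj_pw hyperparameter feeds (the default obj_pw is 1.0; the value 2.0 below is only for illustration):

```python
import torch
import torch.nn as nn

# Illustrative sketch: pos_weight > 1.0 up-weights positive (object) targets
# relative to the many negative (no-object) targets.
BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([2.0]))
logits = torch.randn(8)          # placeholder objectness logits
targets = torch.zeros(8)
targets[0] = 1.0                 # one positive cell among many negatives
loss = BCEobj(logits, targets)
print(loss)
```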

ckyrkou commented 2 years ago

Yes, I know that; I just mentioned the positive/negative weighting as the difference from the original YOLO. But intuitively, if you have the same weight for objects vs non-objects, wouldn't this make the optimization tend to output values near zero, since the majority of targets are zero? In which case the confidence threshold should be reduced, right? I mention this because when I try it in my own implementation I do not get any output, because the optimization leads to really small values for objectness.

glenn-jocher commented 2 years ago

@ckyrkou I can't comment on your own implementation, but as a basic principle you might want to make sure that all loss components (box, obj, cls) per output layer P3-P6 are contributing equally if you believe they share equal responsibilities in the final prediction.

ckyrkou commented 2 years ago

@ckyrkou I can't comment on your own implementation, but as a basic principle you might want to make sure that all loss components (box, obj, cls) per output layer P3-P6 are contributing equally if you believe they share equal responsibilities in the final prediction.

Yes, of course, I do not expect you to comment on my implementation, and I appreciate the intuitive explanations. With regard to the three losses (box, obj, cls), should I try to balance out their contributions at the beginning of training? So if they start with values (box=5, obj=2, cls=10), should I scale them to be equal or wait to see what happens after a few epochs?

glenn-jocher commented 2 years ago

@ckyrkou initial results don't really matter too much other than you want a stable warmup strategy, the important part is the final values, so you should balance per the final/steady state losses. In most cases the two are not wildly different though, and you can probably iterate over a few trainings to a good solution. They don't need to match exactly, but also should not be an order of magnitude different probably.

ckyrkou commented 2 years ago

Yes, I understand. I have been struggling with these balancing issues for some time. I am working on the 2012 version of the VOC dataset because of limited resources. Seeing how difficult it is to tune these things, I am really in awe of the work you guys do!

glenn-jocher commented 2 years ago

@ckyrkou oh, you can get started with VOC very easily. This command will train YOLOv5s on VOC to about 0.87 mAP@0.5 in 50 epochs. Dataset will be automatically downloaded if not found locally. https://wandb.ai/glenn-jocher/YOLOv5_VOC_official

train.py --batch 64 --weights yolov5s.pt --data VOC.yaml --epochs 50 --cache --img 512 --nosave --hyp hyp.VOC.yaml

ckyrkou commented 2 years ago

@ckyrkou oh, you can get started with VOC very easily. This command will train YOLOv5s on VOC to about 0.87 mAP@0.5 in 50 epochs. Dataset will be automatically downloaded if not found locally. https://wandb.ai/glenn-jocher/YOLOv5_VOC_official

train.py --batch 64 --weights yolov5s.pt --data VOC.yaml --epochs 50 --cache --img 512 --nosave --hyp hyp.VOC.yaml

Oh I am fully aware of this. I just like to implement things from scratch and also train models from scratch just to understand the various techniques better. Transfer learning feels like cheating! :)

glenn-jocher commented 2 years ago

@ckyrkou got it, understood. I'd say it's more about not reinventing the wheel. Studying from-scratch trainings is much harder as the training time is much longer and requires a larger dataset to get the best results, but this is what we do for COCO, i.e. all of the official YOLOv5 models are trained from scratch for 300 epochs.

This is nice and simple to explain, and easier for users to reproduce than attempting the several pretraining steps that many papers discuss.

VinchinYang commented 2 years ago

This is awesome! Your summary helps me a lot! Which tool do you use when drawing these figures? @WZMIAOMIAO

WZMIAOMIAO commented 2 years ago

@VinchinYang I used draw.io and PowerPoint to draw them manually.

engrjav commented 2 years ago

@glenn-jocher @WZMIAOMIAO Thank you for your work. In the architecture summary, it would be best if New CSP-Darknet53 and the CSP-PAN neck were provided with some reference papers. Since there is no official publication on YOLOv5, information on the current version, i.e. 6.1, is hard to acquire. I have consulted multiple research papers, but the terminology differs. For instance, many research papers write that YOLOv5 has a (PANet + FPN) neck, but here you have officially written CSP-PAN. If possible, providing references would help students to better understand the architecture. Thanks.

glenn-jocher commented 2 years ago

@engrjav FPN and PANet are just two head architectures. Earlier versions of YOLOv5 used FPN and newer versions use PANet. CSP is a type of repeating module which has evolved into the current C3 modules.

(screenshot)

engrjav commented 2 years ago

@glenn-jocher thank you for the detailed answer. These are neck architectures. I am getting very good precision from YOLOv5 medium on my custom dataset, which consists of 70% small objects (area less than 32x32 pixels). The results are much better than Scaled-YOLOv4 on the same dataset; however, I want to find out the reason for such good detection of small objects in YOLOv5. As per my understanding, the neck plays the main role in preserving detailed features of small objects. I believe CSP-PAN is playing that part in YOLOv5 for good small object detection. Can you please comment/advise whether I am making the right link between small object detection in YOLOv5 and PANet?

glenn-jocher commented 2 years ago

@engrjav for small objects I'd recommend larger --imgsz during training and detection, and for very small objects, i.e. just a few pixels, you could also try the YOLOv5l-P2 models which go down to stride 4 (or scale it down to m size if you want using the 2 compound scaling constants at the top of the model yaml): https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5-p2.yaml

engrjav commented 2 years ago

@glenn-jocher thank you . I will implement it.

Cong-Wan commented 2 years ago

@glenn-jocher hi, today I briefly summarized yolov5 (v6.0). Please take a look and let me know if there are any problems or better suggestions. Some schematic diagrams and contents will be added later. Thank you for your great work.

@WZMIAOMIAO @glenn-jocher Hi, thanks for your nice work! I have two questions. First, how could I print every layer's outputs? (Here I'd like to change the first layer's kernel to a smaller size so that small object detection becomes possible.) Second, I also want to add an output for object tracking ([x, y, w, h, nc] -> [x, y, w, h, nc, id]), but I don't know which loss function to use for it.