Open WZMIAOMIAO opened 2 years ago
@engrjav FPN and PANet are just two head architectures. Earlier versions of YOLOv5 used FPN and newer versions use PANet. CSP is a type of repeating module which has evolved into the current C3 modules.
Hi @glenn-jocher Why did you choose PANet? Is there a comparison chart? Would you prefer the Light-BiFPN module for small models? Light-Yolov5: https://arxiv.org/pdf/2208.13422.pdf
@kadirnar BiFPN and PANet are nearly identical; in a P3-P5 output model the only difference is a single shortcut. There are versions of all 3 heads available here: https://github.com/ultralytics/yolov5/tree/master/models/hub
As always all design decisions are based on empirical results.
Hello, can we get the results of the ablation experiments, such as the SPP2SPPF and Focus2Conv mAP results on big datasets?
@divided-by-7 I'm sorry, we don't have this R&D saved in a presentable manner.
@WZMIAOMIAO Could you please summarize the YOLOv5 Instance Segmentation Model Structure? especially the keywords definition of output0 float32[1,25200,117] and output1 float32[1,32,160,160]. Thank you very much in advance!
Dear @glenn-jocher @WZMIAOMIAO The segmentation part is excellent. What has changed in the model architecture related to this, could you provide an example model architecture, thanks in advance.
Hi! What do k, s, p, and c represent in the model structure, respectively?
This is a simple question. k = kernel size, s = stride, p = padding, c = channel dims
Okay, thank you very much!
Hello @glenn-jocher or anyone who knows the answer. I am trying to understand the build targets process a little more. When you say GTx%1>0.5 and GTy%1>0.5 is the % just the modulus? If it is the modulo operator, then why is this used?
Thanks,
Karl Gardner
@WZMIAOMIAO @glenn-jocher or anyone who knows. I am trying to understand more about the model structure. Is there an article that discusses and explains the YOLOv5 structure? Thanks!
Hi @glenn-jocher, what is the formula by which a 640x640x3 input image becomes 320x320x64 with k=3, s=2, p=1?
@gracesmrngkr this transformation is governed by the following formula:
$$
\text{output\_size} = \left\lfloor \frac{\text{input\_size} - \text{kernel\_size} + 2 \times \text{padding}}{\text{stride}} \right\rfloor + 1
$$

So in this case, with an input size of 640, a kernel size of 3, a stride of 2, and a padding of 1, the output size is ⌊(640 - 3 + 2) / 2⌋ + 1 = ⌊319.5⌋ + 1 = 320. The 64 output channels are set by the number of filters in the layer, not by this formula.
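As a quick check of the formula, here is a minimal Python sketch. The function name `conv_output_size` is made up for illustration and is not part of YOLOv5; the second call assumes a 6x6 convolution with stride 2 and padding 2.

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution along one dimension.

    Implements floor((input - kernel + 2 * padding) / stride) + 1.
    """
    return (input_size - kernel_size + 2 * padding) // stride + 1


# k=3, s=2, p=1 on a 640-pixel side, as in the question above
print(conv_output_size(640, 3, 2, 1))  # 320

# Assumed example: a 6x6 kernel with s=2, p=2 also halves the resolution
print(conv_output_size(640, 6, 2, 2))  # 320
```

The formula is applied independently to height and width, so a 640x640 input gives a 320x320 output.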
Content
1. Model Structure
YOLOv5 (v6.0/6.1) consists of:

- Backbone: `New CSP-Darknet53`
- Neck: `SPPF`, `New CSP-PAN`
- Head: `YOLOv3 Head`
Model structure (`yolov5l.yaml`):

Some minor changes compared to previous versions:

- Replace the `Focus` structure with a `6x6 Conv2d` (more efficient, refer to #4825)
- Replace the `SPP` structure with `SPPF` (more than double the speed)

Test code:
```python
import time

import torch
import torch.nn as nn


class SPP(nn.Module):
    """Three parallel max-pools with kernel sizes 5, 9 and 13."""

    def __init__(self):
        super().__init__()
        self.maxpool1 = nn.MaxPool2d(5, 1, padding=2)
        self.maxpool2 = nn.MaxPool2d(9, 1, padding=4)
        self.maxpool3 = nn.MaxPool2d(13, 1, padding=6)

    def forward(self, x):
        o1 = self.maxpool1(x)
        o2 = self.maxpool2(x)
        o3 = self.maxpool3(x)
        return torch.cat([x, o1, o2, o3], dim=1)


class SPPF(nn.Module):
    """One 5x5 max-pool applied three times in sequence."""

    def __init__(self):
        super().__init__()
        self.maxpool = nn.MaxPool2d(5, 1, padding=2)

    def forward(self, x):
        o1 = self.maxpool(x)
        o2 = self.maxpool(o1)
        o3 = self.maxpool(o2)
        return torch.cat([x, o1, o2, o3], dim=1)


def main():
    input_tensor = torch.rand(8, 32, 16, 16)
    spp = SPP()
    sppf = SPPF()
    output1 = spp(input_tensor)
    output2 = sppf(input_tensor)

    # Both modules should produce identical outputs
    print(torch.equal(output1, output2))

    t_start = time.time()
    for _ in range(100):
        spp(input_tensor)
    print(f"spp time: {time.time() - t_start}")

    t_start = time.time()
    for _ in range(100):
        sppf(input_tensor)
    print(f"sppf time: {time.time() - t_start}")


if __name__ == '__main__':
    main()
```

result:

```
True
spp time: 0.5373051166534424
sppf time: 0.20780706405639648
```

2. Data Augmentation
- Mosaic
- Copy paste
- Random affine (Rotation, Scale, Translation and Shear)
- MixUp
- Albumentations
- Augment HSV (Hue, Saturation, Value)
- Random horizontal flip
3. Training Strategies
4. Others
4.1 Compute Losses
The YOLOv5 loss consists of three parts: classes loss (BCE loss), objectness loss (BCE loss), and location loss (CIoU loss).
4.2 Balance Losses
The objectness losses of the three prediction layers (`P3`, `P4`, `P5`) are weighted differently. The balance weights are `[4.0, 1.0, 0.4]` respectively.

4.3 Eliminate Grid Sensitivity
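In YOLOv2 and YOLOv3, the formula for calculating the predicted target information is:

$$
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w \cdot e^{t_w} \\
b_h &= p_h \cdot e^{t_h}
\end{aligned}
$$

In YOLOv5, the formula is:

$$
\begin{aligned}
b_x &= (2 \cdot \sigma(t_x) - 0.5) + c_x \\
b_y &= (2 \cdot \sigma(t_y) - 0.5) + c_y \\
b_w &= p_w \cdot (2 \cdot \sigma(t_w))^2 \\
b_h &= p_h \cdot (2 \cdot \sigma(t_h))^2
\end{aligned}
$$

Here $(c_x, c_y)$ is the top-left corner of the grid cell, $(p_w, p_h)$ is the anchor size, and $t_x, t_y, t_w, t_h$ are the raw network outputs.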
Compare the center point offset before and after scaling. The center point offset range is adjusted from (0, 1) to (-0.5, 1.5). Therefore, the offset can easily reach 0 or 1, values that a plain sigmoid only approaches asymptotically.
Compare the height and width scaling ratio (relative to the anchor) before and after adjustment. The original yolo/darknet box equations have a serious flaw: width and height are completely unbounded, as they are simply out=exp(in). This is dangerous, as it can lead to runaway gradients, instabilities, NaN losses and ultimately a complete loss of training. In YOLOv5 the ratio passes through a sigmoid instead, which bounds it to the range (0, 4). Refer to this issue.
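The difference between the two decodings can be sketched in plain Python. This is an illustrative sketch rather than the YOLOv5 implementation: the function names `decode_yolov3` and `decode_yolov5` are made up here, and the inputs are a single cell's raw outputs `t`, the cell corner `c`, and the anchor size `p`.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def decode_yolov3(tx, ty, tw, th, cx, cy, pw, ph):
    """YOLOv2/v3-style decoding: offset in (0, 1), unbounded exp() for size."""
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh


def decode_yolov5(tx, ty, tw, th, cx, cy, pw, ph):
    """YOLOv5-style decoding: offset in (-0.5, 1.5), size ratio in (0, 4)."""
    bx = 2.0 * sigmoid(tx) - 0.5 + cx
    by = 2.0 * sigmoid(ty) - 0.5 + cy
    bw = pw * (2.0 * sigmoid(tw)) ** 2
    bh = ph * (2.0 * sigmoid(th)) ** 2
    return bx, by, bw, bh


# A large raw width output blows up under exp() but stays bounded in YOLOv5.
print(decode_yolov3(0.0, 0.0, 10.0, 0.0, cx=3, cy=5, pw=10.0, ph=13.0))
print(decode_yolov5(0.0, 0.0, 10.0, 0.0, cx=3, cy=5, pw=10.0, ph=13.0))
```

With all raw outputs at zero both decodings agree (center at the middle of the cell, box equal to the anchor); they diverge only as the raw outputs grow.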
4.4 Build Targets
Match positive samples:
Assign the successfully matched Anchor Templates to the corresponding cells
Because the center point offset range is adjusted from (0, 1) to (-0.5, 1.5), a GT box can be assigned to more anchors.
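The two steps above can be sketched without any framework. This is an illustration under stated assumptions, not the YOLOv5 code (which does the same in vectorized tensor form): the function names are made up, the width/height ratio threshold `anchor_t=4.0` is an assumed default, and the neighbor rule assumes a GT center in grid units is also assigned to the adjacent cell on whichever side its fractional part falls.

```python
def matches_anchor(gt_wh, anchor_wh, anchor_t=4.0):
    """True if the GT box is within a factor `anchor_t` of the anchor in both dims."""
    rw = gt_wh[0] / anchor_wh[0]
    rh = gt_wh[1] / anchor_wh[1]
    # Worst-case ratio, counting both "too big" and "too small"
    r_max = max(rw, 1.0 / rw, rh, 1.0 / rh)
    return r_max < anchor_t


def candidate_cells(gx, gy, grid_w, grid_h):
    """Cells responsible for a GT center (gx, gy), given in grid units."""
    cx, cy = int(gx), int(gy)
    cells = [(cx, cy)]  # the cell containing the center always matches
    fx, fy = gx % 1, gy % 1  # fractional position inside the cell
    if fx < 0.5 and gx > 1:
        cells.append((cx - 1, cy))  # center in left half: add left neighbor
    elif fx > 0.5 and gx < grid_w - 1:
        cells.append((cx + 1, cy))  # center in right half: add right neighbor
    if fy < 0.5 and gy > 1:
        cells.append((cx, cy - 1))  # center in upper half: add upper neighbor
    elif fy > 0.5 and gy < grid_h - 1:
        cells.append((cx, cy + 1))  # center in lower half: add lower neighbor
    return cells


print(matches_anchor((30, 40), (10, 13)))  # True: both ratios are about 3
print(candidate_cells(3.2, 5.7, 80, 80))   # [(3, 5), (2, 5), (3, 6)]
```

For every cell returned, the offset from the cell corner to the GT center lies inside (-0.5, 1.5), which is exactly the range the YOLOv5 center decoding can express.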