seekFire closed this issue 4 years ago
Hello @seekFire, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:
For more information please visit https://www.ultralytics.com.
@seekFire yes looks correct!
@seekFire That looks pretty and clean. What kind of drawing tool did you use?
@ChristopherSTAN Just PowerPoint
@glenn-jocher Thank you for your confirmation!
Hello, I also made one. If there are any errors, please help me point them out : )
@bretagne-peiqi yes this looks correct, except that with the v2.0 release the 3 output Conv2d() boxes (red in your diagram) are now inside the Detect() stage:
```
(24): Detect(
  (m): ModuleList(
    (0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
    (1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
    (2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
  )
)
```
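The 255 output channels of each Detect() conv above can be sanity-checked with a quick calculation (a minimal sketch; the helper name is mine, COCO defaults of 80 classes and 3 anchors per scale assumed):

```python
# Each Detect() branch is a 1x1 Conv2d whose output channel count packs
# na anchors x (nc classes + 5 box/objectness values) per grid cell.
# Hypothetical helper name; nc=80, na=3 are the COCO defaults.
def detect_out_channels(nc: int = 80, na: int = 3) -> int:
    """Output channels of one Detect() 1x1 conv: na * (nc + 5)."""
    return na * (nc + 5)

print(detect_out_channels())  # 3 * (80 + 5) = 255
```

This is why the channel count changes when you train on a different number of classes, e.g. a single-class model uses 3 * (1 + 5) = 18 output channels per branch.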
@bretagne-peiqi ah, also you have an FPN head here, whereas the more recent YOLOv5 models have PANet heads. See https://github.com/ultralytics/yolov5/blob/master/models/yolov5s.yaml
@glenn-jocher many thanks.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Good!
@seekFire @bretagne-peiqi @glenn-jocher do you guys have an overview diagram for this YOLOv5 v4.0?
Hi @pravastacaraka, here is an overview of YOLOv5 v4.0. It actually looks very similar to the previous version; here is the v3.1 version.
I copied this diagram from here, it is written in Chinese.
Copyright statement: This article is the original article of the blogger and follows the CC 4.0 BY-SA copyright agreement. Please attach the original source link and this statement for reprinting. Link to this article: https://blog.csdn.net/Q1u1NG/article/details/107511465
@zhiqwang thank you so much for your kind help
Well, I updated the architecture.
My apologies if this question is too beginner-level, but I would like to ask: what operation exactly is used to "combine" the three predictions that we get from the detection layers?
@data4pass all detection heads concatenate together (along dimension 1) into a single output in the YOLOv5 Detect() layer: https://github.com/ultralytics/yolov5/blob/a820b43aca3816c9552e9beaf14a77955742b0ec/models/yolo.py#L73
Understood, but don't the three resulting tensors have different shapes? Don't we have to reshape the tensors somehow so that they can be concatenated?
@data4pass see Detect() layer for reshape ops: https://github.com/ultralytics/yolov5/blob/ba99092304a2ee715b6fb954b437b2d081203794/models/yolo.py#L36
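The reshape + concat that Detect() performs can be sketched in numpy (a simplified stand-in for the actual PyTorch `view`/`permute`/`torch.cat` ops; shapes assume a 640x640 input, 80 classes, 3 anchors per scale):

```python
import numpy as np

# Sketch of the reshape + concat inside Detect(): each head's
# (bs, na*no, h, w) map is unpacked per-anchor, flattened over the grid,
# then all three scales are concatenated along dimension 1.
bs, na, no = 1, 3, 85  # batch, anchors per scale, outputs per anchor
heads = [np.zeros((bs, na * no, s, s)) for s in (80, 40, 20)]  # P3, P4, P5 grids

flat = []
for x in heads:
    b, c, h, w = x.shape
    # (bs, na*no, h, w) -> (bs, na, no, h, w) -> (bs, na*h*w, no)
    x = x.reshape(b, na, no, h, w).transpose(0, 1, 3, 4, 2).reshape(b, -1, no)
    flat.append(x)

out = np.concatenate(flat, axis=1)  # single prediction tensor
print(out.shape)  # (1, 25200, 85): 3 * (80*80 + 40*40 + 20*20) = 25200 rows
```

After the reshape, all three tensors share the trailing dimension (85), so only dimension 1 differs and the concat is valid. Note the 3 here is the number of anchors per scale, not the image's RGB channels.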
Well, I updated the architecture.
Hello @ehdrndd , what software did you use to make this picture?
The latest structure looks clean and simple
@yyccR very nice!
@yyccR, it's awesome! The structure of the latest YOLOv5 v6.0 is symmetrical (especially the PAN module), and this visualization demonstrates that nicely; it also works for P6.
@glenn-jocher Shouldn't the Concat layer for P4 be between Conv and Conv, not C3 as in the figure?
> @glenn-jocher Concat layer for P4 should be between Conv and Conv not c3 as in the figure?

I also have the same question, if you can answer please. @glenn-jocher
> The latest structure looks clean and simple

Why is the third dimension of the outputs 85? @yyccR
@joangog 85 = 80 (number of categories in my dataset) + 4 (x, y, w, h) + 1 (objectness)
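That breakdown maps directly onto how each 85-dim prediction row is sliced in practice (a toy sketch; the variable names are mine, COCO's 80 classes assumed):

```python
import numpy as np

# Slicing one 85-dim prediction row per the breakdown above:
# 4 box values, then 1 objectness score, then 80 class scores.
pred = np.arange(85.0)  # stand-in for one prediction row
box = pred[:4]          # x, y, w, h
obj = pred[4]           # objectness confidence
cls = pred[5:]          # 80 class scores

print(len(box) + 1 + len(cls))  # 85
```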
In order to understand the structure of YOLOv5 and implement it in other frameworks, I tried to create an overview, as shown below. If there are any errors, please point them out.
Hi, thanks for the great structure drawing! Can I use it in my undergraduate thesis as a theoretical introduction to YOLOv5, please? (I will imitate it and redraw one myself using PPT.)
@Liqq1 I'm not the original author of the diagram but I would say yes!
@Liqq1 Absolutely yes! help yourself~^_^
> @Liqq1 I'm not the original author of the diagram but I would say yes!

haha~thanks!😁
> @Liqq1 Absolutely yes! help yourself~^_^

👏👏😻 thanks~!
This is my understanding of the YOLOv5s v6.1 model structure:
Hi @wwdok, it's great! And I think it would be better if the input image channels were shown in the RGB order that YOLOv5 currently uses.
@zhiqwang Do you mean my illustration shows the image channels as BGR? 😄 I know it is RGB; it is really a little ambiguous and depends on the reading order.
Hi @wwdok , Got it!
@seekFire Thanks a lot for making a clear diagram of YOLOv5.
Regarding the original structure posted by @seekFire, could somebody clarify why there are 5 BottleneckCSP modules in the PANet part? In the documentation there are only 4. Based on the documentation, it seems to me that the BottleneckCSP module in the bottom-left of the PANet should be removed and the arrow from SPP should be connected to the Conv (1x1) module. Most likely I am wrong, but could somebody clarify why?
@jeannot-github 👋 Hello! Thanks for asking about YOLOv5 🚀 architecture visualization. We've made visualizing YOLOv5 🚀 architectures easy, there are 3 main ways below. To answer your question though P5 heads (i.e. YOLOv5s) contain 4 C3 layers and P6 heads (i.e. YOLOv5s6) contain 6 C3 layers. You can compare the model yamls here:
https://github.com/ultralytics/yolov5/blob/master/models/yolov5s.yaml
https://github.com/ultralytics/yolov5/blob/master/models/hub/yolov5s6.yaml
model.yaml

Each model has a corresponding yaml file that displays the model architecture. Here is YOLOv5s, defined by yolov5s.yaml:

https://github.com/ultralytics/yolov5/blob/1a3ecb8b386115fd22129eaf0760157b161efac7/models/yolov5s.yaml#L12-L48
Simply start training a model, then view the TensorBoard Graph for an interactive view of the model architecture. This example shows YOLOv5s viewed in our Notebook:

```
# Tensorboard
%load_ext tensorboard
%tensorboard --logdir runs/train

# Train YOLOv5s on COCO128 for 3 epochs
python train.py --weights yolov5s.pt --epochs 3
```
Use https://netron.app to view exported ONNX models:

```
python export.py --weights yolov5s.pt --include onnx --simplify
```
Good luck 🍀 and let us know if you have any other questions!
Hi @glenn-jocher, my question remains the same. In yolov5s.yaml, the backbone ends with an SPPF module, and the head starts with a Conv module, which retrieves its input from the previous module. Thus, I would expect the SPPF output to serve as input for the first Conv module in the head. In these graphs, I observe something different. Also, I expect 4 C3 modules in the head based on yolov5s.yaml, but I observe 5 in these graphs. Could you explain where this difference comes from?
@jeannot-github yes SPPF feeds directly to head first Conv layer.
Graphs above are community contributions and may not be correct (or may be outdated). The 3 sources I mentioned above are the best sources of truth.
@glenn-jocher, in your graph I observe exactly the same behavior: SPPF is followed by a C3 module.
@jeannot-github sorry, TensorBoard screenshot is outdated (YOLOv5 has many releases, currently on v6.1). v6.1 TensorBoard is here. You can use commands above to reproduce and introspect yourself.
Great, thanks a lot for your help and clarification!
Hi seekFire!
I am planning to write a paper on YOLOv5x6 performance on embedded devices, and I am wondering if I could make use of the explanatory diagram as well. Thanks in advance!
@glenn-jocher @zhiqwang @yyccR
Can you clarify how Concat works and what shape it returns? In the PANet neck, when the C3(P4) output is concatenated with the upsampled SPPF branch, is it stacking the channels or doing element-wise addition?
Also, how does C3 work after the 3rd Concat when the input and output channel counts are different?
> The latest structure looks clean and simple
>
> Why in the outputs the third dimension is 85? @yyccR
Hi @joangog @glenn-jocher, thank you for your image. However, I was hoping to understand some of the details in it, especially the first layer and the output layer. Can you please assist me here? Thanks in advance.
I tried researching this on Google and in the relevant communities; I am sure I missed the explanation somewhere. Please help me understand, as I am unable to grasp these concepts.
I have an image of size 416 x 416 x 3 channels and I am trying to follow your details to draw my own diagram for my report. I am almost there; however, I am missing an understanding of the image dimensions after each layer's processing.
For example, in the first layer you have an input of 3 channels that outputs 64 feature maps? Can you elaborate a little further here, please? I am not sure what K = 6 and S = 2 mean. Also, what is C3 — is it a different type of convolution?
Additionally, what do the 1/2, 1/4, 1/8, 1/16, 1/32 mean? Is that the image size shown in your diagram, and if yes, what would it be for my input size?
In the output I saw the explanation that the 85 reflects the number of class categories within the dataset; does the 3 reflect the RGB channels?
Can you guide me here with 1, 2, 3 & 4, please? Your assistance ensures that I have the opportunity to move forward.
Thank you loads in advance!!
① K: kernel_size; S: stride.
② "1/2" means that after passing through this layer, the output feature map is 1/2 the size of the original image; "1/4, 1/8, 1/16" have the same interpretation. s=2 makes the feature map 1/2 the size of the previous layer's output. You can search for "upsampling" and "downsampling" to understand this.
③ 85: class scores "80" + confidence "1" + box regression "4" for the COCO dataset. @Symbadian
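The ② point above is just repeated halving, which can be verified with toy arithmetic (the stage names P1–P5 follow the usual YOLOv5 labeling; a minimal sketch, not actual model code):

```python
# How "1/2, 1/4, 1/8, 1/16, 1/32" arises: each stride-2 Conv halves the
# feature map side, so after n such layers a 640x640 input is 640 / 2**n.
size = 640
for n, name in enumerate(["P1", "P2", "P3", "P4", "P5"], start=1):
    size //= 2  # one stride-2 convolution per stage
    print(name, size, f"1/{2 ** n}")
```

For a 640 input this prints P1 320, P2 160, down to P5 at 20, which matches the 80/40/20 grids of the three detection scales (P3, P4, P5); for a 416 input the same halving gives 208, 104, 52, 26, 13.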
thank you for your swift reply, I will make the necessary adjustments. thanx once more
Hey @QiangYanHuang, forgive my meticulousness. Can you guide me on what (n = block) refers to, please? Is this the size of the output block?
thanx in advance for your input! really appreciate this!