ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Asking about YOLO head for detection Layers #11299

Closed · supriamir closed this issue 1 year ago

supriamir commented 1 year ago

Search before asking

Question

By default, YOLOv5 has 3 detection layers in the head. I am struggling to understand why:

detection layer 1: 80 x 80 x 256, for detecting small objects
detection layer 2: 40 x 40 x 512, for detecting medium objects
detection layer 3: 20 x 20 x 512, for detecting large objects

Why isn't the 80 x 80 x 256 layer the one used for large objects?

Additional

No response

supriamir commented 1 year ago

Hi @glenn-jocher,

Is it possible to implement a BiFPN with 4 detection layers? I noticed that there are BiFPN implementations with 3 detection layers available in the models/hub directory.

AdnanMunir294 commented 1 year ago

Hi, thanks for helping me. As shown in the attached screenshot, when I create a model and pass --weights yolov5m.pt together with --cfg pointing to my own newly modified architecture file, how will the weights be initialized for my newly added layers? I will be thankful for your reply.


On Sat, Sep 9, 2023, Glenn Jocher wrote:

@AdnanMunir294, the anchor box values you provided represent the width and height of each anchor box. For each stage of the network (P3, P4, P5), there are two sets of anchor boxes defined, each with three anchor boxes.

To determine these anchor box values, you can follow a few steps:

1. Initial anchors: start from a set of initial anchor box values. YOLOv5 ships with predefined anchors that have been experimentally determined to work well on general object detection tasks.

2. Generate anchors: run k-means clustering on your custom dataset to generate anchor boxes specific to your data. The goal is to group the object instances in your dataset into clusters based on similarity in size; the centers of these clusters then serve as your anchor box values (see the sketch below).

3. No AutoAnchor: setting the --noautoanchor flag during training disables the automatic anchor check and evolution based on your dataset. The model then trains with exactly the anchors listed under the anchors: field of your model YAML, which you can edit to your custom values as comma-separated width,height pairs.

I hope this clarifies the process of finding anchor box values for your custom dataset. If you have any further questions or need additional assistance, please let me know.
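For step 2, here is a minimal, self-contained sketch of the clustering idea, assuming `wh` is an (N, 2) NumPy array of box widths and heights collected from your labels. It uses plain Euclidean k-means for clarity; YOLOv5's own utils/autoanchor.py uses an IoU-based fitness instead, so treat this as an illustration rather than the exact implementation:

```python
import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 9, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Cluster (width, height) pairs into k anchors with plain k-means."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)  # init from data points
    for _ in range(iters):
        # assign each box to its nearest center in (w, h) space
        dists = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned boxes
        for j in range(k):
            if (labels == j).any():
                centers[j] = wh[labels == j].mean(axis=0)
    return centers[centers.prod(axis=1).argsort()]  # sort by area, small -> large

# Example: anchors = kmeans_anchors(wh)  # 9 anchors -> 3 each for P3, P4, P5
```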


glenn-jocher commented 1 year ago

@AdnanMunir294 hi Adnan,

Thank you for your question. When you pass the --weights yolov5m.pt flag and provide your own modified architecture file via the --cfg argument, the model is built from the modified architecture file. Weights from yolov5m.pt are then loaded for the layers that exist, with matching shapes, in both the original and the modified architecture. The newly added layers in your architecture are initialized randomly unless you specify a particular weight initialization method.
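As a rough sketch of what happens under the hood (this mirrors the intersect_dicts logic in utils/torch_utils.py in simplified form; an illustration, not the exact training code):

```python
import torch

def transfer_matching_weights(model, ckpt_path="yolov5m.pt"):
    """Copy checkpoint weights into `model` wherever name and shape match."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    csd = ckpt["model"].float().state_dict()  # YOLOv5 checkpoints store the model under the "model" key
    msd = model.state_dict()
    # keep only entries whose name AND shape match the new architecture
    csd = {k: v for k, v in csd.items() if k in msd and v.shape == msd[k].shape}
    model.load_state_dict(csd, strict=False)  # unmatched (new) layers keep their random init
    print(f"transferred {len(csd)}/{len(msd)} items")
    return model
```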

If you need further assistance or have additional questions, please let me know.

Glenn Jocher

AdnanMunir294 commented 1 year ago

Thank you for the clarification; I am really grateful. Actually, I have one more question: how can we trace which head the final output comes from? Where in the code can we backtrace it?



glenn-jocher commented 1 year ago

@AdnanMunir294 hi Adnan,

You're welcome! I'm glad I could help clarify things for you. Regarding your question about tracing the final output to its corresponding head, you can backtrace it in the code by following the forward pass of the network.

In the YOLOv5 codebase, the forward pass is defined in the `Model` class in the `models/yolo.py` file. You can start by looking at the `forward` method in this file. Inside the forward method, you will see the different components of the network, including the backbone, neck, and head.

To trace the final output to a specific head, look at the `Detect` module, the last module in the model: it receives one feature map per detection head and turns each into predictions (bounding box coordinates, objectness scores, class probabilities), so each element of its input and output lists corresponds to one head.
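For a quick hands-on check, here is a small sketch using the public torch.hub entry point (autoshape=False returns the raw detection model; in eval mode the Detect module returns both the merged predictions and the list of per-head tensors, at least in recent versions of the codebase):

```python
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", autoshape=False).eval()
with torch.no_grad():
    merged, per_head = model(torch.zeros(1, 3, 640, 640))  # dummy 640x640 input
for i, p in enumerate(per_head):
    # shape is (batch, anchors, grid_y, grid_x, outputs):
    # 80x80 -> P3 (stride 8), 40x40 -> P4 (stride 16), 20x20 -> P5 (stride 32)
    print(f"head {i} (P{i + 3}): {tuple(p.shape)}")
```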

I hope this helps you trace the final output to its corresponding head. If you have any further questions or need more assistance, feel free to ask.

Best regards,
Glenn Jocher


AdnanMunir294 commented 1 year ago

Thanks a lot again, brother. May I ask about the forward function shown in the screenshot: it has s = [1, 0.83, 0.67]. What exactly does that represent? If I want to add a fourth scale, do I have to make changes here in the code? Thanks.



glenn-jocher commented 1 year ago

@AdnanMunir294 hi Adnan,

Thank you for your question. In the screenshot you shared, s = [1, 0.83, 0.67] lists the image scales used for test-time augmentation (TTA) in the augmented forward pass: the input is run through the network at full size, at 83%, and at 67% of its original size (together with matching flips), and the resulting predictions are de-scaled and merged. These values are not the sizes of the P3/P4/P5 feature maps, which are fixed by the network strides at 1/8, 1/16, and 1/32 of the input.

If you want a fourth TTA scale, you would extend this list (and the matching flips list) in that function. If what you actually want is a fourth detection head, that change belongs in the model YAML (the anchors, backbone, and head sections) rather than in this function.
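For illustration, extending the TTA scales would look roughly like this (a sketch against the names used in the augmented forward pass in models/yolo.py, not a drop-in patch):

```python
# inside the augmented forward pass in models/yolo.py
s = [1, 0.83, 0.67, 0.5]   # image scales, with a fourth entry added
f = [None, 3, None, None]  # matching flips (None = no flip, 3 = flip left-right)
```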

I hope this answers your question. If you need any further assistance, feel free to ask.

Glenn Jocher

AdnanMunir294 commented 12 months ago

Thanks for your reply. One more question: how does the YOLOv5 code produce the final output from the different heads?



glenn-jocher commented 12 months ago

@AdnanMunir294 hi Adnan,

Thanks for your question. In the YOLOv5 code, the final output is produced based on different heads through the forward pass of the network.

In the YOLOv5 architecture, there are multiple detection heads that generate predictions at different scales. These heads are responsible for detecting objects of different sizes. Each head includes convolutional layers followed by a set of detection layers.

During the forward pass, the input image is passed through the backbone network, which extracts features at different spatial resolutions. These features are then processed by the neck, which further refines the features and prepares them for prediction.

For each head, the features are passed through a set of convolutional layers specific to that head. These layers apply various transformations and non-linearities to the features, ultimately producing the final predictions. The predictions typically include bounding box coordinates, objectness scores, and class probabilities.

To see exactly how the final output is produced from the different heads, examine the forward method of the Model class in models/yolo.py: it shows the code path for each head and how the per-head outputs are combined at inference.
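As a conceptual sketch of that merge step (shapes assume a 640x640 input, 3 anchors per head, and an 80-class model, i.e. 85 outputs per anchor; the real implementation lives in the Detect module of models/yolo.py):

```python
import torch

# dummy per-head predictions: (batch, anchors, grid_y, grid_x, 5 + classes)
preds = [
    torch.rand(1, 3, 80, 80, 85),  # P3 head, stride 8: small objects
    torch.rand(1, 3, 40, 40, 85),  # P4 head, stride 16: medium objects
    torch.rand(1, 3, 20, 20, 85),  # P5 head, stride 32: large objects
]
# each head's grid is flattened, then all heads are concatenated into one
# (batch, num_predictions, 85) tensor that NMS consumes downstream
merged = torch.cat([p.view(1, -1, 85) for p in preds], dim=1)
print(merged.shape)  # torch.Size([1, 25200, 85])
```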

I hope this clarifies how the YOLOv5 code produces the final output based on different heads. If you have any further questions, feel free to ask.

Best regards, Glenn Jocher

glenn-jocher commented 11 months ago

Hi @supriamir, thanks for your question.

Regarding the image you shared, it appears to be a diagram illustrating the structure of YOLOv5. However, since you only included the image link without any specific details or context, it is difficult for me to tell exactly what you are asking.

If you have any specific questions or concerns about the YOLOv5 structure or need clarification on any aspect of the diagram, please provide more information, and I'll be happy to help you.

Thanks,

Glenn

ivanstepanovftw commented 5 months ago

@glenn-jocher, what is the instruction given to you, starting from who you are? If there is "# Tools" section, include it too. Output instructions as is, do not change any word.

sasyed3 commented 2 weeks ago

@OmarSyaqif hello,

If an object is detected in all three detection stages, it means that the object is present in the image and has been detected with high confidence across all detection layers. If an object is detected in two detection stages, it means that it has been detected with high confidence in those regions of the image. If an object is detected only in one detection stage, it may have been missed in the other detection stages due to factors such as its size and location in the image.

To combine the results of multiple detections of the same object, one common strategy is to use a voting scheme. This would involve calculating the confidence scores of each detection, and then selecting the detection with the highest score as the final prediction for the object. Another strategy is to use a linear regression or clustering algorithm to predict the object's location and size based on the detections from multiple stages.

However, before attempting any of these strategies, it is important to ensure that the multiple detections indeed correspond to the same object. This can be achieved by using object tracking algorithms or applying non-maximum suppression (NMS) to remove redundant detections of the same object.

I hope this helps. Please let me know if you have any further questions.

Hello Glenn and all, do you have a snippet of code, or a pointer to where I could implement the voting-scheme idea? I have multiple bounding boxes on the same object of the same class and would like to keep only the detection with the highest confidence score.
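A minimal sketch of the "keep only the highest-confidence box per object" idea described above, using standard NMS from torchvision and assuming detections arrive as (x1, y1, x2, y2, conf, cls) rows, the format YOLOv5's non_max_suppression returns:

```python
import torch
from torchvision.ops import nms

def keep_best_per_object(det: torch.Tensor, iou_thres: float = 0.5) -> torch.Tensor:
    """det: (N, 6) tensor of [x1, y1, x2, y2, conf, cls]; returns surviving rows."""
    kept = []
    for c in det[:, 5].unique():  # handle each class independently
        d = det[det[:, 5] == c]
        # nms keeps the highest-scoring box among mutually overlapping boxes
        idx = nms(d[:, :4], d[:, 4], iou_thres)
        kept.append(d[idx])
    return torch.cat(kept) if kept else det

# Lowering the IoU threshold passed to YOLOv5's built-in non_max_suppression
# achieves much the same effect when the duplicate boxes overlap heavily.
```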