ultralytics / ultralytics

NEW - YOLOv8 πŸš€ in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Yolov8 detection layers #12709

Open tjasmin111 opened 2 weeks ago

tjasmin111 commented 2 weeks ago


Question

I am confused about the detection layers of YOLOv8 and how they work. The following is the YOLOv8 architecture, which shows three Detect layers in the head. But when I visualize my yolov8n detection model, it shows a single output with size 1x6x10710.

[image: YOLOv8 architecture showing the three Detect layers in the head]

What are these 3 detect layers? And how does this 1x6x10710 output size relate to these 3 layers?

[image: visualization of the exported model showing a single 1x6x10710 output]


glenn-jocher commented 2 weeks ago

Hello,

YOLOv8 uses a multi-scale prediction head with three Detect layers operating at different strides (8, 16, and 32), which makes the model more robust to objects of different sizes.

The 1x6x10710 output you're seeing is the concatenation of these three layers' predictions flattened into a single tensor. Each detection layer predicts boxes on its own grid, and the grids are concatenated to form the final output. Here, 6 is the number of values per prediction: 4 for the bounding box coordinates (cx, cy, w, h) plus one score per class, so a 6-channel output corresponds to a 2-class model. 10710 is the total number of predictions across all three layers.

This structure allows YOLOv8 to effectively detect objects at different scales with a single forward pass.
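To make the count concrete, here is a small sketch of where a total like 10710 can come from. The 960x544 input size below is only an assumed example that happens to produce this total, not necessarily what your model uses:

# Sketch: total predictions = grid cells summed over strides 8, 16, 32
# The 960x544 input below is only an assumed example that sums to 10710
input_w, input_h = 960, 544
strides = (8, 16, 32)

total = sum((input_w // s) * (input_h // s) for s in strides)
print(total)  # 8160 + 2040 + 510 = 10710

For the default 640x640 input, the same formula gives 6400 + 1600 + 400 = 8400 predictions.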

If you need more clarity or further examples, please let us know! 😊

tjasmin111 commented 2 weeks ago

Got it, thanks. Now my question is: given this raw 1x6x10710 prediction array, how can I decode the final bounding boxes, classes, and confidences? That's exactly what I am looking for. The raw prediction array is read from a file; let's assume we have it:

array = np.fromfile(opt.yolo_data_path)

glenn-jocher commented 2 weeks ago

To decode the 1x6x10710 prediction array into bounding boxes, classes, and confidence scores, first restore the original (6, 10710) layout and transpose it so that each row is one prediction, then split out each component:

import numpy as np

# Assuming the raw (1, 6, 10710) tensor was saved as float32
array = np.fromfile(opt.yolo_data_path, dtype=np.float32).reshape(6, -1)

# Transpose so each row is one prediction: (10710, 6)
preds = array.T

# Split the data
boxes = preds[:, :4]                     # bounding box coordinates (cx, cy, w, h)
class_scores = preds[:, 4:]              # one score per class (2 classes here)
confidences = class_scores.max(axis=1)   # best class score per prediction
class_ids = class_scores.argmax(axis=1)  # predicted class index

This snippet assumes the tensor was dumped in its original (1, 6, 10710) channel-first layout: after transposing, each row is one prediction, with the first four entries being the box center coordinates and size (cx, cy, w, h) and the remaining entries being the per-class scores. Adjust the dtype in np.fromfile to match the data format saved. If you have any more questions or need further assistance, feel free to ask! 😊
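If you also need corner coordinates later (for example, for drawing or IoU calculations), here is a minimal sketch of the conversion, assuming the boxes above are in center format (cx, cy, w, h) at the network input resolution:

# Assumption: each row of 'boxes' is (cx, cy, w, h) in network-input pixels
x1y1 = boxes[:, :2] - boxes[:, 2:4] / 2            # top-left corners
x2y2 = boxes[:, :2] + boxes[:, 2:4] / 2            # bottom-right corners
boxes_xyxy = np.concatenate([x1y1, x2y2], axis=1)  # (N, 4) as x1, y1, x2, y2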

tjasmin111 commented 2 weeks ago

Got it. Then how do I apply NMS to them to get the final results?

glenn-jocher commented 2 weeks ago

Hello!

To apply Non-Maximum Suppression (NMS) to filter out overlapping bounding boxes and keep only the best ones, you can use a utility from libraries like OpenCV. Here’s a quick example using Python:

import numpy as np
import cv2

# Assume 'boxes' (cx, cy, w, h), 'confidences', and 'class_ids' are already defined
# cv2.dnn.NMSBoxes expects (x, y, w, h) with a top-left corner, so convert first
xywh = boxes.copy()
xywh[:, 0] -= boxes[:, 2] / 2
xywh[:, 1] -= boxes[:, 3] / 2

indices = cv2.dnn.NMSBoxes(xywh.tolist(), confidences.tolist(), score_threshold=0.5, nms_threshold=0.4)
indices = np.array(indices).flatten()  # older OpenCV versions return an Nx1 array

final_boxes = xywh[indices]
final_confidences = confidences[indices]
final_class_ids = class_ids[indices]

This example uses cv2.dnn.NMSBoxes, which expects boxes in top-left (x, y, w, h) format, hence the conversion from the center-based coordinates. score_threshold filters detections by confidence, and nms_threshold sets the overlap above which weaker overlapping boxes are suppressed.
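As a rough usage example, you could then draw the surviving detections. The image path and class names below are placeholders for your own data, and the coordinates are at the network input resolution, so rescale them if your display image has a different size:

# Hypothetical usage: draw the kept detections (placeholders: image path, class names)
img = cv2.imread("image.jpg")
class_names = ["class0", "class1"]  # assumed 2-class model

for (x, y, w, h), conf, cls in zip(final_boxes, final_confidences, final_class_ids):
    p1 = (int(x), int(y))
    p2 = (int(x + w), int(y + h))
    cv2.rectangle(img, p1, p2, (0, 255, 0), 2)
    label = f"{class_names[int(cls)]} {conf:.2f}"
    cv2.putText(img, label, (p1[0], p1[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

cv2.imwrite("result.jpg", img)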

I hope this helps! Let me know if you have further questions. 😊