ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

How do I decode raw output data of Yolov5? #11909

Closed. joonryu69 closed this issue 1 year ago.

joonryu69 commented 1 year ago

Search before asking

Question

I am currently using a YOLOv5s model converted to the RKNN format for a single-board computer. Unlike when I used the PyTorch model on a PC, the RKNN model only outputs raw arrays that have to be decoded into useful information. I used the cv2 module to resize the input image to 640x640 with the following code (screenshot omitted). I checked the shape of the output results, and it looks like the following (screenshot omitted).

I can tell that the "yolov5s.rknn" model itself works fine, since it produces output images with bounding boxes in the demo runs found in the rknpu2 git repo. However, the demo is compiled and does not provide a Python implementation. It would be helpful if someone with experience using YOLOv5 with RKNN could guide me on how to render the bounding boxes and get their coordinates. As I am not very knowledgeable in programming, please provide detailed instructions. Thank you.

Additional

No response

github-actions[bot] commented 1 year ago

👋 Hello @joonryu69, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.7.0 with all dependencies in requirements.txt installed, including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 1 year ago

@joonryu69 hi there! To decode the raw output data of YOLOv5, you can follow these steps:

  1. After running the inference using the rknn model, you will receive an array of output results.

  2. In the output results, each object detection consists of several values like class label, confidence score, and bounding box coordinates.

  3. To render the bounding boxes and retrieve their coordinates, you can use the cv2 module in Python. Here's some code to get you started:

    import cv2

    # Load the image
    image = cv2.imread("path_to_your_image.jpg")

    # Resize the image to 640x640 to match the model input size
    resized_image = cv2.resize(image, (640, 640))

    # Render bounding boxes. output_results is assumed to be an iterable of
    # detections, each laid out as [class_label, confidence, xmin, ymin, xmax, ymax].
    for result in output_results:
        class_label = result[0]              # class label
        confidence = result[1]               # confidence score
        box = [int(v) for v in result[2:6]]  # [xmin, ymin, xmax, ymax] as ints for cv2

        # Draw the bounding box on the image
        cv2.rectangle(resized_image, (box[0], box[1]), (box[2], box[3]), (0, 255, 0), 2)

        # Write the class label and confidence score above the box
        cv2.putText(resized_image, f"{class_label}: {confidence:.2f}", (box[0], box[1] - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

    # Display the image with bounding boxes
    cv2.imshow("Output", resized_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    This code assumes that output_results is an iterable of detections in the order shown above. Adjust the indexing to match the actual layout of your model's output.

I hope this helps! If you have any further questions, feel free to ask.
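One caveat on the snippet above: a raw model output typically contains many overlapping candidate boxes for each object, so a non-maximum suppression (NMS) pass is usually needed before drawing. Here is a minimal NumPy sketch of greedy NMS; the `boxes` and `scores` arrays are hypothetical inputs, with boxes in [xmin, ymin, xmax, ymax] order:

```python
import numpy as np

def nms(boxes, scores, iou_thres=0.45):
    """Greedy non-maximum suppression.
    boxes: (N, 4) array of [xmin, ymin, xmax, ymax]; scores: (N,).
    Returns the indices of the boxes to keep."""
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the top box with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too much
        order = rest[iou <= iou_thres]
    return keep
```

NMS is normally applied per class (or with class-offset boxes) after filtering by confidence.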

joonryu69 commented 1 year ago

@glenn-jocher First of all, thank you for your quick response. Although I do kind of understand how your code works, I think my output data is shaped a bit differently. My output structure seems to be in the shape (3x255x(80x80,40x40,20x20)), which gives this shape when I run my code (screenshot omitted). I am not entirely sure what each axis represents, so it is difficult to rearrange the code to make it work. Is this the normal shape of YOLOv5 raw output, or is my model a bit different from the ones using a common format? Thanks!

joonryu69 commented 1 year ago

@glenn-jocher Adding some more screenshots to clarify my situation (screenshots omitted). The code gives the following error (screenshot omitted).

glenn-jocher commented 1 year ago

Hi @joonryu69, thank you for providing additional screenshots to describe your situation.

Based on the shape of your output data (3 tensors of 255x(80x80, 40x40, 20x20)), your RKNN export is returning the raw detection-head outputs, i.e. the tensors before the decoding step that the Detect layer normally performs. In the typical YOLOv5 pipeline that post-processing collapses the three scales into an (N x 85) array, where N is the number of candidate detections and each row holds 85 values: 4 box values, 1 objectness score, and 80 class scores for COCO. In your raw tensors, 255 is simply 3 anchors x those 85 values, and the 80x80, 40x40 and 20x20 grids correspond to strides 8, 16 and 32 at a 640x640 input.
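For reference, in the standard YOLOv5 head each 255-channel tensor packs 3 anchors x (4 box values + 1 objectness + 80 class scores). A quick NumPy sketch of splitting those channels; the `raw` tensor here is a zero-filled stand-in for one of your model's outputs at the 80x80 scale:

```python
import numpy as np

# Stand-in for one raw head output tensor of shape (1, 255, 80, 80)
raw = np.zeros((1, 255, 80, 80), dtype=np.float32)

# 255 channels = 3 anchors * (4 box values + 1 objectness + 80 class scores)
per_anchor = raw.reshape(3, 85, 80, 80)

box = per_anchor[:, 0:4, :, :]          # (3, 4, 80, 80): raw x, y, w, h per cell
objectness = per_anchor[:, 4, :, :]     # (3, 80, 80): one objectness score per cell
class_scores = per_anchor[:, 5:, :, :]  # (3, 80, 80, 80): 80 class scores per cell

print(per_anchor.shape)  # (3, 85, 80, 80)
```

The same split applies to the 40x40 and 20x20 tensors, with the grid size changed accordingly.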

To understand the specific meaning of each axis in your output data, it would be helpful to refer to the documentation or implementation details of the YOLOv5 model you are using. The documentation or repository where you obtained the model should provide insights into the output format and how to interpret the values.

Regarding the error you are encountering, the index-out-of-range suggests that your code is indexing the output as if it were already a flat (N x 85) array of detections, while the raw tensors have different dimensions. To resolve the error, make sure the indexing matches the actual shape of each output tensor.
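To make this concrete, here is a hedged sketch of decoding one raw head tensor, assuming a standard 640x640 COCO export whose outputs have shape (1, 255, H, W) for H = W = 80, 40, 20. The anchor values below are the YOLOv5 defaults for the stride-8 level; check your model's configuration, since custom models can use different anchors:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_scale(raw, anchors, stride, conf_thres=0.25):
    """Decode one raw YOLOv5 head output of shape (1, 3*85, H, W) into
    (N, 6) rows of [xmin, ymin, xmax, ymax, confidence, class_id]."""
    _, c, h, w = raw.shape
    na, no = 3, c // 3                      # 3 anchors, 85 outputs per anchor
    p = sigmoid(raw.reshape(na, no, h, w))  # YOLOv5 applies sigmoid to every output

    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = []
    for a, (aw, ah) in enumerate(anchors):
        # Standard YOLOv5 box decoding
        x = (p[a, 0] * 2.0 - 0.5 + gx) * stride  # center x in input pixels
        y = (p[a, 1] * 2.0 - 0.5 + gy) * stride  # center y in input pixels
        bw = (p[a, 2] * 2.0) ** 2 * aw           # width scaled from the anchor
        bh = (p[a, 3] * 2.0) ** 2 * ah           # height scaled from the anchor
        conf = p[a, 4] * p[a, 5:].max(axis=0)    # objectness * best class score
        cls_id = p[a, 5:].argmax(axis=0)

        keep = conf > conf_thres
        out.append(np.stack([(x - bw / 2)[keep], (y - bh / 2)[keep],
                             (x + bw / 2)[keep], (y + bh / 2)[keep],
                             conf[keep], cls_id[keep].astype(np.float32)], axis=1))
    return np.concatenate(out, axis=0)

# Default YOLOv5 anchors (in input pixels) for the stride-8 (80x80) level
P3_ANCHORS = [(10, 13), (16, 30), (33, 23)]
dets = decode_scale(np.random.randn(1, 255, 80, 80).astype(np.float32), P3_ANCHORS, 8)
print(dets.shape)  # (N, 6)
```

Run this for all three scales with their respective anchors and strides, concatenate the results, and then apply NMS before drawing boxes.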

If you have any further questions or need additional assistance, please feel free to ask.

MikeLud commented 1 year ago

I converted all of the YOLOv5 models to RKNN format for an object detection project I am working on. You can use the CodeProject.AI Object Detection (YOLOv5 RKNN) module.

Link to CodeProject.AI https://www.codeproject.com/Articles/5322557/CodeProject-AI-Server-AI-the-easy-way

Models https://github.com/MikeLud/CodeProject.AI-Custom-IPcam-Models/tree/main/RKNN_Models/yolov5

Below, the YOLOv5l model was used (screenshots omitted).

glenn-jocher commented 1 year ago

@MikeLud thank you for sharing your YOLOv5 models converted to the RKNN format. It's great to see the CodeProject.AI Object Detection (YOLOv5 RKNN) module being used for object detection projects.

However, please note that here we provide support for YOLOv5, its usage, and related issues. We don't directly endorse or support third-party projects or modules.

If you have any specific questions or issues related to the usage or implementation of YOLOv5, please feel free to ask. We'll be glad to help you with any YOLOv5-related inquiries.

Thank you for your understanding.

github-actions[bot] commented 1 year ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐