ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

ONNX format is running too slowly on both GPU and CPU #12901

Closed: MennaTalhHossamAlden closed this issue 6 months ago

MennaTalhHossamAlden commented 6 months ago

Search before asking

YOLOv5 Component

No response

Bug

I installed torch 2.2.1+cu121, onnx 1.16, and onnxruntime-gpu. I then exported the model using this command (attached as an image) and loaded it into C++ code using OpenCV, but inference is too slow. When I printed the time taken for inference, it was the same on CPU and GPU.

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

github-actions[bot] commented 6 months ago

πŸ‘‹ Hello @MennaTalhHossamAlden, thank you for your interest in YOLOv5 πŸš€! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a πŸ› Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 πŸš€

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 πŸš€!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 6 months ago

Hello! Thanks for reaching out and detailing the issue you're experiencing with YOLOv5 ONNX format performance. πŸš€

From your description, it sounds like the model is not leveraging GPU acceleration as expected. A common cause could be an issue with the ONNX runtime setup or a mismatch in library versions. Here are a couple of steps to troubleshoot and potentially resolve the issue:

  1. Verify ONNX Runtime: Ensure you're using the correct version of onnxruntime-gpu. There might be compatibility issues with certain versions. Consider updating or downgrading the onnxruntime-gpu package.

  2. Dependencies Check: Double-check that all dependencies (CUDA, cuDNN) are correctly installed and accessible to onnxruntime. Sometimes, even with correct installation, environmental variables might need to be set properly.

  3. Inference Code: Make sure your inference code is correctly configured to utilize the GPU. With ONNX Runtime, you need to explicitly set the session to run on GPU:

    Ort::SessionOptions session_options;
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
    OrtCUDAProviderOptions cuda_options{};
    cuda_options.device_id = 0;  // GPU device id
    session_options.AppendExecutionProvider_CUDA(cuda_options);
  4. Performance Profiling: Use tools like NVIDIA Nsight Systems or the PyTorch Profiler to understand where the bottleneck occurs. This might give you insights into whether the issue is related to data loading, model processing, or perhaps post-processing.

  5. Alternative Export: Consider re-exporting the model with the latest version of PyTorch and ONNX, as there have been significant performance improvements in recent versions.

If these steps don’t pinpoint the issue, please provide more details about your setup and a minimal reproducible example, if possible. This information will be crucial for further investigation. πŸ› οΈ

Remember, the YOLO community and the Ultralytics team are here to support you!

MennaTalhHossamAlden commented 6 months ago

I used this example code to run the model: https://github.com/doleron/yolov5-opencv-cpp-python/blob/main/cpp/yolo.cpp. The ONNX Runtime documentation mentions that version 1.17 is compatible with CUDA 12.2, but since 1.17 is not available on pip and I'm exporting the model on Colab, I installed version 1.16 instead. Additionally, this was written in the requirements.txt file (attached as an image). For the inference code, I'm using OpenCV as shown in the repo linked above. I've been stuck on this issue for 2 months now and have also tried exporting the model as TorchScript, but that didn't work either.
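For context, the GPU path in that OpenCV DNN code boils down to two calls on the `cv::dnn::Net`. This is a rough sketch (the model path is a placeholder): note that these calls only take effect if OpenCV itself was built with CUDA support; otherwise OpenCV logs a warning and silently falls back to the CPU backend, which would produce identical CPU/GPU timings.

```cpp
#include <opencv2/dnn.hpp>
#include <string>

// Load an exported model and request the CUDA backend. If OpenCV was not
// built with WITH_CUDA / OPENCV_DNN_CUDA, these requests are ignored and
// inference silently runs on the CPU.
cv::dnn::Net load_gpu_net(const std::string& model_path) {
    cv::dnn::Net net = cv::dnn::readNet(model_path);
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);  // or DNN_TARGET_CUDA_FP16
    return net;
}
```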

MennaTalhHossamAlden commented 6 months ago

I would be very grateful if you could suggest which versions of each package to use if I have CUDA 12.1 on my device πŸ€—

glenn-jocher commented 6 months ago

Hello! 🌟 For CUDA 12.1, it's important to ensure compatibility across your libraries to get the best performance and support. While specific versions may vary based on ongoing updates and compatibility checks, here's a general recommendation to align with CUDA 12.1:

Ensure all libraries are aligned in terms of compatibility. Sometimes, new versions are released to address specific issues or improve performance with certain CUDA versions. Keeping an eye on the release notes for these packages can provide valuable insights.
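A quick sanity check that the Python environment actually sees the GPU (assuming torch and onnxruntime-gpu are installed) is:

```shell
# Should print your CUDA version and True
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
# 'CUDAExecutionProvider' should appear in this list
python -c "import onnxruntime as ort; print(ort.get_available_providers())"
```

If `CUDAExecutionProvider` is missing, ONNX Runtime cannot use the GPU regardless of how the session is configured.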

If you face any issues or have more specific needs, consider checking each library's official documentation or GitHub repo for the most accurate and up-to-date information. Happy coding! πŸš€

MennaTalhHossamAlden commented 6 months ago

To anyone struggling with the same issue (as I was), no more worries, I've got you πŸ˜‰. You need to build OpenCV with CUDA support enabled, which in turn requires cuDNN to be installed and OpenCV built against it. However, TorchScript was a much easier solution in the end.
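For anyone following the OpenCV-with-CUDA route, the configure step looks roughly like this. The paths and `CUDA_ARCH_BIN` value below are example placeholders (set the compute capability for your own GPU, e.g. 8.6 for RTX 30-series):

```shell
# Sketch of an OpenCV CUDA build; run from an empty build/ directory.
# OPENCV_EXTRA_MODULES_PATH and CUDA_ARCH_BIN are example values.
cmake -D CMAKE_BUILD_TYPE=Release \
      -D WITH_CUDA=ON \
      -D WITH_CUDNN=ON \
      -D OPENCV_DNN_CUDA=ON \
      -D CUDA_ARCH_BIN=8.6 \
      -D OPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules \
      ../opencv
cmake --build . -j"$(nproc)"
```

With such a build, `setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA)` actually routes inference to the GPU instead of silently falling back to CPU.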

glenn-jocher commented 6 months ago

Hey there! 😊 Thanks for sharing your solution with the community! Building OpenCV with CUDA support does indeed unlock a lot of performance benefits for GPU acceleration. And it's great to hear that you found success with TorchScript as well - it's a fantastic way to improve compatibility and make deployment more efficient. This kind of insight is super valuable to others experiencing similar issues. Keep up the awesome work and thanks again for contributing! πŸ’ͺ