ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

ONNX format is running too slowly on both GPU and CPU #12901

Closed: MennaTalhHossamAlden closed this issue 6 months ago

MennaTalhHossamAlden commented 6 months ago

Search before asking

YOLOv5 Component

No response

Bug

I installed torch 2.2.1+cu121, onnx 1.16, and onnxruntime-gpu. I then exported the model using this command (attached as an image) and loaded it into C++ code using OpenCV, but inference is too slow. When I printed the time taken for inference, it was the same on CPU and GPU.

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

github-actions[bot] commented 6 months ago

πŸ‘‹ Hello @MennaTalhHossamAlden, thank you for your interest in YOLOv5 πŸš€! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a πŸ› Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 πŸš€

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 πŸš€!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 6 months ago

Hello! Thanks for reaching out and detailing the issue you're experiencing with YOLOv5 ONNX format performance. πŸš€

From your description, it sounds like the model is not leveraging GPU acceleration as expected. A common cause could be an issue with the ONNX runtime setup or a mismatch in library versions. Here are a couple of steps to troubleshoot and potentially resolve the issue:

  1. Verify ONNX Runtime: Ensure you're using the correct version of onnxruntime-gpu. There might be compatibility issues with certain versions. Consider updating or downgrading the onnxruntime-gpu package.

  2. Dependencies Check: Double-check that all dependencies (CUDA, cuDNN) are correctly installed and accessible to onnxruntime. Sometimes, even with correct installation, environmental variables might need to be set properly.

  3. Inference Code: Make sure your inference code is correctly configured to utilize the GPU. With ONNX Runtime, you need to explicitly set the session to run on GPU:

    Ort::SessionOptions session_options;
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
    OrtCUDAProviderOptions cuda_options{};
    cuda_options.device_id = 0;  // GPU device id
    session_options.AppendExecutionProvider_CUDA(cuda_options);
  4. Performance Profiling: Use tools like NVIDIA Nsight Systems or the PyTorch Profiler to understand where the bottleneck occurs. This might give you insights into whether the issue is related to data loading, model processing, or perhaps post-processing.

  5. Alternative Export: Consider re-exporting the model with the latest version of PyTorch and ONNX, as there have been significant performance improvements in recent versions.

If these steps don’t pinpoint the issue, please provide more details about your setup and a minimal reproducible example, if possible. This information will be crucial for further investigation. πŸ› οΈ

Remember, the YOLO community and the Ultralytics team are here to support you!

MennaTalhHossamAlden commented 6 months ago

I used this example code to run the model: https://github.com/doleron/yolov5-opencv-cpp-python/blob/main/cpp/yolo.cpp. The ONNX Runtime documentation mentions that version 1.17 is compatible with CUDA 12.2, but since 1.17 is not available on pip and I'm exporting the model on Colab, I installed version 1.16 instead. Additionally, this was written in the requirements.txt file (attached as an image). For the inference code, I'm using OpenCV as shown in the repo linked above. I've been stuck on this issue for 2 months now and have also tried exporting the model as TorchScript, but that didn't work either.
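For context, the GPU path in that OpenCV DNN code boils down to two calls on the `cv::dnn::Net`. This is a rough sketch (the model path is a placeholder): note that these calls only take effect if OpenCV itself was built with CUDA support; otherwise OpenCV logs a warning and silently falls back to the CPU backend, which would produce identical CPU/GPU timings.

```cpp
#include <opencv2/dnn.hpp>
#include <string>

// Load an exported model and request the CUDA backend. If OpenCV was not
// built with WITH_CUDA / OPENCV_DNN_CUDA, these requests are ignored and
// inference silently runs on the CPU.
cv::dnn::Net load_gpu_net(const std::string& model_path) {
    cv::dnn::Net net = cv::dnn::readNet(model_path);
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);  // or DNN_TARGET_CUDA_FP16
    return net;
}
```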

MennaTalhHossamAlden commented 6 months ago

I would be very grateful if you could suggest which versions of each package to use if I have CUDA 12.1 on my device πŸ€—

glenn-jocher commented 6 months ago

Hello! 🌟 For CUDA 12.1, it's important to ensure compatibility across your libraries to get the best performance and support. While specific versions may vary based on ongoing updates and compatibility checks, here's a general recommendation to align with CUDA 12.1:

Ensure all libraries are aligned in terms of compatibility. Sometimes, new versions are released to address specific issues or improve performance with certain CUDA versions. Keeping an eye on the release notes for these packages can provide valuable insights.
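A quick sanity check that the Python environment actually sees the GPU (assuming torch and onnxruntime-gpu are installed) is:

```shell
# Should print your CUDA version and True
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
# 'CUDAExecutionProvider' should appear in this list
python -c "import onnxruntime as ort; print(ort.get_available_providers())"
```

If `CUDAExecutionProvider` is missing, ONNX Runtime cannot use the GPU regardless of how the session is configured.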

If you face any issues or have more specific needs, consider checking each library's official documentation or GitHub repo for the most accurate and up-to-date information. Happy coding! πŸš€

MennaTalhHossamAlden commented 6 months ago

To anyone struggling with the same issue (as I was), no more worries, I've got you πŸ˜‰. You need to build OpenCV with CUDA support enabled, which in turn requires cuDNN to be installed and OpenCV built against it. However, TorchScript was a much easier solution in the end.
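For anyone following the OpenCV-with-CUDA route, the configure step looks roughly like this. The paths and `CUDA_ARCH_BIN` value below are example placeholders (set the compute capability for your own GPU, e.g. 8.6 for RTX 30-series):

```shell
# Sketch of an OpenCV CUDA build; run from an empty build/ directory.
# OPENCV_EXTRA_MODULES_PATH and CUDA_ARCH_BIN are example values.
cmake -D CMAKE_BUILD_TYPE=Release \
      -D WITH_CUDA=ON \
      -D WITH_CUDNN=ON \
      -D OPENCV_DNN_CUDA=ON \
      -D CUDA_ARCH_BIN=8.6 \
      -D OPENCV_EXTRA_MODULES_PATH=../opencv_contrib/modules \
      ../opencv
cmake --build . -j"$(nproc)"
```

With such a build, `setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA)` actually routes inference to the GPU instead of silently falling back to CPU.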

glenn-jocher commented 6 months ago

Hey there! 😊 Thanks for sharing your solution with the community! Building OpenCV with CUDA support does indeed unlock a lot of performance benefits for GPU acceleration. And it's great to hear that you found success with TorchScript as well - it's a fantastic way to improve compatibility and make deployment more efficient. This kind of insight is super valuable to others experiencing similar issues. Keep up the awesome work and thanks again for contributing! πŸ’ͺ