roboflow / inference

A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
https://inference.roboflow.com

GPU Acceleration Doesn't Work with CUDA 12 #316

Open yeldarby opened 7 months ago

yeldarby commented 7 months ago

Search before asking

Bug

If you install inference-gpu on a machine with CUDA 12, onnxruntime logs a warning and falls back to CPU execution.

2024-03-10 21:00:56.403012156 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:640 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

There's a special version of onnxruntime needed for CUDA 12: https://onnxruntime.ai/docs/install/

Ideally we'd detect this automatically and make it "just work" with CUDA 12. Failing that, we could at least let the user know why they're not getting GPU acceleration.
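As a sketch of that second option, the snippet below wraps onnxruntime session creation and warns when CUDAExecutionProvider was requested but dropped at runtime. The helper name and warning text are hypothetical and not part of the inference codebase; only the onnxruntime calls (InferenceSession, get_providers) are real API.

import logging

import onnxruntime as ort

logger = logging.getLogger(__name__)

def create_session_or_warn(model_path: str) -> ort.InferenceSession:
    # Ask for CUDA first; onnxruntime silently drops providers it cannot load.
    requested = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    session = ort.InferenceSession(model_path, providers=requested)

    # get_providers() reports the providers actually in use.
    active = session.get_providers()
    if "CUDAExecutionProvider" not in active:
        logger.warning(
            "CUDAExecutionProvider could not be loaded (active providers: %s). "
            "On CUDA 12 machines this usually means the installed onnxruntime-gpu "
            "build targets CUDA 11; see https://onnxruntime.ai/docs/install/ for "
            "the CUDA 12 package.",
            active,
        )
    return session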

Environment

root@C.10017133:/$ pip freeze | grep inference
inference-cli==0.9.15
inference-gpu==0.9.15
inference-sdk==0.9.15
root@C.10017133:/$ 

pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel Docker image, after running pip install inference-gpu


Minimal Reproducible Example

pip install inference-gpu
inference benchmark python-package-speed -m "yolov8n-640"

Additional

No response

Are you willing to submit a PR?

hvaria commented 6 months ago

To resolve the issue where inference-gpu falls back to CPU execution on CUDA 12 systems because the bundled onnxruntime build is incompatible, consider extending the platform.py script to detect the installed CUDA version, for example by invoking system commands and parsing their output, and including that value in the dictionary returned by retrieve_platform_specifics. Once CUDA version detection is implemented, benchmark_adapter.py could consult it before running benchmarks or initializing models that require GPU support: if CUDA 12 is detected, ensure the version of onnxruntime-gpu that supports CUDA 12 is installed (see the sketch below).
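A minimal sketch of the detection step, assuming it is done by shelling out to nvidia-smi (with nvcc as a fallback); detect_cuda_version and its wiring into retrieve_platform_specifics are illustrative names, not existing inference code:

import re
import subprocess
from typing import Optional

def detect_cuda_version() -> Optional[str]:
    # nvidia-smi prints a header line containing e.g. "CUDA Version: 12.1";
    # nvcc --version prints e.g. "release 12.1, V12.1.105".
    for command, pattern in (
        (["nvidia-smi"], r"CUDA Version:\s*([\d.]+)"),
        (["nvcc", "--version"], r"release\s*([\d.]+)"),
    ):
        try:
            output = subprocess.run(
                command, capture_output=True, text=True, timeout=5
            ).stdout
        except (FileNotFoundError, subprocess.TimeoutExpired):
            continue
        match = re.search(pattern, output)
        if match:
            return match.group(1)
    return None  # no NVIDIA tooling found; assume CPU-only

The returned string (e.g. "12.1") could then be included in the dictionary returned by retrieve_platform_specifics, so callers such as benchmark_adapter.py can warn when the major version is 12 but the installed onnxruntime-gpu build targets CUDA 11.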

timhodgson12 commented 1 month ago

When are we getting CUDA 12 support?