ivilson / Yolov7net

Yolo Detector for .Net 8

CPU vs GPU #23

Closed jaydubal closed 2 months ago

jaydubal commented 1 year ago

Hello, when using useCuda = true, Yolov7 does use the GPU. However, there is no performance gain at all; the result is the same as when using the CPU.

What could be the reason?

I am using Microsoft.ML.OnnxRuntime.Gpu version 1.6.0, which works with NVIDIA CUDA 10.2, and my card is a GT 710 with 2 GB of memory.
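For reference, my setup looks roughly like this (simplified, paths are placeholders, and the detector API is taken from the README example); useCuda is the only thing I change between runs:

```csharp
using System.Drawing;
using Yolov7net;

// Placeholder model and image paths; second argument toggles CUDA.
using var yolo = new Yolov7("./yolov7-tiny.onnx", true); // false for the CPU run
yolo.SetupYoloDefaultLabels();

// Adjust the image type to whatever your Yolov7net version expects.
using var image = Image.FromFile("./demo.jpg");
var predictions = yolo.Predict(image);
```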

useCuda = false: CPU 40%, GPU 0%
useCuda = true: CPU 10%, GPU 50%

Thanks!

iwaitu commented 3 months ago

Thank you for bringing up this issue. The fact that there's no significant performance gain when using Yolov7 with useCuda = true on your GT 710, even though the GPU is being utilized (around 50% usage), is worth unpacking. Here are a few potential reasons and suggestions:

GPU Compute Capability: The GT 710 is an older-generation GPU with limited compute capability and far fewer CUDA cores than more recent models. Yolov7 and other deep learning models benefit significantly from GPUs with high compute capability and many CUDA cores, so the limited computational power of the GT 710 may simply not be enough to show a meaningful speed-up.

GPU Memory Bandwidth: The GT 710 also has relatively low memory bandwidth, which can become a bottleneck for deep learning models that need to move data quickly between GPU memory and the compute units.

Model Complexity: Yolov7 is a complex model, and its performance gain on the GPU depends heavily on how well its operations can be parallelized across many CUDA cores. The limited number of CUDA cores on the GT 710 might not be enough to distribute the model's computations effectively.

Microsoft.ML.OnnxRuntime.Gpu Version: You mentioned using version 1.6.0, which is compatible with CUDA 10.2. This is a fairly old release, and that version of ONNX Runtime or its CUDA integration may not be fully optimized for Yolov7. Checking for available updates, and confirming that the CUDA execution provider is actually being registered (see the sketch below), might help.
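As a quick sanity check that the CUDA execution provider is really registered rather than silently falling back to the CPU, you can build the session options explicitly. This is only a sketch against the plain ONNX Runtime C# API, not the code Yolov7net uses internally, and the model path is a placeholder:

```csharp
using Microsoft.ML.OnnxRuntime;

// Request the CUDA execution provider on device 0. If the GPU package or the
// installed CUDA version doesn't match, this typically throws instead of
// quietly running on the CPU, which makes misconfiguration easier to spot.
var options = SessionOptions.MakeSessionOptionWithCudaProvider(0);
options.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL;

// "yolov7-tiny.onnx" is a placeholder for your model path.
using var session = new InferenceSession("yolov7-tiny.onnx", options);
```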

Overhead Costs: When running models on the GPU, there are overhead costs associated with transferring data between the CPU and GPU. For smaller models or less intensive computations, this overhead can sometimes negate the performance benefits of using the GPU.

Benchmarking Methodology: Ensure that your CPU vs GPU comparison uses an appropriate benchmark. Utilization percentages alone don't tell you much; measure the absolute inference time (or frames per second) over a fixed number of images, excluding the first warm-up call. A minimal timing sketch follows below.
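Something along these lines would give a more meaningful number. It assumes the detector is created as in the README example (Yolov7 constructor, SetupYoloDefaultLabels, Predict), with placeholder paths:

```csharp
using System;
using System.Diagnostics;
using System.Drawing;
using Yolov7net;

// Benchmark sketch: compare average inference time, not utilization percentages.
using var yolo = new Yolov7("./yolov7-tiny.onnx", true); // flip to false for the CPU run
yolo.SetupYoloDefaultLabels();
using var image = Image.FromFile("./demo.jpg"); // placeholder test image

yolo.Predict(image); // warm-up: the first call includes session/provider initialization

const int iterations = 100;
var sw = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    yolo.Predict(image);
}
sw.Stop();

Console.WriteLine($"Average inference time: {sw.Elapsed.TotalMilliseconds / iterations:F1} ms " +
                  $"({iterations * 1000.0 / sw.Elapsed.TotalMilliseconds:F1} FPS)");
```

Running this once with useCuda = true and once with false gives a direct ms-per-frame comparison, which is much easier to interpret than CPU/GPU utilization figures.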

For a more definitive assessment, you might consider testing with a more recent GPU with higher compute capability and comparing the results. This could help determine if the issue is primarily due to the hardware limitations of the GT 710.

I hope these suggestions help in diagnosing the issue; if you do run the comparison on a newer GPU, it would be interesting to see the results.

Best regards.