microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] cuDNN lib mismatch led to underutilization of GPU #14498

Open IzanCatalan opened 1 year ago

IzanCatalan commented 1 year ago

Describe the issue

When calling any MXNet function (imported as mx in the screenshot), for example mx.nd.ones or mx.gpu(0), I get a warning about a cuDNN mismatch after a cuBLAS error.

The cuBLAS error goes away when I execute mx.gpu() instead of mx.gpu(0). In both cases the GPU is detected.

Also, I run onnxruntime-gpu, but the warning appears to come from MXNet. However, when I do inference on the GPU (using onnxruntime.InferenceSession with CUDAExecutionProvider) I noticed underutilisation of the GPU: no more than 40-50% at occasional peaks lasting about 100 ms (sampled with nvidia-smi -lms 100), while most of the time utilisation is 0-6%.
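For context, the utilisation figures above come from sampling nvidia-smi every 100 ms. A small helper like the following (illustrative, not from the issue's actual scripts) summarises such a sample log; the `samples` values are assumed to be the per-sample utilisation percentages nvidia-smi prints:

```python
def summarise_gpu_util(samples):
    """Summarise GPU utilisation percentages sampled e.g. via
    `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -lms 100`."""
    vals = [int(s) for s in samples]
    return {
        "peak": max(vals),                                        # highest observed spike
        "mean": sum(vals) / len(vals),                            # average over the run
        "idle_fraction": sum(1 for v in vals if v <= 6) / len(vals),  # share of near-idle samples
    }

# A trace shaped like the one reported above: mostly idle with occasional peaks.
stats = summarise_gpu_util(["0", "3", "45", "6", "0", "50", "2", "0"])
```

With a trace like this, `idle_fraction` close to 1 and a low `mean` make the underutilisation concrete.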

According to https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements, onnxruntime-gpu 1.12 is built against CUDA 11.4 and cuDNN 8.2.4.

I wonder whether this underutilization is caused by the version mismatch. If so, is there any way to eliminate the warning without installing a different version of cuDNN or CUDA?

Thanks! Izan.

[screenshot: error]

[screenshot: gpu inference]

To reproduce

I use ONNX models from the ONNX Model Zoo (https://github.com/onnx/models), particularly VGG16, ResNet50, MobileNet, and DenseNet, each in both quantized and non-quantized variants (int8 and float32).

I use the MXNet framework (mxnet-cu112, version 1.9.1) to load the ImageNet dataset and calculate accuracies, which allows inference with CUDA 11.2.

As the screenshots show, the CUDA and CPU providers are detected both before and after creating the InferenceSession, so the setup should be fine, but GPU performance is lower than expected.
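The accuracy bookkeeping mentioned above can be sketched roughly like this; the helper name and signature are illustrative, not taken from the issue's actual evaluation script:

```python
def topk_correct(logits, label, k=5):
    """Return (top1_hit, topk_hit) for one sample's raw class scores,
    as used when accumulating ImageNet top-1/top-5 accuracy."""
    # Rank class indices by descending score.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return ranked[0] == label, label in ranked[:k]

# Usage: sum the hits over the validation set, then divide by the sample count.
top1, top5 = topk_correct([0.1, 0.7, 0.2], label=1)
```

In the real pipeline the `logits` would be the output of the onnxruntime session run on each MXNet-loaded ImageNet batch.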

Urgency

No response

Platform

Linux

OS Version

Ubuntu 18.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

onnxruntime-gpu 1.12.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.2 with cuDNN 8.2.1

Model File

No response

Is this a quantized model?

Yes

RyanUnderhill commented 1 year ago

Doesn't setting MXNET_CUDNN_LIB_CHECKING=0 (the suggestion in your screenshot) solve it completely? Note that these errors come not from onnxruntime but from the CUDA libraries.
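For example, the variable can be exported in the shell before launching the script, or set from Python; a minimal sketch, assuming the variable needs to be in the environment before MXNet initialises cuDNN:

```python
import os

# Set the variable before importing mxnet, so it is already in the
# environment when MXNet's cuDNN version check runs.
os.environ["MXNET_CUDNN_LIB_CHECKING"] = "0"

# import mxnet as mx  # import only after the variable is set
```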