microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] cuDNN lib mismatch led to underutilization of GPU #14498

Open IzanCatalan opened 1 year ago

IzanCatalan commented 1 year ago

Describe the issue

When calling any MXNet function (imported as mx in the screenshot), for example mx.nd.ones or mx.gpu(0), I get a warning about a cuDNN mismatch after a cuBLAS error.

The cuBLAS error goes away when I execute mx.gpu() instead of mx.gpu(0). In both cases the GPU is detected.

Also, I run onnxruntime-gpu, but the warning appears to come from MXNet. However, when I do inference on the GPU (using onnxruntime.InferenceSession with CUDAExecutionProvider) I noticed underutilisation of the GPU: no more than 40-50% at occasional peaks lasting about 100 ms (sampled with nvidia-smi -lms 100), while most of the time utilisation is 0-6%.
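For context, the utilisation figures above come from sampling nvidia-smi every 100 ms. A small helper like the following (illustrative, not from the issue's actual scripts) summarises such a sample log; the `samples` values are assumed to be the per-sample utilisation percentages nvidia-smi prints:

```python
def summarise_gpu_util(samples):
    """Summarise GPU utilisation percentages sampled e.g. via
    `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -lms 100`."""
    vals = [int(s) for s in samples]
    return {
        "peak": max(vals),                                        # highest observed spike
        "mean": sum(vals) / len(vals),                            # average over the run
        "idle_fraction": sum(1 for v in vals if v <= 6) / len(vals),  # share of near-idle samples
    }

# A trace shaped like the one reported above: mostly idle with occasional peaks.
stats = summarise_gpu_util(["0", "3", "45", "6", "0", "50", "2", "0"])
```

With a trace like this, `idle_fraction` close to 1 and a low `mean` make the underutilisation concrete.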

According to https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements, onnxruntime-gpu 1.12 is built against CUDA 11.4 and cuDNN 8.2.4.

I wonder whether this underutilization is caused by the version mismatch. If so, is there any way to eliminate the warning without installing a different version of cuDNN or CUDA?

Thanks! Izan.

[screenshot: error]

[screenshot: gpu inference]

To reproduce

I use ONNX models from the ONNX Model Zoo (https://github.com/onnx/models), particularly VGG16, ResNet50, MobileNet, and DenseNet, each in both quantized and non-quantized variants (int8 and float32).

I use the MXNet framework (mxnet-cu112, version 1.9.1) to load the ImageNet dataset and calculate accuracies, which allows inference with CUDA 11.2.

As the screenshots show, the CUDA and CPU providers are detected both before and after creating the InferenceSession, so the setup should be fine, but GPU performance is lower than expected.
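The accuracy bookkeeping mentioned above can be sketched roughly like this; the helper name and signature are illustrative, not taken from the issue's actual evaluation script:

```python
def topk_correct(logits, label, k=5):
    """Return (top1_hit, topk_hit) for one sample's raw class scores,
    as used when accumulating ImageNet top-1/top-5 accuracy."""
    # Rank class indices by descending score.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return ranked[0] == label, label in ranked[:k]

# Usage: sum the hits over the validation set, then divide by the sample count.
top1, top5 = topk_correct([0.1, 0.7, 0.2], label=1)
```

In the real pipeline the `logits` would be the output of the onnxruntime session run on each MXNet-loaded ImageNet batch.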

Urgency

No response

Platform

Linux

OS Version

Ubuntu 18.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

onnxruntime-gpu 1.12.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.2 with cuDNN 8.2.1

Model File

No response

Is this a quantized model?

Yes

RyanUnderhill commented 1 year ago

Doesn't setting MXNET_CUDNN_LIB_CHECKING=0 (the suggestion in your screenshot) solve it completely? Note that these errors come not from onnxruntime but from the CUDA libraries.
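For example, the variable can be exported in the shell before launching the script, or set from Python; a minimal sketch, assuming the variable needs to be in the environment before MXNet initialises cuDNN:

```python
import os

# Set the variable before importing mxnet, so it is already in the
# environment when MXNet's cuDNN version check runs.
os.environ["MXNET_CUDNN_LIB_CHECKING"] = "0"

# import mxnet as mx  # import only after the variable is set
```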