microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

How to call an ONNX Runtime model in Qt (C++) with an RTX 4080 GPU #18051

Closed jts250 closed 1 year ago

jts250 commented 1 year ago

Describe the issue

Hello, I am using an RTX 4080 GPU with CUDA 11.8, cuDNN 8.5, and ONNX Runtime 1.15.1. However, when calling the ONNX Runtime model from Qt (C++), inference always runs on the CPU instead of the GPU. Previously, both a machine with an RTX 3080, CUDA 11.3, cuDNN 8.2, and ONNX Runtime 1.15.1 and another with an RTX 3070, CUDA 11.1, cuDNN 8.2, and ONNX Runtime 1.15.1 could use CUDA for the same task. Is there anything I should pay attention to, or is this caused by the version differences? Could the CUDA version be the problem? Does ONNX Runtime 1.15.1 support CUDA 11.8 on an RTX 4080? I build with qmake since I use Qt. Thank you for your help!

To reproduce

OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, filter_flags);

Urgency

Urgent

Platform

Windows

OS Version

windows11 x64

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.8

pranavsharma commented 1 year ago

Turn on verbose logs and grep for "Node placements" to see if the nodes in the graph were assigned to CPU or GPU.

tianleiwu commented 1 year ago

@jts250 ONNX Runtime 1.15.1 does support the 4080.

I tested with the following: download bert_toy_optimized.onnx and run the profiler like this:

python -m onnxruntime.transformers.profiler --model bert_toy_optimized.onnx -b 1 -s 4 --use_gpu --provider cuda

You should see output like the following:

Grouped by provider + operator
----------------------------------------------------------------
Kernel(μs)      Provider%       Calls   AvgKernel(μs)   Provider        Operator
    447748          26.14       17000             26.3  CUDA            MatMul
    365594          21.34       11000             33.2  CUDA            SkipLayerNormalization
    278843          16.28        5000             55.8  CUDA            Attention
    187725          10.96        6000             31.3  CUDA            BiasGelu
    112941           6.59        4000             28.2  CUDA            Gather
     85418           4.99        2000             42.7  CUDA            Gemm
     51633           3.01        2000             25.8  CUDA            Add
     43509           2.54        1000             43.5  CUDA            Expand
     34866          65.43        2000             17.4  CPU             Cast
     30019           1.75        1000             30.0  CUDA            Cast
     29743           1.74        1000             29.7  CUDA            Shape
     28311           1.65        1000             28.3  CUDA            Slice
     27090           1.58        1000             27.1  CUDA            LayerNormalization
     24409           1.42        1000             24.4  CUDA            Tanh
     18424          34.57        1000             18.4  CPU             Min
jts250 commented 1 year ago

Thanks, I have solved the problem! The version was indeed correct!