jts250 closed this issue 1 year ago
Turn on verbose logs and grep for "Node placements" to see if the nodes in the graph were assigned to CPU or GPU.
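A minimal sketch of that check, using the Python API for brevity rather than the C++ API from the issue. The function names, the model path argument, and the number of context lines are hypothetical; the only string taken from this thread is "Node placements". It assumes the verbose output has been saved to a text file, since the verbose messages are emitted by the native runtime rather than by Python.

```python
def node_placement_sections(log_lines, context=3):
    """Return each line containing 'Node placements' plus the next `context` lines."""
    lines = [line.rstrip("\n") for line in log_lines]
    sections = []
    for i, line in enumerate(lines):
        if "Node placements" in line:
            sections.append(lines[i:i + 1 + context])
    return sections


def create_verbose_session(model_path):
    """Create a session with verbose logging (requires the onnxruntime-gpu package)."""
    import onnxruntime as ort  # imported lazily so the helper above works without it

    options = ort.SessionOptions()
    options.log_severity_level = 0  # 0 = VERBOSE
    session = ort.InferenceSession(
        model_path,
        sess_options=options,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # If CUDA could not be loaded, only the CPU provider is listed here.
    print(session.get_providers())
    return session
```

In C++, the equivalent logging level is set when constructing the environment, for example with `ORT_LOGGING_LEVEL_VERBOSE` passed to `Ort::Env`.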
@jts250, ONNX Runtime 1.15.1 does support the 4080.
I tested with the following:
You can download bert_toy_optimized.onnx and run the profiler on it as follows:
python -m onnxruntime.transformers.profiler --model bert_toy_optimized.onnx -b 1 -s 4 --use_gpu --provider cuda
You should see output like the following:
Grouped by provider + operator
----------------------------------------------------------------
Kernel(μs)  Provider%  Calls  AvgKernel(μs)  Provider  Operator
    447748      26.14  17000           26.3  CUDA      MatMul
    365594      21.34  11000           33.2  CUDA      SkipLayerNormalization
    278843      16.28   5000           55.8  CUDA      Attention
    187725      10.96   6000           31.3  CUDA      BiasGelu
    112941       6.59   4000           28.2  CUDA      Gather
     85418       4.99   2000           42.7  CUDA      Gemm
     51633       3.01   2000           25.8  CUDA      Add
     43509       2.54   1000           43.5  CUDA      Expand
     34866      65.43   2000           17.4  CPU       Cast
     30019       1.75   1000           30.0  CUDA      Cast
     29743       1.74   1000           29.7  CUDA      Shape
     28311       1.65   1000           28.3  CUDA      Slice
     27090       1.58   1000           27.1  CUDA      LayerNormalization
     24409       1.42   1000           24.4  CUDA      Tanh
     18424      34.57   1000           18.4  CPU       Min
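To read this table at a glance, the kernel times can be summed per provider. The sketch below is only an illustration: the function name and the parsing rule (six whitespace-separated fields, with the first an integer) are assumptions about this particular output, not part of the profiler tool. The rows are copied from the output above.

```python
from collections import defaultdict

# Rows copied from the profiler output above.
PROFILE_ROWS = """\
447748 26.14 17000 26.3 CUDA MatMul
365594 21.34 11000 33.2 CUDA SkipLayerNormalization
278843 16.28 5000 55.8 CUDA Attention
187725 10.96 6000 31.3 CUDA BiasGelu
112941 6.59 4000 28.2 CUDA Gather
85418 4.99 2000 42.7 CUDA Gemm
51633 3.01 2000 25.8 CUDA Add
43509 2.54 1000 43.5 CUDA Expand
34866 65.43 2000 17.4 CPU Cast
30019 1.75 1000 30.0 CUDA Cast
29743 1.74 1000 29.7 CUDA Shape
28311 1.65 1000 28.3 CUDA Slice
27090 1.58 1000 27.1 CUDA LayerNormalization
24409 1.42 1000 24.4 CUDA Tanh
18424 34.57 1000 18.4 CPU Min
"""


def kernel_time_by_provider(text):
    """Sum the Kernel(us) column for each value of the Provider column."""
    totals = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators, and anything that is not a data row.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        kernel_us, provider = int(fields[0]), fields[4]
        totals[provider] += kernel_us
    return dict(totals)


totals = kernel_time_by_provider(PROFILE_ROWS)
grand_total = sum(totals.values())
for provider, kernel_us in sorted(totals.items()):
    share = 100 * kernel_us / grand_total
    print(f"{provider}: {kernel_us} us ({share:.1f}% of kernel time)")
```

In this run, about 97% of the kernel time is on CUDA, and only the Cast and Min operators run on the CPU. The Provider% column is a share within each provider, which is why the two CPU rows add up to 100 on their own.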
Thanks, I have solved the problem! The version is right!
Describe the issue
Hello, I am using a 4080 GPU with CUDA 11.8, cuDNN 8.5, and ONNX Runtime 1.15.1. However, when I run an ONNX Runtime model from Qt (C++), inference always runs on the CPU instead of the GPU. Previously, a machine with a 3080 GPU, CUDA 11.3, cuDNN 8.2, and ONNX Runtime 1.15.1, and another with a 3070 GPU, CUDA 11.1, cuDNN 8.2, and ONNX Runtime 1.15.1, could both use CUDA for this task. Is there anything I should pay attention to, or is this issue caused by the version differences? Is the CUDA version causing the problem? Does ONNX Runtime 1.15.1 support CUDA 11.8 on a 4080 GPU? I build with qmake because I use Qt. Thank you for your help!
To reproduce
OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, filter_flags);
Urgency
Urgent
Platform
Windows
OS Version
Windows 11 x64
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.8