Open Zhangts98 opened 3 months ago
Intra-op num threads is a CPU setting. With the CUDA EP, most of the time is usually spent in CUDA kernels, which are not affected by the CPU thread count.
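For reference, a minimal sketch of what that configuration looks like with the ONNX Runtime C++ API (circa 1.9). The model path `model.onnx` is a placeholder; the session options shown are assumptions for illustration, not a tuning recommendation:

```cpp
// Hedged sketch: creating an ONNX Runtime session with the CUDA EP.
// SetIntraOpNumThreads only parallelizes ops that run on CPU; ops placed
// on the CUDA EP execute as GPU kernels and ignore this value.
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "demo");
  Ort::SessionOptions opts;

  // Changing 1 to 10 here will not speed up GPU-resident kernels.
  opts.SetIntraOpNumThreads(1);

  OrtCUDAProviderOptions cuda_opts{};      // defaults: device_id = 0
  opts.AppendExecutionProvider_CUDA(cuda_opts);

  Ort::Session session(env, "model.onnx", opts);  // placeholder model path
  return 0;
}
```

This is why the reported timings are identical at 1 and 10 threads: the model is running on the GPU either way, so GPU-side optimizations (e.g. batching, TensorRT, mixed precision) are where speedups would come from, not CPU threading.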
thanks
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
When using SetIntraOpNumThreads(1) and SetIntraOpNumThreads(10) on GPU, the inference times are similar, both around 30 ms. I already perform a warm-up before measuring the time. How should I configure ONNX Runtime to improve inference speed? Do I need to build ORT from source with OpenMP?
The CUDA version cannot be changed.
To reproduce
Urgency
No response
Platform
Linux
OS Version
CentOS 7
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.9.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.2 cudnn 8.1.0
Model File
No response
Is this a quantized model?
No