microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] SetIntraOpNumThreads does not take effect #21700

Open Zhangts98 opened 3 months ago

Zhangts98 commented 3 months ago

Describe the issue

When using SetIntraOpNumThreads(1) and SetIntraOpNumThreads(10) on GPU, the inference time is similar, around 30 ms in both cases. I have already done a warm-up before measuring the time. How should this be set to improve inference speed? Do I need to build ORT with OpenMP from source?

The CUDA version cannot be changed.

To reproduce

...

env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "ONNXRuntime inference");

// ONNX session options
session_options.SetIntraOpNumThreads(1);
//session_options.SetIntraOpNumThreads(10);
OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0);
sess = Ort::Session(env, model_buffer, model_buffer_len, session_options);

// warm up
sess.Run(xxxxxx)

...

Urgency

No response

Platform

Linux

OS Version

CentOS 7

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.9.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.2 cudnn 8.1.0

Model File

No response

Is this a quantized model?

No

tianleiwu commented 3 months ago

Intra-op num threads is a CPU setting. With the CUDA EP, most of the time is usually spent in CUDA kernels, which are not affected by the CPU thread setting.
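One way to confirm where the time goes is ONNX Runtime's built-in profiler, which writes a Chrome-trace JSON with per-node timings. A minimal sketch, assuming the file prefix `"profile"` and model path `"model.onnx"` are placeholders you substitute:

```cpp
#include <onnxruntime_cxx_api.h>

// Sketch only: enable ORT's built-in profiler to see CPU vs CUDA kernel time.
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "profiling");
Ort::SessionOptions opts;
opts.EnableProfiling("profile");  // writes profile_<timestamp>.json
OrtSessionOptionsAppendExecutionProvider_CUDA(opts, 0);
Ort::Session sess(env, "model.onnx", opts);
// ... run inference as usual, then open the JSON in chrome://tracing
```

If the trace shows the dominant entries are CUDA kernel launches rather than CPU ops, changing the intra-op thread count will not move the 30 ms number.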

Zhangts98 commented 3 months ago

> Intra op num threads is for CPU. For CUDA EP, usually most time is spent on CUDA kernel, which are not impacted by CPU thread setting.

Thanks

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.