zhanggd001 opened this issue 1 year ago
What time period is between the inference calls?
Inference times may fluctuate due to GPU and CPU power control/energy savings.
Try setting your GPU to the PowerMizer mode "Maximum Performance", manually set the fan to a high speed, and try again. Does that improve the situation? What inference times are you targeting?
Thanks. I found that this is caused by the version of the GPU driver (my GPU is a GTX 950M, Win10). With a 47X.XX driver the inference time is stable, but with driver versions newer than 47X.XX it is unstable and varies enormously.
Hi, thanks! That's interesting, because I have the same problem and have never tried older drivers.
Forum post: https://forums.developer.nvidia.com/t/strange-cnn-inference-latency-behavior-with-cuda-and-tensorrt/237501
Describe the issue
When testing the inference time with the CUDA EP (Win10 x64, VS2017, CUDA 11.7, onnxruntime 1.12.1), I find that the inference time is unstable and varies enormously. For example, over 1000 runs the inference time will suddenly spike.
To reproduce
Ort::Env env(ORT_LOGGING_LEVEL_WARNING);
Ort::SessionOptions session_options;
session_options.SetIntraOpNumThreads(4);
session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);

// use GPU (CUDA execution provider, device 0)
OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0);
Ort::Session* ort_session = new Ort::Session(env, model_path, session_options);

Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
std::vector<float> input_tensor_values(input_tensor_size);
Ort::Value input_tensor = Ort::Value::CreateTensor<float>(memory_info, input_tensor_values.data(), input_tensor_size, input_node_dims.data(), 4);

auto start_inference = std::chrono::steady_clock::now();
auto output_tensors = ort_session->Run(Ort::RunOptions{ nullptr }, input_node_names.data(), &input_tensor, num_input_nodes, output_node_names.data(), num_output_nodes);
auto end_inference = std::chrono::steady_clock::now();
std::chrono::duration<double> time_inference = end_inference - start_inference;
std::cout << "Inference Time : " << time_inference.count() * 1000 << "ms" << std::endl;
Urgency
No response
Platform
Windows
OS Version
10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.12.1-gpu
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.7
Model File
No response
Is this a quantized model?
No