Describe the issue
Issue Description:
In versions 1.17.0 and earlier of onnxruntime-directml, calling onnxruntime.InferenceSession() on an AMD GPU loads the ONNX model onto the GPU and creates a model session. If the program uses multithreading, multiple threads can compete for that session, leading to deadlocks and crashes. In those versions, serializing access through a queue mechanism resolves the issue (a sketch of that workaround follows).
However, from version 1.18.0 onwards, the same contention-control mechanisms (queueing, locks, and thread semaphores) have no effect in a multithreaded environment: the problem persists and the program still deadlocks and crashes.
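For reference, below is a minimal sketch of the queue-based workaround that was effective on 1.17.0 and earlier: a single worker thread owns the session and all other threads submit requests through a queue. The model path ("model.onnx") and input name ("input") are placeholders, not the actual model.

import queue
import threading
import numpy as np
import onnxruntime

# A single worker thread owns the session; other threads never touch it directly.
session = onnxruntime.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
requests = queue.Queue()

def inference_worker():
    while True:
        batch, reply = requests.get()
        # Only this thread ever calls session.run(), so there is no contention.
        reply.put(session.run(None, {"input": batch}))  # placeholder input name

threading.Thread(target=inference_worker, daemon=True).start()

def infer(batch: np.ndarray):
    # Called from any thread: hand the batch to the worker and wait for the result.
    reply = queue.Queue(maxsize=1)
    requests.put((batch, reply))
    return reply.get()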
Steps to Reproduce:
1. Use an AMD GPU.
2. Load an ONNX model using onnxruntime.InferenceSession() in a multithreaded program.
3. Observe deadlocks and crashes due to multiple threads competing for the model session.
4. Implement queueing, locks, and thread semaphores to manage resource contention.
5. Observe that these mechanisms do not resolve the issue in versions 1.18.0 and later (a reproduction sketch follows this list).
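A minimal reproduction sketch along the lines of the steps above is shown below; the model path, input name, and input shape are placeholders and need to be adapted to the actual model. On 1.18.0 and later the hang/crash occurs in session.run() even though the lock serializes every call.

import threading
import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
lock = threading.Lock()

def worker():
    x = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input shape
    for _ in range(100):
        with lock:  # the serialization that resolved the issue on 1.17.0
            session.run(None, {"input": x})  # placeholder input name

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()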
Expected Behavior: Multithreading mechanisms should effectively manage resource contention, preventing deadlocks and crashes.
Actual Behavior: Resource contention management mechanisms are ineffective in versions 1.18.0 and later, resulting in persistent deadlocks and crashes.
Environment:
ONNX Runtime DirectML Versions: 1.17.0 and earlier (issue resolved with queueing), 1.18.0 and later (issue persists)
Hardware: AMD GPU
Operating System: Windows 10 or Windows 11
Request for Assistance: Given my observations, there seems to be a resource contention issue, but I am not entirely certain of the underlying cause. Could you provide guidance or solutions for resolving this issue in the newer versions of onnxruntime-directml?
Urgency
No response
Target platform
Windows 10 or Windows 11
Build script
import onnxruntime
session = onnxruntime.InferenceSession(onnx_model_path, providers=['DmlExecutionProvider', 'CPUExecutionProvider'])
Error / output
The program deadlocks and crashes without generating any error messages or logs. The crash occurs when session.run() is called after the model has been loaded onto the GPU.
Visual Studio Version
No response
GCC / Compiler Version
No response