Open. WolframRhodium opened this issue 5 months ago.
CC: @gedoensmax
@hariharans29 If I understand correctly, we can simply remove the sync in the GraphManager. The sync should be done in the EP itself, here: https://github.com/microsoft/onnxruntime/blob/c47a6ce70b80d5ca83e851d6ddfeab12af3e0941/onnxruntime/core/providers/cuda/cuda_execution_provider.cc#L434-L447
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
PR https://github.com/microsoft/onnxruntime/pull/14088 allows disabling EP synchronization at the end of a session run. However, CUDA graph replay does not honor this flag and always synchronizes: https://github.com/microsoft/onnxruntime/blob/77b7619a3d81e619014bb714ece8b5e8c44f0788/onnxruntime/core/providers/cuda/cuda_graph.cc#L75-L86
To reproduce
With the C/C++ API, enable CUDA graph through OrtCUDAProviderOptionsV2 and disable EP synchronization in the run options.
Urgency
No response
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.18, 77b7619a3d81e619014bb714ece8b5e8c44f0788
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.3
Model File
No response
Is this a quantized model?
No