microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] Disable stream synchronization at the end of CUDA Graphs replay #20392

Open WolframRhodium opened 5 months ago

WolframRhodium commented 5 months ago

Describe the issue

PR https://github.com/microsoft/onnxruntime/pull/14088 allows disabling EP synchronization at the end of a session run. However, CUDA graph replay does not honor this flag and still synchronizes the stream: https://github.com/microsoft/onnxruntime/blob/77b7619a3d81e619014bb714ece8b5e8c44f0788/onnxruntime/core/providers/cuda/cuda_graph.cc#L75-L86

To reproduce

With the C/C++ API, enable CUDA graph through OrtCUDAProviderOptionsV2 and disable EP synchronization in the run options.
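
For readers trying to reproduce this, here is a minimal sketch of the two settings involved. It assumes the CUDA provider option key `enable_cuda_graph` and the run-option config key `disable_synchronize_execution_providers`; neither key is spelled out in this report, so check both against the headers of your ONNX Runtime build. To stay self-contained, the sketch only assembles the key/value pairs. The helper names are hypothetical, and the comments note which C API calls would consume each pair.

```cpp
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// Hypothetical helper: options for the CUDA execution provider.
// In a real program these pairs would go to UpdateCUDAProviderOptions
// on an OrtCUDAProviderOptionsV2 before it is appended to the session options.
std::vector<KeyValue> CudaProviderOptions() {
  return {{"enable_cuda_graph", "1"}};  // assumed key: turn on CUDA graph capture/replay
}

// Hypothetical helper: per-run config entries.
// In a real program each pair would go to AddRunConfigEntry on the
// OrtRunOptions passed to the session run call.
std::vector<KeyValue> RunConfigEntries() {
  // assumed key: skip the EP synchronization at the end of the run
  return {{"disable_synchronize_execution_providers", "1"}};
}
```

With both settings applied, the expectation in this report is that the run returns without blocking on the CUDA stream; the observed behavior is that graph replay still synchronizes.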

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18, 77b7619a3d81e619014bb714ece8b5e8c44f0788

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.3

Model File

No response

Is this a quantized model?

No

hariharans29 commented 5 months ago

CC: @gedoensmax

gedoensmax commented 5 months ago

@hariharans29 If I understand correctly, we can simply remove the sync in the GraphManager. The sync should be done in the EP itself, here: https://github.com/microsoft/onnxruntime/blob/c47a6ce70b80d5ca83e851d6ddfeab12af3e0941/onnxruntime/core/providers/cuda/cuda_execution_provider.cc#L434-L447
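
The proposed division of responsibility can be sketched roughly as follows. This is a toy model, not the ONNX Runtime code: `FakeStream`, `GraphManagerSketch`, `ExecutionProviderSketch`, and the `sync_stream` parameter are hypothetical names, and the stream only counts calls instead of touching a GPU. The idea it illustrates is that replay only enqueues the graph, and the EP's end-of-run hook is the single place that decides whether to synchronize.

```cpp
#include <cstddef>

// Stand-in for a CUDA stream; it counts calls instead of using a GPU.
struct FakeStream {
  std::size_t launches = 0;
  std::size_t syncs = 0;
  void LaunchGraph() { ++launches; }  // models a graph launch on the stream
  void Synchronize() { ++syncs; }     // models a blocking stream synchronize
};

// Hypothetical graph manager: replay enqueues work and never blocks.
struct GraphManagerSketch {
  void Replay(FakeStream& stream) { stream.LaunchGraph(); }
};

// Hypothetical execution provider: the only component that may synchronize.
struct ExecutionProviderSketch {
  FakeStream stream;
  GraphManagerSketch graph;

  // sync_stream models the flag derived from the run options.
  void Run(bool sync_stream) {
    graph.Replay(stream);
    OnRunEnd(sync_stream);
  }

  void OnRunEnd(bool sync_stream) {
    if (sync_stream) stream.Synchronize();
  }
};
```

In this model, calling `Run(false)` launches the graph without any synchronization, which is the behavior the issue asks for; `Run(true)` keeps the current blocking behavior.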

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.