Open mistborn17 opened 1 year ago
Are you by any chance running the CUDA or TensorRT EP? In that case you could use CUDA streams and set the run option for synchronization to false. https://github.com/microsoft/onnxruntime/pull/14088
Would the workflow in that case be to use an asyncio.sleep for the expected computation time? If I understand it correctly, it won't be a coroutine or execute a callback.
Describe the feature request
In the Python APIs, I see run_async and I also see run_with_iobinding. Is it possible to use the async APIs with IO binding?
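To illustrate what I mean by "async", here is a minimal sketch of how a callback-style call could be wrapped into something awaitable. It assumes run_async reports completion through a callback of the form callback(results, user_data, err) invoked on a worker thread. `fake_run_async` and `run_awaitable` are hypothetical names I made up so the snippet runs without a model, not onnxruntime functions.

```python
import asyncio
import threading


def fake_run_async(inputs, callback, user_data):
    """Hypothetical stand-in for a callback-style inference call.

    Does the work on a worker thread, then invokes
    callback(results, user_data, err) from that thread.
    """
    def worker():
        try:
            results = [x * 2 for x in inputs]  # placeholder "inference"
            callback(results, user_data, "")
        except Exception as exc:
            callback(None, user_data, str(exc))

    threading.Thread(target=worker, daemon=True).start()


async def run_awaitable(inputs):
    """Bridge the callback into an asyncio future so callers can await it."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    def on_done(results, user_data, err):
        # Runs on the worker thread, so the future must be resolved
        # through the loop's thread-safe scheduling call.
        if err:
            loop.call_soon_threadsafe(future.set_exception, RuntimeError(err))
        else:
            loop.call_soon_threadsafe(future.set_result, results)

    fake_run_async(inputs, on_done, None)
    return await future


if __name__ == "__main__":
    print(asyncio.run(run_awaitable([1, 2, 3])))
```

What I don't know is whether the same pattern can be combined with an IO binding, which is the actual question here.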
Describe scenario use case
In a typical Python server, an efficient async API would avoid blocking the server's main thread.
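As a workaround sketch, the blocking call can be pushed to a worker thread so the event loop stays free. `blocking_inference` below is a hypothetical stand-in for a blocking call such as session.run_with_iobinding(io_binding); the sleep simulates compute time. It assumes Python 3.9+ for asyncio.to_thread, and it only helps if the real call releases the GIL while it runs, which I have not verified.

```python
import asyncio
import time


def blocking_inference(x):
    """Hypothetical stand-in for a blocking inference call."""
    time.sleep(0.2)  # simulated compute time
    return x + 1


async def handle_request(x):
    # Run the blocking call on a worker thread; the event loop keeps
    # serving other coroutines while this one waits for the result.
    return await asyncio.to_thread(blocking_inference, x)


async def serve_concurrently(values):
    # Several "requests" in flight at once, none blocking the main thread.
    return await asyncio.gather(*(handle_request(v) for v in values))


if __name__ == "__main__":
    print(asyncio.run(serve_concurrently([1, 2, 3])))
```

This keeps the server responsive, but it costs a thread per in-flight request, which is why a native async API that works with IO binding would be preferable.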