microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Feature Request] #17889

Open mistborn17 opened 11 months ago

mistborn17 commented 11 months ago

Describe the feature request

In the Python API, I see both run_async and run_with_iobinding. Is it possible to use the async API together with IO binding?

Describe scenario use case

In a typical Python server, an efficient async API would keep inference from blocking the server's main thread.

gedoensmax commented 11 months ago

Are you by any chance running the CUDA or TensorRT EP? In that case you could use CUDA streams and set the run option for synchronization to false. https://github.com/microsoft/onnxruntime/pull/14088

mistborn17 commented 11 months ago

Would the workflow in that case be to use an asyncio.sleep for the expected computation time? If I understand it correctly, it won't be a coroutine or execute a callback.
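
One alternative to sleeping for a guessed duration is to hand the blocking call to a thread pool and await it, so the event loop keeps serving other requests. The sketch below is only an illustration under stated assumptions: blocking_inference is a hypothetical stand-in for a call such as session.run_with_iobinding(io_binding) and merely sleeps, and heartbeat is a hypothetical placeholder for other server work. It does not use ONNX Runtime itself.

```python
import asyncio
import time


def blocking_inference(batch_id: int) -> str:
    # Hypothetical stand-in for a blocking call such as
    # session.run_with_iobinding(io_binding); here it only sleeps.
    time.sleep(0.2)
    return f"result-{batch_id}"


async def heartbeat(ticks: list) -> None:
    # Hypothetical placeholder for other server work that must keep
    # running while inference is in progress.
    for i in range(3):
        await asyncio.sleep(0.05)
        ticks.append(i)


async def main() -> tuple:
    ticks = []
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool and await its result;
    # the event loop stays free to schedule heartbeat() in the meantime.
    inference = loop.run_in_executor(None, blocking_inference, 7)
    result, _ = await asyncio.gather(inference, heartbeat(ticks))
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This pattern only helps if the wrapped call releases the GIL while it runs; whether run_with_iobinding does so should be verified for the ONNX Runtime version in use.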