microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
14.87k stars · 2.94k forks

What is the recommended setup for running multiple models/sessions in parallel in C++? #18610

Open vymao opened 1 year ago

vymao commented 1 year ago

Describe the issue

I want to run 2 or 3 ONNX models in C++ simultaneously. The recommended approach seems to be to create one session per model, each in a different thread or process, and then map the thread affinities to particular cores to avoid contention.

I'm not 100% sure how to do this, but based on what I could find, I assume the following:

  1. I create one process per model
  2. Within each process, I create a new Ort::Env, Ort::SessionOptions, and Ort::Session
  3. I set the threading options and affinities within Ort::Env like so. The only way I have found to do this is SetGlobalIntraOpThreadAffinity, which seems to indicate that I need to create separate Ort::Envs to set affinities globally within each env. I did not find any affinity methods in the Ort::SessionOptions API.

Is this correct? And is there a more efficient way of doing this? Some Stack Overflow posts (like this) and other related issues (like this) suggest that it is possible to create a global thread pool shared across sessions, but I'm not sure how to configure that, or whether it is more efficient.
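For anyone else reading: the shared-global-thread-pool setup can be configured roughly like this. This is a sketch, not an official recipe; the thread counts, affinity string, and model paths are placeholders, and core pinning may not be supported on every platform (e.g. macOS):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  // One env that owns a global thread pool shared by every session.
  Ort::ThreadingOptions tp_options;
  tp_options.SetGlobalIntraOpNumThreads(4);
  tp_options.SetGlobalInterOpNumThreads(1);
  // Optionally pin the global intra-op pool threads to cores.
  // The affinity string format is documented in the C API headers.
  // tp_options.SetGlobalIntraOpThreadAffinity("1;2;3");

  Ort::Env env(tp_options, ORT_LOGGING_LEVEL_WARNING, "shared-pool");

  // Each session opts out of its per-session pools and uses the global one.
  Ort::SessionOptions so1, so2;
  so1.DisablePerSessionThreads();
  so2.DisablePerSessionThreads();

  Ort::Session session1(env, "model_a.onnx", so1);  // placeholder paths
  Ort::Session session2(env, "model_b.onnx", so2);

  // Run() can now be called on these sessions from separate application
  // threads; the work is scheduled onto the single global pool.
  return 0;
}
```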

To reproduce

N/A

Urgency

No response

Platform

Mac

OS Version

13.5

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.2

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

pranavsharma commented 12 months ago

This should answer your question https://github.com/microsoft/onnxruntime/blob/2c50b75a26429ef3146d1c6c541f3a3112aa7c83/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h#L186-L201.
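For reference, the config key documented at that link is set through `SessionOptions::AddConfigEntry`. A minimal sketch (the model path is a placeholder; per the linked header, the affinity string has one entry per pool thread excluding the calling thread, so 3 intra-op threads means 2 entries):

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "affinity-demo");

  Ort::SessionOptions so;
  so.SetIntraOpNumThreads(3);
  // Pin this session's two pool threads to cores 1 and 2;
  // entries are separated by ';' as described in the linked header.
  so.AddConfigEntry("session.intra_op_thread_affinities", "1;2");

  Ort::Session session(env, "model.onnx", so);  // placeholder path
  return 0;
}
```

This gives per-session affinities without global thread pools, so each session keeps its own pool but avoids contending for the same cores.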

vymao commented 12 months ago

Thanks, but I still wonder about the second part of my question: some Stack Overflow posts (like this) and other related issues (https://github.com/microsoft/onnxruntime/issues/12654) suggest that it is possible to create a global thread pool shared across sessions, and I wonder which approach is more efficient.

vymao commented 11 months ago

Hi @pranavsharma, just following up on this.

pfeatherstone commented 8 months ago

It would be nice if you could access the global thread pool inside onnxruntime and wrap a task system around it using something like https://github.com/Naios/continuable. Then you could run all your ONNX models concurrently on a single shared thread pool. That would be sweet.

ben-da6 commented 8 months ago

I'd be interested in hearing a more complete answer to this question: is this the recommended way of dealing with multiple models?

pfeatherstone commented 8 months ago

It was just a recommendation. I use the onnxruntime shared thread pool for all my models, but I want to go a step further and launch both models asynchronously on that same shared pool as well. Currently, you have to launch them on your own threads.
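The "your own threads" approach is simple with std::thread; a minimal sketch, where `RunInference` stands in for a hypothetical per-model wrapper that prepares that model's inputs and calls `session.Run(...)`:

```cpp
#include <thread>
#include <onnxruntime_cxx_api.h>

// Hypothetical wrapper: builds the model's input tensors and calls
// session.Run(...). Not part of the ORT API.
void RunInference(Ort::Session& session);

void RunBothModels(Ort::Session& model_a, Ort::Session& model_b) {
  // One application thread per model. If both sessions were created
  // with DisablePerSessionThreads() against a shared env, the actual
  // operator work still lands on the single global pool.
  std::thread t1([&] { RunInference(model_a); });
  std::thread t2([&] { RunInference(model_b); });
  t1.join();
  t2.join();
}
```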

pfeatherstone commented 8 months ago

I've just noticed that Ort::Session now has a RunAsync() function, which runs on ORT's intra-op thread pool. That might solve your problem.
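A sketch of what that looks like against the 1.16-era C++ header (input/output names are placeholders; the caller must keep the input and output `Ort::Value`s and the promise alive until the callback fires):

```cpp
#include <onnxruntime_cxx_api.h>
#include <future>

// Callback invoked on an ORT pool thread when the async run completes.
// A null status pointer indicates success.
void OnDone(void* user_data, OrtValue** /*outputs*/, size_t /*num_outputs*/,
            OrtStatusPtr status) {
  auto* done = static_cast<std::promise<bool>*>(user_data);
  done->set_value(status == nullptr);
}

// Launches one inference without blocking the calling thread; the
// returned future can be waited on after kicking off several models.
void RunModelAsync(Ort::Session& session,
                   const char* input_name, Ort::Value& input,
                   const char* output_name, Ort::Value& output,
                   std::promise<bool>& done) {
  session.RunAsync(Ort::RunOptions{nullptr},
                   &input_name, &input, /*input_count=*/1,
                   &output_name, &output, /*output_count=*/1,
                   OnDone, &done);
}
```

Calling RunModelAsync for each session and then waiting on the futures runs the models concurrently on ORT's own pool, without dedicating an application thread per model.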

pfeatherstone commented 8 months ago

It would still be cool if you could explicitly access that thread pool and roll your own task system, so your whole program uses a single pool shared by all models and any other concurrent code you need to run.