Open zwyao opened 3 months ago
The sequential executor generates one logic stream per provider type; ops in the same logic stream run one by one, while multiple logic streams can run in parallel.
Is the sequential executor the only executor in onnxruntime?
Every kernel also runs in parallel internally; a matrix multiplication, for instance, is parallelized. The default is to run kernels in sequence, since every kernel tries to parallelize its own computation unless the user disables that behaviour.
There is no need to configure anything. By default, the execution of an individual op is parallelized. Additionally, you can call Run concurrently in multiple threads safely.
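To make both points concrete, here is a minimal C++ sketch, assuming the Ort C++ API from onnxruntime_cxx_api.h; the model path, thread counts, and worker bodies are placeholders, not part of the original discussion:

```cpp
#include <onnxruntime_cxx_api.h>
#include <thread>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "demo");

  Ort::SessionOptions options;
  // Intra-op parallelism is on by default; no configuration is required.
  // The pool size can still be tuned if desired:
  options.SetIntraOpNumThreads(4);  // illustrative value

  Ort::Session session(env, "model.onnx", options);  // placeholder path

  // Run() on the same session may be called concurrently from multiple threads.
  std::vector<std::thread> workers;
  for (int i = 0; i < 2; ++i) {
    workers.emplace_back([&session] {
      // ... build Ort::Value inputs and call session.Run(...) here ...
    });
  }
  for (auto& t : workers) t.join();
}
```

This sketch is not runnable as-is: it needs the onnxruntime library, a real model file, and actual input/output bindings inside the worker lambda.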
thanks for your reply :) but I am confused by the concurrency mechanism. If I run an ONNX model on a host with only a CPU device, the kernels will run one by one in topological order, even if I set the concurrency mode. The only difference between ORT_SEQUENTIAL and ORT_PARALLEL is that ORT_PARALLEL mode can use intra-op threads within a kernel. Is this correct?
thanks for your reply :) Does "every kernel also runs in parallel" mean the kernel runs on the intra-op thread pool?
Topological order is the only way ORT executes the graph today. This is also called sequential execution mode, i.e. the kernels are executed sequentially. The execution of each kernel is parallelized using an intra-op thread pool. This, plus the fact that we allow concurrent execution of the model from multiple threads, is all there is to know about CPU concurrency in ORT. ORT_PARALLEL is deprecated.
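As a sketch of where these knobs live, the following uses real Ort::SessionOptions methods (SetExecutionMode, SetIntraOpNumThreads); the thread count is an illustrative value, not a recommendation from this thread:

```cpp
#include <onnxruntime_cxx_api.h>

void configure(Ort::SessionOptions& options) {
  // Kernels execute sequentially in topological order; this is the
  // default mode, and ORT_PARALLEL is deprecated.
  options.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL);
  // Each kernel parallelizes its own work on the intra-op thread pool.
  options.SetIntraOpNumThreads(8);  // illustrative value
}
```

This is a configuration fragment only; it assumes the onnxruntime headers and library are available.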
thanks a lot.
Emmm
How to speed up inference on a CPU device? If I use the following configuration:

    session_options.AddConfigEntry(kNodePartitionConfigFile, xxx)

and then manually split the nodes into multiple logic streams (which is very hard), will this method improve inference performance?
Please profile the model first to understand which op is taking the most time.
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
In ONNX Runtime v1.18.1, it is possible to set the option ExecutionMode::ORT_PARALLEL, which suggests ops will run in parallel, but I can't find any multi-threaded executor; there is only one sequential executor. Am I wrong? How does onnxruntime run in parallel mode?
To reproduce
no step
Urgency
No response
Platform
Linux
OS Version
centos
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.18.1 master
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Other / Unknown
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Unknown