Open cbourjau opened 1 year ago
The batch dimension is usually the first one, but not necessarily; the matrix can be transposed. Each kernel is able to parallelize its computation, but the strategy can differ based on the input dimensions, and it is not necessarily parallelized over the first dimension. There is no kernel parallelization if intra_op_num_threads == 1. Some of the parameters are described at https://onnxruntime.ai/docs/api/python/api_summary.html#onnxruntime.SessionOptions.intra_op_num_threads.
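For reference, the intra-op thread count mentioned above is set on the SessionOptions object before creating the session. A minimal sketch (the model path is a placeholder):

```python
import onnxruntime as ort

so = ort.SessionOptions()
# With intra_op_num_threads == 1, kernels run single-threaded
# and no intra-op parallelization takes place.
so.intra_op_num_threads = 1
# inter_op_num_threads controls parallelism across independent
# graph nodes instead of within a single kernel.
so.inter_op_num_threads = 1

# "model.onnx" is a placeholder path for illustration.
sess = ort.InferenceSession("model.onnx", sess_options=so)
```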
Describe the documentation issue
Some models have a "batch dimension" in their inputs suggesting that entries along that dimension are independent of each other. Models of this kind are good candidates for embarrassingly parallel execution: Simply chunk the inputs along that dimension, execute each chunk in its own thread, and lastly concatenate the outputs. A simple parallelization model of this kind can better utilize the available hardware in some use-cases.
While onnxruntime has two prominent options for parallelization (intra_op_num_threads and inter_op_num_threads of the SessionOptions object), I did not find any documentation of this kind of parallelization. It appears to me that an embarrassingly parallel approach would have significant advantages over the aforementioned options as I understand them. Did I miss a way to communicate to onnxruntime that a certain dimension is a batch dimension to be used for parallelization, or does that feature simply not exist?

Page / URL
No response