Open ali-rehman-ML opened 1 month ago
Are you looking for operators to use multiple threads internally, or are you trying to parallelize the graph itself?
The portable/quantized kernels don't support multiple threads. And as far as I know, the optimized kernels don't either. @JacobSzwejbka can we parallelize the graph itself?
> can we parallelize the graph itself?
ET itself won't inject any graph parallelization, but a skilled user could. The runtime supports it, sort of.
Delegates can do whatever they want internally, and at one point in time we had a proof of concept of a custom op that triggered an async graph and a second custom op that waited for the result. IDK if that code is still floating around in the repo, though.
I know the XNNPACK delegate uses threads internally, but I don't think it parallelizes the graph.
📚 The doc issue
How do I run the example execution_runner executable with multiple threads/cores after building it with CMake from the tutorial https://pytorch.org/executorch/stable/getting-started-setup.html? Currently I observe that only a single core is used during execution, which makes inference slow. The documentation doesn't mention how to run the program with multiple threads. Thank you.
Suggest a potential alternative/fix
No response