pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

Thread/no of core setting for execution_runner #4668

Open ali-rehman-ML opened 1 month ago

ali-rehman-ML commented 1 month ago

📚 The doc issue

How do I run the example execution_runner executable, after building it with CMake per the tutorial at https://pytorch.org/executorch/stable/getting-started-setup.html, with multiple threads/cores? Currently I observe that only a single core is used during execution, which makes inference slow. The documentation doesn't mention how to run the program with multiple threads. Thank you.

Suggest a potential alternative/fix

No response

JacobSzwejbka commented 1 month ago

Are you looking for operators to use multiple threads internally, or are you trying to parallelize the graph itself?

manuelcandales commented 1 month ago

The portable/quantized kernels don't support multiple threads. And as far as I know, the optimized kernels don't either. @JacobSzwejbka can we parallelize the graph itself?

JacobSzwejbka commented 1 month ago

can we parallelize the graph itself

ET itself won't inject any graph parallelization, but a skilled user could. The runtime supports it, sort of.

Delegates can do whatever they want internally, and at one point in time we had a proof of concept of a custom op that triggered an async graph and a second custom op that waited for the result. I don't know if that code is still floating around in the repo, though.

JacobSzwejbka commented 1 month ago

I know the XNNPACK delegate uses threads internally; I don't think they parallelize the graph, though.