modularml / max

A collection of sample programs, notebooks, and tools which highlight the power of the MAX Platform
https://www.modular.com

[BUG]: InferenceSession threads should default to number of performance cores #126

Closed mikowals closed 2 months ago

mikowals commented 3 months ago

Bug description

Manually adjusting run_max.py to use engine.InferenceSession(10) produced a speedup from 18.28 -> 25.82 QPS on roberta and from 29.15 -> 54.14 QPS on clip. My machine has 14 cores but only 10 performance cores, so I think that accounts for the speedup.

So this could be worked around in run_max.py as I have done, but the better option would be for InferenceSession to default to the number of performance cores when no argument is provided.
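A minimal sketch of the workaround described above: detect the performance-core count at runtime and pass it to the session. The `hw.perflevel0.physicalcpu` sysctl key is an assumption based on how Apple Silicon reports P cores; the `InferenceSession` call is commented out and mirrors the reporter's positional usage rather than a verified MAX API signature.

```python
import os
import subprocess

def performance_core_count() -> int:
    """Best-effort count of performance cores.

    On Apple Silicon, `sysctl -n hw.perflevel0.physicalcpu` reports the
    number of P cores (assumption based on Apple's sysctl naming); on
    other platforms, or if the query fails, fall back to the total
    logical core count.
    """
    try:
        out = subprocess.run(
            ["sysctl", "-n", "hw.perflevel0.physicalcpu"],
            capture_output=True, text=True, check=True,
        )
        return int(out.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        # sysctl missing, key unknown, or unparseable output.
        return os.cpu_count() or 1

# Hedged usage, matching the reporter's workaround in run_max.py
# (MAX Engine API not verified here):
# from max import engine
# session = engine.InferenceSession(performance_core_count())
```

On the 14-core machine from the report, this would return 10 instead of 14, matching the manual setting that produced the speedup.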

Steps to reproduce

as above

System information

Docker Ubuntu 22.04 running on MacBook Pro
max 24.1.1 (0ab415f7)
Modular version 24.1.1-0ab415f7-release
modular 0.5.2 (6b3a04fd)
goldiegadde commented 2 months ago

@mikowals thanks for the bug report. We default to the number of P cores on Linux x86 starting from the MAX 24.2 release. For Ubuntu Docker on Mac, you have to set the core count to the number of performance cores manually. MAX support for Mac is coming soon, at which point the default will match the P-core count. I am going to close this issue now; please feel free to reopen it if you have further questions.

mikowals commented 1 month ago

@goldiegadde, trying out MAX on macOS Apple Silicon with the nightly build, I still need to set the performance-core count manually. Doing so takes roberta from "Oh, darn, that's only 0.64x stock performance" to "That's about 1.06x faster."