mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License

Increase CPU usage cap from 400% to higher number, better if configurable #62

Open tuobulatuo opened 9 months ago

tuobulatuo commented 9 months ago

Hi Song Lab Team,

I noticed that CPU usage is capped at 4 threads (400%). Could you please make this a configurable number? For example, on a machine with 10 CPUs, I could then use all of them.

Thanks! Alex

RaymondWang0 commented 9 months ago

Hi @tuobulatuo, thanks for your valuable suggestion. We've made a quick fix to support a configurable number of threads for matrix multiplication on CPU platforms, as shown in the README. You can now pass the number of threads on the command line:

./chat <model_name> <precision> <num_threads>

I hope this helps. Thanks!
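For readers curious what a configurable thread count means in practice: below is a minimal, hypothetical sketch of a CPU matmul whose row range is split across a caller-chosen number of threads. The function name `matmul_threaded` and its signature are illustrative assumptions for this issue thread, not TinyChatEngine's actual kernel API.

```cpp
// Hypothetical sketch (not TinyChatEngine's real implementation):
// C = A (m x k) * B (k x n), with rows of C divided among num_threads workers.
#include <algorithm>
#include <thread>
#include <vector>

void matmul_threaded(const std::vector<float>& A,
                     const std::vector<float>& B,
                     std::vector<float>& C,
                     int m, int k, int n, int num_threads) {
    // Each worker computes a contiguous block of output rows.
    auto worker = [&](int row_begin, int row_end) {
        for (int i = row_begin; i < row_end; ++i) {
            for (int j = 0; j < n; ++j) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p) {
                    acc += A[i * k + p] * B[p * n + j];
                }
                C[i * n + j] = acc;
            }
        }
    };

    // Split m rows as evenly as possible across num_threads threads.
    std::vector<std::thread> pool;
    int rows_per_thread = (m + num_threads - 1) / num_threads;
    for (int t = 0; t < num_threads; ++t) {
        int begin = t * rows_per_thread;
        int end = std::min(m, begin + rows_per_thread);
        if (begin >= end) break;  // more threads than rows
        pool.emplace_back(worker, begin, end);
    }
    for (auto& th : pool) th.join();
}
```

With this structure, passing `num_threads` through from the CLI (as the `./chat <model_name> <precision> <num_threads>` fix does) directly controls how many cores the matmul can saturate.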