Closed mnn closed 1 year ago
You can specify the number of threads by adding `threads: 12`; see https://github.com/marella/ctransformers#documentation
Thanks for the tip :slightly_smiling_face:, after a few attempts (I am very new to this ecosystem) the following config works nicely:
```yaml
llm: ctransformers
ctransformers:
  model: /mnt/dev/ai/oobabooga_linux/text-generation-webui/models/Wizard-Vicuna-7B-Uncensored/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin
  model_type: llama
  config:
    threads: 14
download: false
```
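Since auto-detect didn't seem to kick in, one way to pick a `threads` value is from the machine's logical core count. This is a minimal sketch (the "leave two cores free" margin is just a heuristic of mine, not something from ctransformers or chatdocs):

```python
import os

# Logical CPU count; os.cpu_count() can return None on some platforms.
logical_cores = os.cpu_count() or 1

# Heuristic: leave a couple of cores free for the OS and other apps.
threads = max(1, logical_cores - 2)

print(f"threads: {threads}")
```

On a 16-thread CPU this prints `threads: 14`, which matches the value in the config above.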
I guess the default value `threads: -1` doesn't mean auto-detect, or the auto-detect is broken (it doesn't seem to be overridden by the default config in chatdocs).
I have noticed it never uses more than 400% in glances (the equivalent of 4 cores at 100%); that's only 25% of what my CPU has to offer. Is that normal, or do I have something configured wrong?
I tried other (GGML) models, but they behaved pretty much the same.
I remember from trying oobabooga's text-generation-webui that the "Transformers" loader had similarly poor performance, and I had to switch to "llama.cpp" to get better CPU utilization.
Sadly, ROCm (5.5?) is still not available for my GPU on Manjaro, so CPU is currently my only option :(.