Closed mnn closed 1 year ago
You can specify the number of threads by adding `threads: 12`; see https://github.com/marella/ctransformers#documentation
Thanks for the tip :slightly_smiling_face:, after a few attempts (I am very new to this ecosystem) the following config works nicely:
```yaml
llm: ctransformers
ctransformers:
  model: /mnt/dev/ai/oobabooga_linux/text-generation-webui/models/Wizard-Vicuna-7B-Uncensored/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin
  model_type: llama
  config:
    threads: 14
download: false
```
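Since auto-detect didn't seem to kick in, one way to pick a `threads` value is from the machine's logical core count. This is a minimal sketch (the "leave two cores free" margin is just a heuristic of mine, not something from ctransformers or chatdocs):

```python
import os

# Logical CPU count; os.cpu_count() can return None on some platforms.
logical_cores = os.cpu_count() or 1

# Heuristic: leave a couple of cores free for the OS and other apps.
threads = max(1, logical_cores - 2)

print(f"threads: {threads}")
```

On a 16-thread CPU this prints `threads: 14`, which matches the value in the config above.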
I guess the default value `threads: -1` doesn't mean auto-detect, or the auto-detect is broken (it doesn't seem to be overridden by the default config in chatdocs).
I have noticed it never uses more than 400% in glances (the equivalent of 4 cores at 100%); that's only 25% of what my CPU has to offer. Is that normal, or do I have something configured wrong?
I tried other (GGML) models, but they behaved pretty much the same.
I remember from trying oobabooga's text-generation-webui that the "Transformers" loader had similarly poor performance, and I had to switch to "llama.cpp" to get better CPU utilization.
Sadly, ROCm (5.5?) is still not available for my GPU on Manjaro, so CPU is currently my only option :(.