Closed pafik13 closed 7 months ago
@pafik13 Thanks for the detailed issue, it helped me a lot to investigate this problem :) I found the bug, I'll include the fix for it in the next beta
@giladgd Can you point me to where you found the problem? If I can fix it locally, I can submit a PR.
:tada: This issue has been resolved in version 3.0.0-beta.2 :tada:
The release is available on:
v3.0.0-beta.2
Your semantic-release bot :package::rocket:
:tada: This issue has been resolved in version 3.0.0-beta.4 :tada:
The release is available on:
v3.0.0-beta.4
Your semantic-release bot :package::rocket:
Issue description
It seems to me that the `threads` parameter doesn't work as expected.

Expected Behavior
If I have 24 CPUs and pass `threads: 24`, then all CPUs should be utilized. I tried calling the original llama.cpp with the argument `-t 24` and it works normally, as expected.
Actual Behavior
I pass the parameter `threads: 24` (or `1`) to the constructor and nothing changes: it always utilizes 4 CPUs above 80%, and sometimes uses 1-2 additional CPUs at 25-50% utilization.
Steps to reproduce
Pass different `threads` values to the model constructor and observe CPU utilization (for example, with `htop`).

My Environment
node-llama-cpp version

Additional Context
./llama.cpp/main -m ./catai/models/phind-codellama-34b-q3_k_s -p "Please, write JavaScript function to sort array" -ins -t 24
Relevant Features Used
Are you willing to resolve this issue by submitting a Pull Request?
Yes, I have the time, but I don't know how to start. I would need guidance.