Ollama doesn't support Vulkan, so it doesn't use it.
Can you please try running the command with both --gpu false and --gpu vulkan, and report whether there is any difference in inference speed?
I'm trying to figure out whether the Vulkan device used on your machine is actually the CPU or a GPU.
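For example, assuming the same model file from the reproduction steps below, the two runs would look something like this:

```sh
npx --no node-llama-cpp chat --gpu false --model models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
npx --no node-llama-cpp chat --gpu vulkan --model models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
```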
Also, can you please run this command and share its results?
npx --yes node-llama-cpp@beta inspect gpu
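For reference, a rough programmatic equivalent (a sketch based on the v3 beta API, where llama.gpu reports which compute layer was selected) would be:

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

// llama.gpu reports the selected compute layer ("vulkan", "cuda", "metal", or false for CPU-only)
console.log("GPU type:", llama.gpu);
```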
Closing due to inactivity.
If you still encounter issues with node-llama-cpp, let me know and I'll try to help.
Issue description
An error is thrown when generating a response, which looks like a Vulkan-related issue. However, Ollama runs the same model on this machine without problems. Thanks for your time! Best regards!
Expected Behavior
I use node-llama-cpp in a real-world project, and it worked very well until I got one particular laptop. I can reproduce the error using the CLI command below:
npx --no node-llama-cpp chat --model models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
The model loads successfully and the response is generated successfully.
Actual Behavior
npx --no node-llama-cpp chat --model models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
The model loads successfully, but generating a response throws an error and the process exits.
Steps to reproduce
npx --no node-llama-cpp chat --model models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
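For completeness, the same load-and-generate path can also be reproduced programmatically. This is only a sketch based on the documented node-llama-cpp v3 API, using the same model file as above; passing gpu: false to getLlama should help rule Vulkan in or out:

```ts
import path from "path";
import {fileURLToPath} from "url";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// The default GPU selection is automatic; passing {gpu: false} forces CPU-only inference
const llama = await getLlama();

const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

// Generating a response is the step where the Vulkan-related error appears
const answer = await session.prompt("Hi there, how are you?");
console.log(answer);
```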
My Environment
node-llama-cpp version
Additional Context
I have upgraded the Intel Iris graphics driver to the most recent stable version (31.0.101.2128) and restarted the computer.
Relevant Features Used
Are you willing to resolve this issue by submitting a Pull Request?
Yes, I have the time, but I don't know how to start. I would need guidance.