withcatai / catai

Run AI ✨ assistant locally! with simple API for Node.js 🚀
https://withcatai.github.io/catai/
MIT License
440 stars · 28 forks

How to use with cuda? #61

Closed sliterok closed 7 months ago

sliterok commented 8 months ago

Hi, I appreciate your work, but I'm having a hard time understanding what exactly I need to do to run this UI with CUDA support on Windows. Am I right that I need to manually build node-llama-cpp with CUDA support and put it into node_modules? That feels like a lot of pointless work, and I'm not sure how other people are doing it if it's not in the readme... Did I miss something? I've tried adding gpuLayers to the config, but it looks like it's still only using the CPU, so there must be some additional steps.

sliterok commented 8 months ago

Okay, I was able to figure it out. I built node-llama-cpp with CUDA support by following its docs, then added catai as a dependency to the same project that had node-llama-cpp, and added this script to package.json:

"scripts": {
  "up": "catai up"
},

Afterwards it can be started with `npm run up`. I feel like some part of this was unnecessary; I hope someone can improve on it.
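For reference, the wrapper project described above might end up with a package.json roughly like this (the package names are real, but the project name and version ranges are illustrative assumptions, not taken from the thread):

```json
{
  "name": "catai-cuda-wrapper",
  "private": true,
  "scripts": {
    "up": "catai up"
  },
  "dependencies": {
    "catai": "latest",
    "node-llama-cpp": "latest"
  }
}
```

With node-llama-cpp already built with CUDA inside this project's node_modules, `npm run up` resolves catai from the same tree, which is why the workaround picks up the CUDA-enabled binaries.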

ido-pluto commented 8 months ago

I plan to add node-llama-cpp as a sub-CLI of the catai CLI so configuring CUDA will be simpler. In the meantime, check out this issue: https://github.com/withcatai/catai/issues/52#issuecomment-1781390466

It has a similar solution.

ido-pluto commented 8 months ago

Try installing the catai beta and rebuilding the binaries with CUDA:

npm i -g catai@beta
catai cpp --cuda

https://github.com/withcatai/catai/blob/beta/docs/troubleshooting.md#cuda-support

Does this work for you?

sliterok commented 8 months ago

> Does this work for you?

It does, but somehow the context size also increased, so a 34B Q5 model no longer fits into my 24 GB of VRAM, and the "new" Windows driver spills it into shared memory. I don't think performance was affected much, if at all, though.

ido-pluto commented 8 months ago

That is related to changes happening in llama.cpp. It is currently in beta, and more changes and optimizations will come in future versions.