1over137 opened 8 months ago
It seems GPU layers are only loaded on darwin and arm64:
https://github.com/reorproject/reor/blob/main/electron/main/llm/models/LlamaCpp.ts#L111
When using the GPT-4 API, the project seems to rely entirely on the CPU for vectorization and ignores the GPU for analysis and vectorization.
I don't know whether there is a way to use GPU memory to improve indexing time. Ideas?
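For context, the linked code appears to gate GPU offload behind a platform check. A simplified sketch of that pattern (not the actual Reor source; the `gpuLayers` option name is assumed from node-llama-cpp):

```ts
import { LlamaModel } from "node-llama-cpp"; // binding assumed to be used here

// Sketch: offload is only requested on Apple Silicon (Metal), so Linux and
// Windows machines fall back to CPU-only inference and embedding.
const useMetal = process.platform === "darwin" && process.arch === "arm64";

const model = new LlamaModel({
  modelPath: "/path/to/model.gguf", // hypothetical path
  gpuLayers: useMetal ? 100 : 0,    // 0 = keep every layer on the CPU
});
```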
Working on this now. @1over137 @vorticalbox @ElCuboNegro Just for my info, do you have CUDA installed? Which GPU does each of you use?
Sam, I have an NVIDIA 4090 capable of up to CUDA 12.3 and would be happy to test things out as well.
Indeed, I don't have CUDA installed. Do you think there is a way to make this installation easier for end users?
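One thing that might help is having the app detect whether an NVIDIA driver is even present before exposing the CUDA option. A rough sketch of such a check (hypothetical helper, not something Reor currently does):

```ts
import { execFile } from "node:child_process";

// Hypothetical check: nvidia-smi ships with the NVIDIA driver, so a
// successful "-L" (list GPUs) call is a reasonable proxy for a CUDA-capable setup.
function hasNvidiaGpu(): Promise<boolean> {
  return new Promise((resolve) => {
    execFile("nvidia-smi", ["-L"], (error, stdout) => {
      resolve(!error && stdout.includes("GPU"));
    });
  });
}
```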
@DanielHouston @ElCuboNegro @vorticalbox @1over137 GPU support via CUDA is now out in the latest version!
You'll have to turn it on in Settings -> Hardware -> Toggle GPU and CUDA on.
More instructions are in the docs.
(There is also Vulkan support for AMD GPUs.)
The first prompt with a local LLM seems to take a long time (~60 seconds) but doesn't appear to be limited by CPU/RAM/GPU; the logs show it is definitely using the GPU, though, and after the initial prompt I'm seeing good performance. Cheers!
Gotcha. Could you try restarting Reor and see if you still experience that slowness? I suspect it'll only happen the first time you run with CUDA...
After restarting the app, I switched from a remote LLM to the pre-existing local LLM configuration and got the same slowness. The UI becomes unresponsive, although the cursor still blinks (so it's not blocking whatever thread you're rendering on). Perhaps it's my CUDA installation? Has anyone else experienced the same?
On Linux, using an RTX 3090. It reports that 0/33 layers are offloaded to the GPU. I assume there is some missing switch required to make llama.cpp use the GPU?
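For what it's worth, the switch I'd expect (assuming the node-llama-cpp binding exposes a `gpuLayers` option, which I haven't verified against the Reor source) is to request offload explicitly and then confirm it in the llama.cpp load log:

```ts
import { LlamaModel } from "node-llama-cpp"; // assumed binding and API version

// Sketch only: request offload of all layers. llama.cpp clamps the value to
// the model's real layer count, and the load log should then report
// "offloaded 33/33 layers to GPU" instead of 0/33.
const model = new LlamaModel({
  modelPath: "/path/to/model.gguf", // hypothetical path
  gpuLayers: 33,
});
```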