reorproject / reor

Private & local AI personal knowledge management app for high entropy thinkers.
https://reorproject.org
GNU Affero General Public License v3.0

Not using GPU #31

Open 1over137 opened 8 months ago

1over137 commented 8 months ago

On Linux, using an RTX 3090. It reports that 0/33 layers are offloaded to the GPU. I assume there is some missing switch required to make llama.cpp use the GPU?

vorticalbox commented 8 months ago

It seems GPU layers are only offloaded on darwin and arm64:

https://github.com/reorproject/reor/blob/main/electron/main/llm/models/LlamaCpp.ts#L111
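
For reference, here is a minimal sketch of what that kind of gate looks like with node-llama-cpp's `LlamaModel` options. The gate condition, layer count, and model path below are assumptions for illustration, not the actual code at the linked line:

```ts
import { LlamaModel } from "node-llama-cpp";

// Hypothetical reconstruction of the platform gate described above;
// the real logic in LlamaCpp.ts may differ.
const isAppleSilicon =
  process.platform === "darwin" && process.arch === "arm64";

const model = new LlamaModel({
  modelPath: "/path/to/model.gguf", // placeholder path
  // Offloading layers only on Apple Silicon leaves CUDA GPUs idle.
  // Passing a non-zero gpuLayers on Linux/Windows (with a CUDA-enabled
  // build of llama.cpp) would offload layers there as well.
  gpuLayers: isAppleSilicon ? 33 : 0,
});
```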

ElCuboNegro commented 8 months ago

Even when using the GPT-4 API, it seems the project relies entirely on the CPU for analysis and vectorization and ignores the GPU.

IDK if there is a way to use GPU memory to improve the indexing time. Ideas?
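
For context on what indexing is doing here: it amounts to running an embedding model over each note chunk. A minimal sketch with transformers.js (assuming the `@xenova/transformers` package and a hypothetical model choice; Reor's actual embedding stack may differ) shows the call that dominates indexing time and that a GPU-backed backend would need to accelerate:

```ts
import { pipeline } from "@xenova/transformers";

// Hypothetical embedding model chosen for illustration.
const embedder = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2"
);

// Embedding each chunk is the CPU-bound step of indexing; this is the
// call a GPU-backed execution provider would speed up.
const chunks = ["first note chunk", "second note chunk"];
const embeddings = await embedder(chunks, {
  pooling: "mean",
  normalize: true,
});

console.log(embeddings.dims); // e.g. [2, 384] for this model
```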


samlhuillier commented 8 months ago

Working on this now. @1over137 @vorticalbox @ElCuboNegro Just for my info, do you guys have CUDA installed? And which GPU does each of you use?

DanielHouston commented 8 months ago

Sam, I have an NVIDIA 4090 that supports up to CUDA 12.3 and would be happy to test things out as well.

ElCuboNegro commented 8 months ago

Indeed, I don't have CUDA installed. Do you think there is a way to make this installation easier for end users?

samlhuillier commented 8 months ago

@DanielHouston @ElCuboNegro @vorticalbox @1over137 GPU support via CUDA is now out in the latest version!

You'll have to turn it on in Settings -> Hardware by toggling GPU and CUDA on.

More instructions are in the docs.

(There is also Vulkan support for AMD GPUs.)

DanielHouston commented 8 months ago

The first prompt with a local LLM takes a long time (~60 seconds) but doesn't seem to be limited by CPU/RAM/GPU; from the logs it is definitely using the GPU, though, and after the initial prompt I'm seeing good performance. Cheers!

samlhuillier commented 8 months ago

Gotcha. Could you try restarting Reor and see if you still experience that slowness? I suspect it'll only appear the first time you run with CUDA...

DanielHouston commented 8 months ago

After restarting the app, I switched from a remote LLM to the pre-existing local LLM configuration and got the same slowness. The UI becomes unresponsive, although the cursor still blinks (so the load isn't happening on whatever thread you're rendering). Perhaps it's my CUDA installation? Anyone else experiencing the same?
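
One generic pattern that avoids exactly this symptom in Electron apps (not necessarily how Reor is structured) is to run the blocking model load in a worker thread so it can't freeze the UI-facing process. A minimal sketch, where `./loadModel.js` is a hypothetical worker script:

```ts
import { Worker } from "node:worker_threads";

// Generic sketch: a large model load blocks whichever thread runs it,
// so moving the load into a worker keeps the host process responsive.
// "./loadModel.js" is a hypothetical worker script that loads the model
// and posts a message back when it is ready.
const worker = new Worker(new URL("./loadModel.js", import.meta.url));

worker.on("message", (msg) => {
  console.log("model ready:", msg);
});

worker.postMessage({ modelPath: "/path/to/model.gguf" }); // placeholder
```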