monarch-initiative / ontogpt

LLM-based ontological extraction tools, including SPIRES
https://monarch-initiative.github.io/ontogpt/
BSD 3-Clause "New" or "Revised" License

Is there an option to use the GPU for the LLM? Inference speed is a little too slow, and my GPU utilization is almost zero. #267

Closed: doubleplusplus closed this issue 3 months ago

doubleplusplus commented 11 months ago

This is with a local model.

caufieldjh commented 11 months ago

Hi @doubleplusplus, It looks like the gpt4all package doesn't yet support native inference on GPUs (see https://docs.gpt4all.io/gpt4all_faq.html). We're looking into some alternatives to support GPUs, because you're right - it's pretty slow to run these models on CPU only!

TribeDH commented 9 months ago

Hi everyone! Is there any progress on this? I found that gpt4all now supports GPU inference via Vulkan (https://blog.nomic.ai/posts/gpt4all-gpu-inference-with-vulkan), but I have limited knowledge of GPUs, so I couldn't work out whether this can be integrated into OntoGPT.

durabledata commented 8 months ago

You can do this with a little work. You have to make a small change to the llm plugin and make sure the Vulkan dependencies are installed.

For me with an Nvidia GPU on Ubuntu: apt-get install -y vulkan-tools libvulkan1 libnvidia-gl-535
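Before applying the patch below, it may help to confirm that the driver and the Vulkan loader can actually see the card. These two checks are only a sanity check and assume the packages above (and the Nvidia driver) are already installed:

nvidia-smi              # confirm the Nvidia driver sees the GPU
vulkaninfo --summary    # confirm the Vulkan loader enumerates the device (from vulkan-tools)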

find /root/.cache/pypoetry/virtualenvs -type f -name "llm_gpt4all.py" -print0 | xargs -0 sed -i "s/gpt_model = GPT4All(self.filename())/gpt_model = GPT4All(self.filename(), device='gpu')/"
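The path above assumes poetry's cache under /root. If your virtualenv lives somewhere else, one way to locate it (assuming you installed OntoGPT with poetry) is:

poetry env info --path    # prints the virtualenv directory to point the find command at

Then run the same find/sed against that directory instead of /root/.cache/pypoetry.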

durabledata commented 7 months ago

gpt4all_client.py might try to request the GPU when it's already in use, and your job will error out. Locking needs to be added.
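Until proper locking lands, one low-tech workaround is to serialize GPU-bound jobs outside the process. This is only a sketch using flock; the template and input file names here are placeholders, not exact arguments:

# hold an exclusive file lock so only one GPU-bound extraction runs at a time
flock /tmp/ontogpt-gpu.lock \
  ontogpt extract -t gocam -i input.txt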

caufieldjh commented 3 months ago

This is now supported via #373 and ollama. I've found that the Dockerized ollama works more reliably with GPUs, but you may find it works out of the box.
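For reference, a typical way to run the Dockerized ollama with GPU access looks like this; it assumes the NVIDIA Container Toolkit is installed, and the model name is just an example:

# start the ollama server with access to all GPUs
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# pull a model into the running container
docker exec -it ollama ollama pull llama3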