Hi @doubleplusplus,
It looks like the gpt4all package doesn't yet support native inference on GPUs (see https://docs.gpt4all.io/gpt4all_faq.html).
We're looking into some alternatives to support GPUs, because you're right - it's pretty slow to run these models on CPU only!
Hi everyone! Is there any progress on this? I found that gpt4all now supports GPU inference via Vulkan (https://blog.nomic.ai/posts/gpt4all-gpu-inference-with-vulkan), but I have limited knowledge about GPUs, so I couldn't figure out whether it can be integrated into OntoGPT.
You can do this with a little work: you have to patch the llm plugin and make sure the Vulkan dependencies are installed.
For me with an Nvidia GPU on Ubuntu: apt-get install -y vulkan-tools libvulkan1 libnvidia-gl-535
find /root/.cache/pypoetry/virtualenvs -type f -name "llm_gpt4all.py" -print0 | xargs -0 sed -i "s/gpt_model = GPT4All(self.filename())/gpt_model = GPT4All(self.filename(), device='gpu')/"
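For reference, the effect of that sed edit amounts to something like the sketch below; the model filename is just an example, and the `device` argument is what the Vulkan-enabled gpt4all builds use to pick a GPU:

```python
# Minimal sketch of what the patched plugin call does: the gpt4all
# Python package (Vulkan-enabled builds) accepts a `device` argument.
# The model filename here is illustrative only.
from gpt4all import GPT4All

# device="gpu" asks gpt4all to pick a Vulkan-capable GPU; it errors out
# if no suitable device is found, so a CPU fallback may still be wanted.
gpt_model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf", device="gpu")
print(gpt_model.generate("Say hello.", max_tokens=16))
```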
gpt4all_client.py might try to request the GPU when it's already in use, and your job will error out. Locking needs to be added.
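One rough sketch of how that locking could look, using a cross-process file lock; the third-party filelock package and the lock path are my own choices here, not anything OntoGPT ships:

```python
# Rough sketch: use a cross-process file lock so two jobs never request
# the GPU at the same time. `filelock` is a third-party package and the
# lock path is arbitrary; neither is part of OntoGPT.
from filelock import FileLock
from gpt4all import GPT4All

gpu_lock = FileLock("/tmp/gpt4all-gpu.lock")

def generate_on_gpu(model_file: str, prompt: str) -> str:
    # Hold the lock for the whole load + generate so the GPU is only
    # ever claimed by one process at a time.
    with gpu_lock:
        model = GPT4All(model_file, device="gpu")
        return model.generate(prompt, max_tokens=256)
```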
This is now supported by #373 and Ollama. I've found that the Dockerized Ollama works with GPUs more reliably, but you may find it works out of the box.
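If it helps, a quick way to check that a local (or Dockerized) Ollama server is up and answering is to hit its REST API directly; the host, port, and model name below are just examples:

```python
# Quick smoke test against a running Ollama server's REST API.
# Host, port, and model name are examples; adjust to your setup.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hello in one word.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```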