CodeMazeSolver opened 1 month ago
I am trying to deploy LocalAI to an Ubuntu 24.04 server (Proxmox VM) with an A2000 passed through, and I think I am running into the same issue. I initially had the 550 drivers installed on the server, which matched what I saw in nvidia-smi when LocalAI started, but I have also purged the NVIDIA drivers and put the server back on the 535 drivers. Either way, I get this message in the logs when attempting to use a GPU model: INF GPU device found but no CUDA backend present
I have tried the images tagged master-aio-gpu-nvidia-cuda-12, master-aio-gpu-nvidia-cuda-11, and master-cublas-cuda12-ffmpeg, and have also tried setting the env variable REBUILD=true. I am currently able to run v2.15.0-aio-gpu-nvidia-cuda-12 and everything seems to work. I haven't tried any builds between that one and the current one.
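For reference, this is roughly how I start the container; the image tag and model path are just placeholders for whatever I happen to be testing, adapted from the usual docker run invocation shown elsewhere in this thread:

# Rough sketch of the run command I'm testing with; tag and paths are placeholders.
docker run -d --name local-ai \
  --gpus all \
  -p 8080:8080 \
  -v $PWD/models:/models \
  localai/localai:master-aio-gpu-nvidia-cuda-12 --models-path /models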
Same here. I tried the Docker version without success.
Confirmed: CPU only here as well. Docker image 2.16, Ubuntu 24.04, NVIDIA driver 550: INF GPU device found but no CUDA backend present
Same issue. I spent over two days trying to figure out what happened until I found this issue. Installing an older NVIDIA driver from https://www.nvidia.com/download/index.aspx?lang=en-us fixed it. Specifically, I downloaded and installed the 551.86 driver.
Error: INF GPU device found but no CUDA backend present
I think I have found the reason!
If you do not build the dist target, llama-cpp-cuda is not included in the backend assets.
The new version changed the backends considerably and the documentation was not updated.
The dist target in the new version's Makefile:
dist:
        STATIC=true $(MAKE) backend-assets/grpc/llama-cpp-avx2
ifeq ($(OS),Darwin)
        $(info ${GREEN}I Skip CUDA build on MacOS${RESET})
else
        $(MAKE) backend-assets/grpc/llama-cpp-cuda
endif
        $(MAKE) build
        mkdir -p release
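In other words, as far as I can tell a plain make build never produces the CUDA gRPC backend; only dist does. A rough sketch of forcing it manually (target names are the ones from the Makefile above, BUILD_TYPE=cublas as used elsewhere in this thread; exact flags may differ between versions):

# Sketch: build the CUDA llama.cpp backend explicitly before the main build...
BUILD_TYPE=cublas make backend-assets/grpc/llama-cpp-cuda
BUILD_TYPE=cublas make build
# ...or just build the dist target, which includes the CUDA backend on Linux.
BUILD_TYPE=cublas make dist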
I can confirm. Running in WSL 2, I tried rebuilding from source; during the build it states that CUDA was found, but it falls back to AVX2 when loading the model. Downgrading the drivers to 551.86 "fixes" the issue.
Yes, I also moved to an older version of the driver for now.
Upgrading to the latest CUDA toolkit can fix it.
Driver Version: 555.42.02 CUDA Version: 12.5
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
@Phate334
Could you elaborate a bit on your setup (OS, which repos the NVIDIA drivers were installed from)? I have exactly the same CUDA version as yours, but still no joy.
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
apt-cache policy nvidia-driver-555
nvidia-driver-555:
Installed: 555.42.02-0ubuntu1
Candidate: 555.42.02-0ubuntu1
Version table:
*** 555.42.02-0ubuntu1 600
600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 Packages
100 /var/lib/dpkg/status
555.42.02-0ubuntu0~gpu22.04.1 500
500 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy/main amd64 Packages
Ubuntu 22.04
apt-cache policy nvidia-driver-555 ...
I use the runfile to install the toolkit instead of apt. LocalAI v2.16.0.
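Roughly like this, in case it matters (check NVIDIA's download page for the exact runfile URL for your version; the one below is the 12.5.0 installer as I remember it):

# Runfile-based CUDA toolkit install (toolkit only, keeping the existing driver).
wget https://developer.download.nvidia.com/compute/cuda/12.5.0/local_installers/cuda_12.5.0_555.42.02_linux.run
sudo sh cuda_12.5.0_555.42.02_linux.run --silent --toolkit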
@Phate334
I have the 555.42 driver and CUDA 12.5 running and working everywhere except for LocalAI.
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
returns:
Sun Jun 9 12:54:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX A2000 Off | 00000000:00:10.0 Off | Off |
| 30% 32C P8 4W / 70W | 2MiB / 6138MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
nvcc -V
returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
nvidia-smi
returns:
Sun Jun 9 12:57:01 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX A2000 Off | 00000000:00:10.0 Off | Off |
| 30% 32C P8 4W / 70W | 2MiB / 6138MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Now, with all of that in place and a reboot of the system for good measure, I am getting this when running localai/localai:master-aio-gpu-nvidia-cuda-12:
===> LocalAI All-in-One (AIO) container starting...
NVIDIA GPU detected
/aio/entrypoint.sh: line 52: nvidia-smi: command not found
NVIDIA GPU detected, but nvidia-smi is not installed. GPU acceleration will not be available.
GPU acceleration is not enabled or supported. Defaulting to CPU.
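If I understand the container toolkit correctly, nvidia-smi and libcuda only get injected into a container when it is started with the NVIDIA runtime, so I am double-checking that the AIO container gets exactly the same GPU flags as the working ubuntu test above, roughly:

# Same GPU flags as the working ubuntu/nvidia-smi test, applied to the AIO image.
# Without --runtime=nvidia / --gpus all the toolkit never injects nvidia-smi into the container.
sudo docker run -d --name local-ai \
  --runtime=nvidia --gpus all \
  -p 8080:8080 \
  localai/localai:master-aio-gpu-nvidia-cuda-12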
@nmbgeek
I'm on Ubuntu 22.04. After a fresh reinstall of everything connected to CUDA (including nvidia-driver-555), with the following:
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
$ nvidia-smi
Sun Jun 9 17:46:40 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3070 ... Off | 00000000:01:00.0 On | N/A |
| N/A 59C P0 28W / 80W | 7044MiB / 8192MiB | 3% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 6416 G /usr/lib/xorg/Xorg 381MiB |
| 0 N/A N/A 6924 G /usr/bin/gnome-shell 96MiB |
| 0 N/A N/A 326812 G ...SidePanel --variations-seed-version 349MiB |
| 0 N/A N/A 351309 G x-terminal-emulator 10MiB |
| 0 N/A N/A 352079 C .../backend-assets/grpc/llama-cpp-avx2 6152MiB |
+-----------------------------------------------------------------------------------------+
everything works just fine.
Versions I've got:
$ apt-cache policy nvidia-driver-555
nvidia-driver-555:
Installed: 555.42.02-0ubuntu1
Candidate: 555.42.02-0ubuntu1
Version table:
555.52.04-0ubuntu0~gpu22.04.1 500
500 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy/main amd64 Packages
*** 555.42.02-0ubuntu1 600
600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 Packages
100 /var/lib/dpkg/status
$ apt-cache policy cuda
cuda:
Installed: 12.5.0-1
Candidate: 12.5.0-1
Version table:
*** 12.5.0-1 600
600 https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64 Packages
100 /var/lib/dpkg/status
12.5.0-1 600
600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 Packages
12.4.1-1 600
600 https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64 Packages
12.4.1-1 600
I also have the "GPU device found but no CUDA backend present" issue:
docker exec -it localai bash
root@localai:/build# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
root@localai:/build# nvidia-smi
Tue Jun 18 10:39:30 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67 Driver Version: 550.67 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA RTX 6000 Ada Gene... Off | 00000000:55:00.0 Off | Off |
| 30% 36C P8 28W / 300W | 7024MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
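One thing that might help narrow it down is listing which llama.cpp variants were actually extracted inside the container (the path below is where LocalAI unpacks its backend assets on my setup; it may differ on yours):

# If llama-cpp-cuda is missing from this listing, the "no CUDA backend present" message is expected.
docker exec -it localai ls /tmp/localai/backend_data/backend-assets/grpc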
For some reason the auto-detection does not invoke the following target from the Makefile; I don't know where or how the auto-detection happens.
https://github.com/mudler/LocalAI/blob/master/Makefile
backend-assets/grpc/llama-cpp-cuda: backend-assets/grpc
        cp -rf backend/cpp/llama backend/cpp/llama-cuda
        $(MAKE) -C backend/cpp/llama-cuda purge
        $(info ${GREEN}I llama-cpp build info:cuda${RESET})
        CMAKE_ARGS="$(CMAKE_ARGS) -DLLAMA_AVX=on -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off -DLLAMA_CUDA=ON" $(MAKE) VARIANT="llama-cuda" build-llama-cpp-grpc-server
        cp -rfv backend/cpp/llama-cuda/grpc-server backend-assets/grpc/llama-cpp-cuda
To make it work temporarily, I used
image: localai/localai:master-cublas-cuda12-ffmpeg
and set the following environment variables to force compiling llama-cuda:
environment:
# had to set the following for llama.cpp to build llama-cuda
- BUILD_GRPC_FOR_BACKEND_LLAMA=true
- VARIANT=llama-cuda
- GRPC_BACKENDS=backend-assets/grpc/llama-cpp-cuda
- REBUILD=true
- BUILD_TYPE=cublas
Note: nvidia-smi on both the host and inside the container reports CUDA 12.5.
I hope this helps someone find a way to auto-detect this and do the above as part of the normal rebuild process, or ship it in the prebuilt container.
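If you are not using compose, a rough docker run equivalent of the above is (same image and variables; the rebuild happens on first start and takes a while):

# docker run equivalent of the compose environment block above; first start recompiles llama-cuda.
docker run -d --name local-ai \
  --gpus all -p 8080:8080 \
  -e REBUILD=true \
  -e BUILD_TYPE=cublas \
  -e BUILD_GRPC_FOR_BACKEND_LLAMA=true \
  -e VARIANT=llama-cuda \
  -e GRPC_BACKENDS=backend-assets/grpc/llama-cpp-cuda \
  localai/localai:master-cublas-cuda12-ffmpeg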
Thanks, @MCP-LTS, this made CUDA work for me inside the LocalAI container!
Now we just need this to be fixed inside the official images since rebuilding took hours on my (actually pretty beefy) AI server.
Thank you @MCP-LTS, this also works with v2.17.1.
I found another GitHub issue for Ollama that seems to be related: https://github.com/ollama/ollama/issues/4563#issuecomment-2132940376.
It seems that the new NVIDIA driver doesn't load the necessary kernel module on Linux. I have not tested this with LocalAI yet on my Linux deployments.
I also run LocalAI on Windows under WSL2 with Docker Desktop and was having the same issue. The same thread mentions updates to Docker Desktop.
I updated Docker Desktop to 4.31.1 (https://docs.docker.com/desktop/release-notes/) and it finally works with the latest drivers (555.99).
So, for anyone out there running this in WSL2: try updating Docker Desktop.
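Before blaming LocalAI you can also check that the WSL distro itself sees the Windows driver; the driver-provided libraries are exposed under /usr/lib/wsl/lib (standard WSL2 CUDA behaviour, nothing LocalAI-specific):

# Inside the WSL2 distro: the Windows driver exposes libcuda through /usr/lib/wsl/lib.
ls /usr/lib/wsl/lib/ | grep -i cuda
nvidia-smi   # should report the same driver version as on the Windows side (555.99 here)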
@MCP-LTS I followed your workaround and the build seemed to succeed. However, when I try to chat with a model, I get the following error:
12:18AM INF [llama-cpp] attempting to load with CUDA variant
12:18AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-cuda
12:18AM DBG GRPC(Mistral-7B-Instruct-v0.3.Q4_K_M.gguf-127.0.0.1:34409): stderr /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-cuda: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory
Running LocalAI on a K8S node.
On the node:
➜ ~ ls -l /usr/lib64/libcuda.so.1
lrwxrwxrwx 1 root root 20 May 15 11:53 /usr/lib64/libcuda.so.1 -> libcuda.so.555.42.02
In the container:
root@localai-local-ai-649fd7f4bd-rqtlj:/build# find /usr | grep libcuda
/usr/local/cuda-12.5/targets/x86_64-linux/lib/cmake/libcudacxx
/usr/local/cuda-12.5/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-config-version.cmake
/usr/local/cuda-12.5/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-config.cmake
/usr/local/cuda-12.5/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-header-search.cmake
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudadevrt.a
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudart.so
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudart.so.12
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudart.so.12.5.39
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudart_static.a
/usr/local/cuda-12.5/targets/x86_64-linux/lib/stubs/libcuda.so
Any suggestions?
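Not an authoritative answer, but libcuda.so.1 is normally injected by the NVIDIA container runtime when the container starts rather than shipped in the image (only the stub under /usr/local/cuda is baked in), so I would first check whether the pod is actually getting the NVIDIA runtime:

# Inside the pod: runtime-injected driver libraries normally show up in the linker cache.
ldconfig -p | grep libcuda
# If nothing is listed, check on the cluster side that the pod actually requests the GPU runtime,
# e.g. runtimeClassName: nvidia plus an nvidia.com/gpu resource limit (device plugin setups vary);
# otherwise only the toolkit stub (/usr/local/cuda/.../stubs/libcuda.so) is visible in the image.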
I can't successfully build llama-cuda inside the container. I would prefer it to be delivered precompiled in the images. When will this be fixed?
To avoid repeated rebuilds, I use this Dockerfile to build a new image, and it works:
FROM localai/localai:v2.17.1-cublas-cuda12-ffmpeg
ENV BUILD_GRPC_FOR_BACKEND_LLAMA=true
ENV VARIANT=llama-cuda
ENV GRPC_BACKENDS=backend-assets/grpc/llama-cpp-cuda
ENV REBUILD=true
ENV BUILD_TYPE=cublas
RUN cd /build && rm -rf ./local-ai && make build -j${BUILD_PARALLELISM:-1}
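Building and running it is then the usual routine (the image tag and model path below are just placeholders):

# Build the prebaked image once, then run it like the stock image.
docker build -t localai-cuda-prebuilt .
docker run -d --name local-ai --gpus all -p 8080:8080 \
  -v $PWD/models:/models \
  localai-cuda-prebuilt --models-path /models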
Hey there.
LocalAI version:
docker run --rm -ti --gpus all -p 8080:8080 -e DEBUG=true -v $PWD/models:/models --name local-ai localai/localai:latest-aio-gpu-nvidia-cuda-12 --models-path /models --context-size 1000 --threads 14
LocalAI version: v2.15.0 (f69de3be0d274a676f1d1cd302dc4699f1b5aaf0)
Environment, CPU architecture, OS, and Version:
13th Gen Intel(R) Core(TM) i9-13900H 2.60 GHz, on Windows 11 with Docker for Windows.
Describe the bug
I get this debug message right before the model is loaded.
stderr ggml_cuda_init: failed to initialize CUDA: named symbol not found
This indicates to me that the models will not use GPU support; however, this worked just fine before.
After updating the graphics driver, the CUDA version changed too, from 12.4 to 12.5. It seems the CUDA environment is no longer used by any LLM, even though the CUDA version is detected correctly when starting the LocalAI Docker container.
Instead of utilizing the GPU, the application uses the fallback and runs only on the CPU.
To Reproduce
Expected behavior
Utilizing the GPU.
Logs
Here are the full logs for the mistral-7b-instruct-v0.1.Q5_K_M.gguf model, but I tried several models that worked before. None utilize the GPU after installing the new graphics driver.
localai.log
Additional context
Also, checking the Task Manager shows that there is no GPU usage taking place.
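For comparison, it may also be worth checking whether a plain CUDA container can initialize the GPU under the same Docker Desktop and driver combination; if that fails too, the problem is the Docker Desktop / WSL2 integration rather than LocalAI (see the Docker Desktop 4.31.1 note above). The image tag below is just an example:

# If this already fails under the new driver, LocalAI is not the culprit.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi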