mudler / LocalAI

:robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven, and local-first. A drop-in replacement for OpenAI that runs on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many other model architectures. It can generate Text, Audio, Video, and Images, and includes voice cloning capabilities.
https://localai.io
MIT License
21.59k stars · 1.65k forks

CUDA 12.5 support or GPU acceleration not working after graphics driver update #2394

Open CodeMazeSolver opened 1 month ago

CodeMazeSolver commented 1 month ago

Hey there, this is how I'm running LocalAI:

LocalAI version:

docker run --rm -ti --gpus all -p 8080:8080 -e DEBUG=true -v $PWD/models:/models --name local-ai localai/localai:latest-aio-gpu-nvidia-cuda-12 --models-path /models --context-size 1000 --threads 14

LocalAI version: v2.15.0 (f69de3be0d274a676f1d1cd302dc4699f1b5aaf0)

Environment, CPU architecture, OS, and Version:

13th Gen Intel(R) Core(TM) i9-13900H 2.60 GHz, on Windows 11 with Docker for Windows.

Describe the bug

I get this debug message right before the model is loaded:

stderr ggml_cuda_init: failed to initialize CUDA: named symbol not found

which indicates to me that the models will not use the GPU. However, this worked just fine before.

After updating the graphics driver, the CUDA version changed too, from 12.4 to 12.5. It seems the CUDA environment is no longer used by any LLM, even though the CUDA version is detected correctly when starting the LocalAI Docker container:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090 ...    On  |   00000000:01:00.0  On |                  N/A |
| N/A   48C    P8              7W /  105W |     148MiB /  16376MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
NVIDIA GPU detected. Attempting to find memory size...
Total GPU Memory: 16376 MiB

Instead of utilizing the GPU, the application uses the fallback and runs only on the CPU.
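As an aside for anyone triaging similar reports: `ggml_cuda_init: failed to initialize CUDA: named symbol not found` usually points at a mismatch between the installed driver and the CUDA runtime the binaries were built against. A minimal sketch of the simple half of that check follows (the version numbers are examples from this thread; note that here the failure happened even with a *newer* driver, which this check alone won't catch):

```shell
# Sketch: verify the driver-reported CUDA version (nvidia-smi on the host) is
# at least the toolkit version the container image was built against (nvcc -V
# inside the container). Example values only.
driver_cuda="12.5"   # e.g. parsed from nvidia-smi's "CUDA Version:" field
toolkit_cuda="12.4"  # e.g. parsed from nvcc -V inside the container

# sort -V orders version strings numerically; if the toolkit version sorts
# first (or equal), the driver is new enough to run those binaries.
lowest=$(printf '%s\n%s\n' "$driver_cuda" "$toolkit_cuda" | sort -V | head -n1)
if [ "$lowest" = "$toolkit_cuda" ]; then
  echo "driver supports CUDA $toolkit_cuda binaries"
else
  echo "driver older than toolkit $toolkit_cuda; expect init failures"
fi
```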

To Reproduce

Expected behavior

Utilizing the GPU.

Logs

Here are the full logs for the mistral-7b-instruct-v0.1.Q5_K_M.gguf model, but I tried several models that worked before. None utilize the GPU after installing the new graphics driver.

localai.log

Additional context

Checking the Task Manager also shows that no GPU usage is taking place.

nmbgeek commented 1 month ago

I am trying to deploy LocalAI to an Ubuntu 24.04 server (Proxmox VM) with an A2000 passed through, and I think I am running into the same issue. I initially had the 550 drivers installed on the server, which corresponded to what I saw in nvidia-smi when LocalAI starts, but I have also purged the NVIDIA drivers and put the server back on the 535 drivers. Regardless, I get this message in the logs when attempting to use a GPU model: INF GPU device found but no CUDA backend present.

I have tried images tagged master-aio-gpu-nvidia-cuda-12, master-aio-gpu-nvidia-cuda-11, master-cublas-cuda12-ffmpeg, and have also tried with the env variable REBUILD=true. I am currently able to run v2.15.0-aio-gpu-nvidia-cuda-12 and everything seems to work. I haven't tried any builds between that one and the current one.

Hideman85 commented 1 month ago

Same here, I tried the Docker version without success.

Logs here:

```
docker run -p 8080:8080 --rm -v ./Documents/AIModels/:/build/models -ti localai/localai:latest-aio-gpu-nvidia-cuda-12
===> LocalAI All-in-One (AIO) container starting...
NVIDIA GPU detected
/aio/entrypoint.sh: line 52: nvidia-smi: command not found
NVIDIA GPU detected, but nvidia-smi is not installed. GPU acceleration will not be available.
AMD GPU detected
AMD GPU detected, but ROCm is not installed. GPU acceleration will not be available.
GPU acceleration is not enabled or supported. Defaulting to CPU.
[...]
10:27AM INF core/startup process completed!
10:27AM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
10:27AM INF Success ip=172.17.0.1 latency=29.872297ms method=POST status=200 url=/v1/chat/completions
10:27AM INF Trying to load the model 'b5869d55688a529c3738cb044e92c331' with the backend '[llama-cpp llama-ggml gpt4all llama-cpp-fallback piper stablediffusion rwkv whisper huggingface bert-embeddings /build/backend/python/bark/run.sh /build/backend/python/vall-e-x/run.sh /build/backend/python/transformers/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/exllama/run.sh /build/backend/python/exllama2/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/transformers-musicgen/run.sh /build/backend/python/coqui/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/vllm/run.sh /build/backend/python/openvoice/run.sh /build/backend/python/mamba/run.sh /build/backend/python/parler-tts/run.sh /build/backend/python/rerankers/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/petals/run.sh]'
10:27AM INF [llama-cpp] Attempting to load
10:27AM INF Loading model 'b5869d55688a529c3738cb044e92c331' with backend llama-cpp
10:27AM INF GPU device found but no CUDA backend present
10:27AM INF [llama-cpp] attempting to load with AVX2 variant
```
madgagarin commented 1 month ago

Confirmed, CPU only. Docker, LocalAI 2.16, Ubuntu 24.04, NVIDIA 550 drivers: INF GPU device found but no CUDA backend present

stephenleo commented 1 month ago

Same issue. I spent over two days trying to figure out what happened until I found this issue. Installing an older version of the NVIDIA driver from https://www.nvidia.com/download/index.aspx?lang=en-us fixed it. Specifically, I downloaded and installed the 551.86 driver.

crazymxm commented 1 month ago

Error: INF GPU device found but no CUDA backend present.

I think I've found the reason!

If you do not build the `dist` target, llama-cpp-cuda will NOT be included in the backends!

The new version changed the backends significantly, and the documentation was not updated.

The new version's Makefile `dist` target:

```
dist:
	STATIC=true $(MAKE) backend-assets/grpc/llama-cpp-avx2
ifeq ($(OS),Darwin)
	$(info ${GREEN}I Skip CUDA build on MacOS${RESET})
else
	$(MAKE) backend-assets/grpc/llama-cpp-cuda
endif
	$(MAKE) build
	mkdir -p release
```
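Building on the observation above, a quick way to confirm whether the CUDA backend actually got built into your tree is to check for the asset on disk. This is a sketch: `BUILD_DIR` is a hypothetical variable for illustration, and the `backend-assets/grpc/llama-cpp-cuda` path comes from the Makefile target quoted above.

```shell
#!/bin/sh
# Sketch: after building LocalAI, check whether the CUDA llama.cpp backend
# asset exists. BUILD_DIR is hypothetical; the asset path is taken from the
# Makefile's dist target.
BUILD_DIR="${BUILD_DIR:-/build}"
if [ -x "$BUILD_DIR/backend-assets/grpc/llama-cpp-cuda" ]; then
  echo "CUDA backend present"
else
  echo "CUDA backend missing - expect fallback to CPU (AVX2) variants"
fi
```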

jobongo commented 1 month ago

I can confirm. Running in WSL 2. I tried rebuilding from source, and during the build it states that CUDA was found, but it falls back to AVX2 when loading the model. Downgrading the drivers to 551.86 "fixes" the issue.

CodeMazeSolver commented 1 month ago

> Same issue. I spent over two days trying to figure out what happened until I found this issue. Installing an older version of the NVIDIA driver from https://www.nvidia.com/download/index.aspx?lang=en-us fixed it. Specifically, I downloaded and installed the 551.86 driver.

Yes, I also moved to an older version of the driver for now.

Phate334 commented 4 weeks ago

Upgrading to the latest CUDA toolkit can fix it.

Driver Version: 555.42.02 CUDA Version: 12.5

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
sbushmanov commented 3 weeks ago

@Phate334

Could you elaborate a bit on your setup (OS, the repos the NVIDIA drivers were installed from)? I have exactly the same CUDA version as yours, but still no joy.

$ nvcc -V 
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
apt-cache policy nvidia-driver-555
nvidia-driver-555:
  Installed: 555.42.02-0ubuntu1
  Candidate: 555.42.02-0ubuntu1
  Version table:
 *** 555.42.02-0ubuntu1 600
        600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages
        100 /var/lib/dpkg/status
     555.42.02-0ubuntu0~gpu22.04.1 500
        500 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy/main amd64 Packages

Ubuntu 22.04

Phate334 commented 3 weeks ago
> apt-cache policy nvidia-driver-555
> ...

I used the runfile to install the toolkit instead of apt. LocalAI v2.16.0.

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=runfile_local

nmbgeek commented 3 weeks ago

@Phate334

I have the 555.42 driver and CUDA 12.5 running and working everywhere except for LocalAI. sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi returns:

Sun Jun  9 12:54:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A2000               Off |   00000000:00:10.0 Off |                  Off |
| 30%   32C    P8              4W /   70W |       2MiB /   6138MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

nvcc -V returns:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0

nvidia-smi returns:

nvidia-smi
Sun Jun  9 12:57:01 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A2000               Off |   00000000:00:10.0 Off |                  Off |
| 30%   32C    P8              4W /   70W |       2MiB /   6138MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Now, with all of that in place and a reboot of the system for good measure, I am getting this when running localai/localai:master-aio-gpu-nvidia-cuda-12:

===> LocalAI All-in-One (AIO) container starting...
NVIDIA GPU detected
/aio/entrypoint.sh: line 52: nvidia-smi: command not found
NVIDIA GPU detected, but nvidia-smi is not installed. GPU acceleration will not be available.
GPU acceleration is not enabled or supported. Defaulting to CPU.
sbushmanov commented 3 weeks ago

@nmbgeek

I'm on Ubuntu 22.04. After a fresh reinstall of everything CUDA-related (including nvidia-driver-555), I have the following:

$ nvcc -V     
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0

$ nvidia-smi                                                                                                                                                 
Sun Jun  9 17:46:40 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3070 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   59C    P0             28W /   80W |    7044MiB /   8192MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      6416      G   /usr/lib/xorg/Xorg                            381MiB |
|    0   N/A  N/A      6924      G   /usr/bin/gnome-shell                           96MiB |
|    0   N/A  N/A    326812      G   ...SidePanel --variations-seed-version        349MiB |
|    0   N/A  N/A    351309      G   x-terminal-emulator                            10MiB |
|    0   N/A  N/A    352079      C   .../backend-assets/grpc/llama-cpp-avx2       6152MiB |
+-----------------------------------------------------------------------------------------+

Everything works just fine.

Versions I've got:

$ apt-cache policy nvidia-driver-555                                                                                                                                                       
nvidia-driver-555:
  Installed: 555.42.02-0ubuntu1
  Candidate: 555.42.02-0ubuntu1
  Version table:
     555.52.04-0ubuntu0~gpu22.04.1 500
        500 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy/main amd64 Packages
 *** 555.42.02-0ubuntu1 600
        600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages
        100 /var/lib/dpkg/status

$ apt-cache policy cuda                                                                                                                                                                    
cuda:
  Installed: 12.5.0-1
  Candidate: 12.5.0-1
  Version table:
 *** 12.5.0-1 600
        600 https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64  Packages
        100 /var/lib/dpkg/status
     12.5.0-1 600
        600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages
     12.4.1-1 600
        600 https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64  Packages
     12.4.1-1 600
StefanDanielSchwarz commented 2 weeks ago

I also have the "GPU device found but no CUDA backend present" issue:


docker exec -it localai bash

root@localai:/build# nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

root@localai:/build# nvidia-smi

Tue Jun 18 10:39:30 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 6000 Ada Gene...    Off |   00000000:55:00.0 Off |                  Off |
| 30%   36C    P8             28W /  300W |    7024MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
MCP-LTS commented 2 weeks ago

For some reason, the auto-detection does not invoke the following target from the Makefile (I don't know where or how the auto-detection happens):

https://github.com/mudler/LocalAI/blob/master/Makefile

backend-assets/grpc/llama-cpp-cuda: backend-assets/grpc
    cp -rf backend/cpp/llama backend/cpp/llama-cuda
    $(MAKE) -C backend/cpp/llama-cuda purge
    $(info ${GREEN}I llama-cpp build info:cuda${RESET})
    CMAKE_ARGS="$(CMAKE_ARGS) -DLLAMA_AVX=on -DLLAMA_AVX2=off -DLLAMA_AVX512=off -DLLAMA_FMA=off -DLLAMA_F16C=off -DLLAMA_CUDA=ON" $(MAKE) VARIANT="llama-cuda" build-llama-cpp-grpc-server
    cp -rfv backend/cpp/llama-cuda/grpc-server backend-assets/grpc/llama-cpp-cuda

To make it work temporarily, I used the image localai/localai:master-cublas-cuda12-ffmpeg and set the following variables to force compiling llama-cuda:

    environment:
#had to set the following for llama.cpp to build llama-cuda
      - BUILD_GRPC_FOR_BACKEND_LLAMA=true
      - VARIANT=llama-cuda
      - GRPC_BACKENDS=backend-assets/grpc/llama-cpp-cuda

      - REBUILD=true
      - BUILD_TYPE=cublas

Note: nvidia-smi on both the host and inside the container reports CUDA 12.5.

Hope this helps someone find a way to auto-detect this and do the above as part of the normal rebuild process, or in the prebuilt container.
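For reference, MCP-LTS's workaround can be sketched as a complete docker-compose service. This is a sketch, not an official example: the image tag and environment variables are taken from this thread, while the port, volume path, and GPU reservation stanza are assumptions to adapt to your setup.

```yaml
# Sketch of a docker-compose service applying the llama-cuda rebuild workaround.
services:
  localai:
    image: localai/localai:master-cublas-cuda12-ffmpeg
    ports:
      - "8080:8080"           # assumed; LocalAI's default API port
    volumes:
      - ./models:/build/models # assumed host path for models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      # variables from this thread forcing the llama-cuda backend build
      - BUILD_GRPC_FOR_BACKEND_LLAMA=true
      - VARIANT=llama-cuda
      - GRPC_BACKENDS=backend-assets/grpc/llama-cpp-cuda
      - REBUILD=true
      - BUILD_TYPE=cublas
```

Note that with REBUILD=true the first container start recompiles the backend, which can take a long time.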

StefanDanielSchwarz commented 2 weeks ago

Thanks, @MCP-LTS, this made CUDA work for me inside the LocalAI container!

Now we just need this to be fixed inside the official images since rebuilding took hours on my (actually pretty beefy) AI server.

dallumnz commented 1 week ago

Thank you @MCP-LTS, this also works with v2.17.1.

jobongo commented 1 week ago

I found another Github issue for Ollama that seems to be related. https://github.com/ollama/ollama/issues/4563#issuecomment-2132940376.

Seems that the new NVIDIA driver doesn't load the necessary kernel module in Linux. I have not tested this with LocalAI yet on my Linux deployments.

I also run LocalAI on Windows WSL2 with Docker Desktop and was having the same issue. The same thread also mentions updates to Docker Desktop.

I updated Docker Desktop to 4.31.1 (https://docs.docker.com/desktop/release-notes/) and it finally works with the latest drivers (555.99)

So... For anyone out there that is running this in WSL2, try updating Docker Desktop.

rwlove commented 1 week ago

@MCP-LTS I followed your workaround and the build seemed to succeed. However when I try to chat with a model I get the following error:

12:18AM INF [llama-cpp] attempting to load with CUDA variant
12:18AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-cuda
12:18AM DBG GRPC(Mistral-7B-Instruct-v0.3.Q4_K_M.gguf-127.0.0.1:34409): stderr /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-cuda: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory

Running LocalAI on a K8S node.

On the node:

➜  ~ ls -l /usr/lib64/libcuda.so.1
lrwxrwxrwx 1 root root 20 May 15 11:53 /usr/lib64/libcuda.so.1 -> libcuda.so.555.42.02

In the container:

root@localai-local-ai-649fd7f4bd-rqtlj:/build# find /usr | grep libcuda
/usr/local/cuda-12.5/targets/x86_64-linux/lib/cmake/libcudacxx
/usr/local/cuda-12.5/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-config-version.cmake
/usr/local/cuda-12.5/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-config.cmake
/usr/local/cuda-12.5/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-header-search.cmake
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudadevrt.a
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudart.so
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudart.so.12
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudart.so.12.5.39
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudart_static.a
/usr/local/cuda-12.5/targets/x86_64-linux/lib/stubs/libcuda.so

Any suggestions?
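A note on this error: libcuda.so.1 is the driver-side library and is normally injected into the container by the NVIDIA container runtime (or the Kubernetes device plugin) rather than shipped in the image; the copy under `stubs/` is a link-time stub only. A minimal sketch of a check to run inside the pod, assuming only standard `ldconfig`:

```shell
# Sketch: see whether the dynamic linker can resolve libcuda.so.1 inside the
# container. If it cannot, the NVIDIA runtime hook is likely not active for
# this pod (e.g. missing runtime/device-plugin configuration), which matches
# the "cannot open shared object file" error above.
if ldconfig -p | grep -q 'libcuda\.so\.1'; then
  echo "libcuda.so.1 resolvable"
else
  echo "libcuda.so.1 not resolvable - NVIDIA runtime hook likely not active"
fi
```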

ER-EPR commented 1 week ago

I can't successfully build llama-cuda inside the container. I prefer it to be delivered precompiled within the images. When will this be fixed?

ER-EPR commented 1 week ago

To avoid repeated rebuilds, I use this Dockerfile to build a new image, and it works.

FROM localai/localai:v2.17.1-cublas-cuda12-ffmpeg

ENV BUILD_GRPC_FOR_BACKEND_LLAMA=true
ENV VARIANT=llama-cuda
ENV GRPC_BACKENDS=backend-assets/grpc/llama-cpp-cuda
ENV REBUILD=true
ENV BUILD_TYPE=cublas

RUN cd /build && rm -rf ./local-ai && make build -j${BUILD_PARALLELISM:-1}