open-webui / helm-charts


Helm install of cuda image not detecting the install Nvidia GPU #17

Closed malcolmlewis closed 1 month ago

malcolmlewis commented 1 month ago

Bug Report

Description

With a standard K3s install on bare metal with an Nvidia Tesla P4, the install fails to detect the GPU.

Bug Summary: Fails to detect the installed Nvidia GPU.

Steps to Reproduce: Install K3s with default values, add the gpu-operator, and confirm the GPU is visible. Add the open-webui Helm repo, update it, and install as follows.

Expected Behavior: The installed Nvidia GPU is detected and the deployment runs.

Actual Behavior:

gpu-operator/nvidia-operator-validator-czqq6:toolkit-validation
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:01:00.0 Off |                    0 |
| N/A   28C    P0    21W /  75W |      0MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
helm install --namespace=open-webui open-webui open-webui/open-webui \
>    --set image.tag=cuda \
>    --set ollama.gpu.enabled=true \
>    --set ollama.gpu.number=1 \
>    --set ollama.gpu.type=nvidia
NAME: open-webui
LAST DEPLOYED: Fri May 10 09:32:01 2024
NAMESPACE: open-webui
STATUS: deployed
REVISION: 1
kubectl logs -n open-webui open-webui-6645f5d9fb-m8lzq

Loading WEBUI_SECRET_KEY from file, not provided as an environment variable.
Generating WEBUI_SECRET_KEY
Loading WEBUI_SECRET_KEY from .webui_secret_key
CUDA is enabled, appending LD_LIBRARY_PATH to include torch/cudnn & cublas libraries.
Traceback (most recent call last):
  File "/usr/local/bin/uvicorn", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/main.py", line 410, in main
    run(
  File "/usr/local/lib/python3.11/site-packages/uvicorn/main.py", line 578, in run
    server.run()
  File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 61, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.11/site-packages/uvicorn/server.py", line 68, in serve
    config.load()
  File "/usr/local/lib/python3.11/site-packages/uvicorn/config.py", line 473, in load
    self.loaded_app = import_from_string(self.app)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/backend/main.py", line 32, in <module>
    from apps.rag.main import app as rag_app
  File "/app/backend/apps/rag/main.py", line 160, in <module>
    update_embedding_model(
  File "/app/backend/apps/rag/main.py", line 137, in update_embedding_model
    app.state.sentence_transformer_ef = sentence_transformers.SentenceTransformer(
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 221, in __init__
    self.to(device)
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(
           ^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Environment

robrakaric commented 1 month ago

If you're using v2.x of the Helm chart, it includes ollama as a dependency. The ollama chart's values.yaml nests its own settings under ollama, so your --set keys for things like the GPU would be ollama.ollama.gpu.enabled. The first ollama specifies that the value is passed to the Ollama subchart; the second ollama comes from https://github.com/otwld/ollama-helm itself. Check out the values documentation there. Lmk if this makes sense.
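
For reference, the same GPU settings expressed as a values file instead of --set flags would look roughly like this (a sketch assuming the v2.x nesting described above; pass it with -f):

# values-gpu.yaml (illustrative; key nesting per the v2.x chart layout above)
ollama:            # values handed to the bundled Ollama subchart
  ollama:          # the subchart's own top-level "ollama" block
    gpu:
      enabled: true
      number: 1
      type: nvidia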

malcolmlewis commented 1 month ago

@robrakaric Hi, so I added the additional ollama to the --set values:

helm install --namespace=open-webui open-webui open-webui/open-webui \
>    --set image.tag=cuda \
>    --set ollama.ollama.gpu.enabled=true \
>    --set ollama.ollama.gpu.number=1 \
>    --set ollama.ollama.gpu.type=nvidia
NAME: open-webui
LAST DEPLOYED: Fri May 10 14:39:34 2024
NAMESPACE: open-webui
STATUS: deployed
REVISION: 1

But still the same error.

helm search repo open-webui

NAME                    CHART VERSION   APP VERSION DESCRIPTION                                       
open-webui/open-webui   2.0.2           latest      Open WebUI: A User-Friendly Web Interface for C...
robrakaric commented 1 month ago

Apologies @malcolmlewis ! I didn't pay attention to --set image.tag=cuda the first time around. I'm going to try to reproduce the issue, but my container pulls are being very slow for some reason right now.

Can you tell if Ollama is getting access to the GPU? The top of the logs for the Ollama container should tell you. If not, you may need to do --set ollama.runtimeClassName=nvidia (or whatever your runtimeClassName is for gpu workloads).

Also, is this a single-node system? If so, you may need to enable Nvidia MPS or something in your gpu operator setup in order to share the GPU between Ollama and open-webui (assuming it's supported on your card).

If you only need the GPU for Ollama, you can take out --set image.tag=cuda. Otherwise, it looks like the open-webui cuda image is relevant for local Whisper and embeddings, whereas the Ollama GPU support is relevant for running your model(s) on the GPU.
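
Putting those two suggestions together as a values file (GPU and runtimeClassName for the Ollama subchart, default open-webui image), it would look something like this; the nvidia RuntimeClass name is an assumption based on a typical gpu-operator install:

# values.yaml sketch: GPU handled by the Ollama subchart only
ollama:
  runtimeClassName: nvidia   # or whatever RuntimeClass your cluster uses for GPU workloads
  ollama:
    gpu:
      enabled: true
      number: 1
      type: nvidia
# note: no --set image.tag=cuda for the open-webui image itself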

malcolmlewis commented 1 month ago

@robrakaric No worries :wink:

Removing the image tag and setting the runtimeClassName shows the GPU, and everything starts as expected:

2024/05/11 03:05:53 routes.go:989: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:fal
time=2024-05-11T03:05:53.160Z level=INFO source=images.go:897 msg="total blobs: 0"
time=2024-05-11T03:05:53.160Z level=INFO source=images.go:904 msg="total unused blobs removed: 0"
time=2024-05-11T03:05:53.160Z level=INFO source=routes.go:1034 msg="Listening on :11434 (version 0.1.34)"
time=2024-05-11T03:05:53.161Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2658756554/runners
time=2024-05-11T03:05:56.515Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v60002 cpu cpu_avx]"
time=2024-05-11T03:05:56.515Z level=INFO source=gpu.go:122 msg="Detecting GPUs"
time=2024-05-11T03:05:56.543Z level=INFO source=gpu.go:127 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.520.61.05
time=2024-05-11T03:05:56.543Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"

And

 nvidia-smi 
Fri May 10 22:24:44 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:01:00.0 Off |                    0 |
| N/A   53C    P0    23W /  75W |   4480MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     29398      C   ...a_v11/ollama_llama_server     4478MiB |
+-----------------------------------------------------------------------------+

So all good!!! Thank you :smile:

So setting --set ollama.runtimeClassName=nvidia and not using the cuda image tag resolved the issue. Perhaps a note somewhere in the README about this?

robrakaric commented 1 month ago

@malcolmlewis glad to help!

So you'll have GPU support in Ollama then; awesome!

I dug into it a little further, and open-webui has its own GPU support as well; that is what was failing for you when you used the cuda image with this Helm chart.

There is currently no direct support for this in the Helm chart, but as a workaround, if you want GPU support in open-webui itself, edit your open-webui deployment manually on your cluster to look roughly like this (paying close attention to runtimeClassName and nvidia.com/gpu: 1). You can request the GPU with --set resources.limits.nvidia\\.com/gpu=1 in your Helm command. The Ollama chart supports setting runtimeClassName, but the open-webui one doesn't as of right now.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
spec:
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      runtimeClassName: nvidia  # must match the RuntimeClass your GPU workloads use
      containers:
      - name: open-webui
        image: ghcr.io/open-webui/open-webui:cuda
        resources:
          limits:
            nvidia.com/gpu: 1  # request one GPU for the open-webui pod
        ports:
        - containerPort: 8080

Again, if you're running single-node, you might run into GPU contention if you don't have a GPU sharing scheme (such as MPS) enabled.
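
On the GPU-sharing point: besides MPS, the NVIDIA gpu-operator can share a GPU via time-slicing, configured through a ConfigMap that the operator's ClusterPolicy references. A rough sketch is below; the name, namespace, and replica count are only examples, and you'd still need to wire it up per the gpu-operator docs:

# Illustrative time-slicing config for the gpu-operator (not part of this chart)
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 2   # advertise the one physical GPU as two schedulable GPUs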

This is what the open-webui container (as opposed to the ollama container) startup looks like when everything is working:

Generating WEBUI_SECRET_KEY
Loading WEBUI_SECRET_KEY from .webui_secret_key
CUDA is enabled, appending LD_LIBRARY_PATH to include torch/cudnn & cublas libraries.
INFO:     Started server process [1]
INFO:     Waiting for application startup.

  ___                    __        __   _     _   _ ___
 / _ \ _ __   ___ _ __   \ \      / /__| |__ | | | |_ _|
| | | | '_ \ / _ \ '_ \   \ \ /\ / / _ \ '_ \| | | || |
| |_| | |_) |  __/ | | |   \ V  V /  __/ |_) | |_| || |
 \___/| .__/ \___|_| |_|    \_/\_/ \___|_.__/ \___/|___|
      |_|

v0.1.124 - building the best open-source AI user interface.
https://github.com/open-webui/open-webui

INFO:apps.litellm.main:start_litellm_background
INFO:apps.litellm.main:run_background_process
INFO:apps.litellm.main:Executing command: ['litellm', '--port', '14365', '--host', '127.0.0.1', '--telemetry', 'False', '--config', '/app/backend/data/litellm/config.yaml']
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO:apps.litellm.main:Subprocess started successfully.

Glad you got it working for your use case!

malcolmlewis commented 1 month ago

@robrakaric thanks for the tip about MPS, will investigate this further. This is just a single-node (home lab) system running only the Open WebUI deployment. Consider this issue closed for now.