If you're using v2.x of the Helm chart, it includes ollama as a dependency. The ollama chart's values.yaml nests ollama under it, so your --set keys for things like GPU would be ollama.ollama.gpu.enabled. The first ollama specifies that this is a value to pass to the Ollama chart; the second ollama comes from https://github.com/otwld/ollama-helm. Check out the values documentation there. Lmk if this makes sense.
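For reference, the same nesting in a values file would look roughly like this (a sketch assuming chart v2.x; the gpu keys are the ones used later in this thread and documented in ollama-helm's values):

# values.yaml for the open-webui chart (v2.x assumed)
ollama:            # passed through to the Ollama subchart dependency
  ollama:          # the subchart's own top-level "ollama" block
    gpu:
      enabled: true
      type: nvidia
      number: 1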
@robrakaric Hi, so I added the additional ollama to the --set values;
helm install --namespace=open-webui open-webui open-webui/open-webui \
> --set image.tag=cuda \
> --set ollama.ollama.gpu.enabled=true \
> --set ollama.ollama.gpu.number=1 \
> --set ollama.ollama.gpu.type=nvidia
NAME: open-webui
LAST DEPLOYED: Fri May 10 14:39:34 2024
NAMESPACE: open-webui
STATUS: deployed
REVISION: 1
But still the same error.
helm search repo open-webui
NAME CHART VERSION APP VERSION DESCRIPTION
open-webui/open-webui 2.0.2 latest Open WebUI: A User-Friendly Web Interface for C...
Apologies @malcolmlewis! I didn't pay attention to --set image.tag=cuda the first time around. I'm going to try to reproduce the issue, but my container pulls are being very slow for some reason right now.
Can you tell if Ollama is getting access to the GPU? The top of the logs for the Ollama container should tell you. If not, you may need to do --set ollama.runtimeClassName=nvidia (or whatever your runtimeClassName is for gpu workloads).
Also, is this a single-node system? If so, you may need to enable Nvidia MPS or something in your gpu operator setup in order to share the GPU between Ollama and open-webui (assuming it's supported on your card).
If you only need the GPU for Ollama, you can take out --set image.tag=cuda. Otherwise, it looks like the open-webui cuda image is relevant for local Whisper and embeddings, whereas the Ollama GPU support is relevant for running your model(s) in-GPU.
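One quick way to check both things (a sketch; the namespace comes from the install command above, and the Ollama pod label is an assumption that may differ in your release):

# List the runtime classes the cluster knows about; "nvidia" should appear if the gpu-operator is installed
kubectl get runtimeclass

# Tail the Ollama container logs and look for the "Detecting GPUs" / "detected GPUs" lines
kubectl logs -n open-webui -l app.kubernetes.io/name=ollama --tail=50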
@robrakaric No worries :wink:
Removing the image tag and setting the runtimeclass shows the GPU and everything starts as expected;
2024/05/11 03:05:53 routes.go:989: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_NOPRUNE:fal
time=2024-05-11T03:05:53.160Z level=INFO source=images.go:897 msg="total blobs: 0"
time=2024-05-11T03:05:53.160Z level=INFO source=images.go:904 msg="total unused blobs removed: 0"
time=2024-05-11T03:05:53.160Z level=INFO source=routes.go:1034 msg="Listening on :11434 (version 0.1.34)"
time=2024-05-11T03:05:53.161Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2658756554/runners
time=2024-05-11T03:05:56.515Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v60002 cpu cpu_avx]"
time=2024-05-11T03:05:56.515Z level=INFO source=gpu.go:122 msg="Detecting GPUs"
time=2024-05-11T03:05:56.543Z level=INFO source=gpu.go:127 msg="detected GPUs" count=1 library=/usr/lib/x86_64-linux-gnu/libcuda.so.520.61.05
time=2024-05-11T03:05:56.543Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"
And
nvidia-smi
Fri May 10 22:24:44 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P4 Off | 00000000:01:00.0 Off | 0 |
| N/A 53C P0 23W / 75W | 4480MiB / 7680MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 29398 C ...a_v11/ollama_llama_server 4478MiB |
+-----------------------------------------------------------------------------+
So all good!!! Thank you :smile:
So the option --set ollama.runtimeClassName=nvidia and not using the cuda tag resolved the issue. Perhaps a note somewhere in the README about this?
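For anyone finding this later, the working install should therefore look roughly like the command above minus the cuda image tag, plus the runtime class (a sketch reassembled from this thread):

helm install --namespace=open-webui open-webui open-webui/open-webui \
  --set ollama.ollama.gpu.enabled=true \
  --set ollama.ollama.gpu.number=1 \
  --set ollama.ollama.gpu.type=nvidia \
  --set ollama.runtimeClassName=nvidia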
@malcolmlewis glad to help!
So you'll have GPU support in Ollama then; awesome!
I dug into it a little further, and open-webui has its own GPU support. This was failing for you due to using the cuda image along with this Helm chart.
There is currently no direct support for this in the Helm chart, but as a workaround, if you want GPU support in open-webui, edit your open-webui Deployment manually on your cluster to look sort of like the example below, paying close attention to runtimeClassName and the nvidia.com/gpu resource limit. You can request the GPU with nvidia.com/gpu: 1 in the manifest, or with --set resources.limits.nvidia\\.com/gpu=1 in your Helm command. The Ollama subchart supports setting runtimeClassName, but the open-webui one doesn't as of right now.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
spec:
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      runtimeClassName: nvidia
      containers:
        - name: open-webui
          image: ghcr.io/open-webui/open-webui:cuda
          resources:
            limits:
              nvidia.com/gpu: 1
          ports:
            - containerPort: 8080
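As an alternative to editing the Deployment by hand, a one-off patch along these lines should also work (a sketch; the release and namespace names assume the install command from earlier in the thread, and the patch may be reverted by a later helm upgrade):

kubectl -n open-webui patch deployment open-webui --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/runtimeClassName","value":"nvidia"}]'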
Again, if you're running single-node, you might run into GPU contention if you don't have a GPU sharing scheme (such as MPS) enabled.
This is what the open-webui container (as opposed to the ollama container) startup looks like when everything is working:
Generating WEBUI_SECRET_KEY
Loading WEBUI_SECRET_KEY from .webui_secret_key
CUDA is enabled, appending LD_LIBRARY_PATH to include torch/cudnn & cublas libraries.
INFO: Started server process [1]
INFO: Waiting for application startup.

  ___                    __        __   _     _   _ ___
 / _ \ _ __   ___ _ __   \ \      / /__| |__ | | | |_ _|
| | | | '_ \ / _ \ '_ \   \ \ /\ / / _ \ '_ \| | | || |
| |_| | |_) |  __/ | | |   \ V  V /  __/ |_) | |_| || |
 \___/| .__/ \___|_| |_|    \_/\_/ \___|_.__/ \___/|___|
      |_|


v0.1.124 - building the best open-source AI user interface.
https://github.com/open-webui/open-webui

INFO:apps.litellm.main:start_litellm_background
INFO:apps.litellm.main:run_background_process
INFO:apps.litellm.main:Executing command: ['litellm', '--port', '14365', '--host', '127.0.0.1', '--telemetry', 'False', '--config', '/app/backend/data/litellm/config.yaml']
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO:apps.litellm.main:Subprocess started successfully.
Glad you got it working for your use case!
@robrakaric thanks for the tip about MPS, will investigate this further. This is just a single node (Home Lab) only running the Open Webui deployment. Consider this issue closed for now.
Bug Report
Description
With a standard k3s install on bare metal with an Nvidia Tesla P4, the install fails to detect the GPU.
Bug Summary: Fails to detect the installed Nvidia GPU.
Steps to Reproduce: Install K3s with default values, add the gpu-operator and check. Add the open-webui helm repo, update, create as follows.
Expected Behavior: Detect the installed Nvidia GPU and run...
Actual Behavior:
Environment