otwld / ollama-helm

Helm chart for Ollama on Kubernetes
https://artifacthub.io/packages/helm/ollama-helm/ollama
MIT License

How to Deploy Ollama LLM on Cloud-Managed Kubernetes (OCI) (pvc/init container) #90

Closed brokedba closed 2 months ago

brokedba commented 2 months ago

How to Deploy Ollama LLM on Cloud-Managed Kubernetes (OCI)

I'm looking for guidance on deploying Ollama LLM using Helm charts on a cloud-managed Kubernetes service, specifically Oracle Cloud Infrastructure (OCI). I have a few questions regarding the deployment process:

  1. Persistent Volume and Data Volume Mounting:

    • From the values in the Helm chart, how does the ollama-data volume mountPath: "" match the persistentVolume if it's enabled? It's unclear how these values are connected.
    • Do we need to create the storage class or PersistentVolumeClaim (PVC) manually for the persistentVolume.values to be effective? There isn't much clarity on this in the documentation, and it would be helpful to have an example.
  2. Loading Models with Init Containers:

    • Is there a way to load the models using an init container into the mountPath before the main pod is spun up? This feature would be useful for preloading models and ensuring they're ready when the main container starts.

The documentation seems limited, making it challenging to proceed. Any examples or additional guidance would be greatly appreciated. Thank you!

jdetroyes commented 2 months ago

Hello @brokedba,

Here's an explanation:

From the values in the Helm chart, how does the ollama-data volume mountPath: "" match the persistentVolume if it's enabled? It's unclear how these values are connected.

First, if ollama.mountPath is set, it overrides the default mount path of /root/.ollama. In most cases, this value doesn't need to be changed. When persistentVolume is enabled, there are two scenarios:

  • If persistentVolume.existingClaim is set: The volume will be attached to the container.
  • If persistentVolume.existingClaim is not set:
    • If persistentVolume.storageClass is specified (or left empty), a PVC will be created by the provisioner and attached to the container (see pvc.yaml).

deployment.yaml

volumes:
  - name: ollama-data
    {{- if .Values.persistentVolume.enabled }}
    persistentVolumeClaim:
      claimName: {{ .Values.persistentVolume.existingClaim | default (printf "%s" (include "ollama.fullname" .)) }}
    {{- else }}
    emptyDir: { }
    {{- end }}

Do we need to create the storage class or PersistentVolumeClaim (PVC) manually for the persistentVolume.values to be effective? There isn't much clarity on this in the documentation, and it would be helpful to have an example.

You can specify a StorageClass that is already configured in your infrastructure to automatically create a PVC. Alternatively, if you already have a PVC configured, you can set the persistentVolume.existingClaim field. To disable automatic provisioning, set persistentVolume.storageClass: "-".

Example using Longhorn as the provisioner:

# Enable persistence using Persistent Volume Claims
# ref: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
persistentVolume:
  # -- Enable persistence using PVC
  enabled: true

  # -- Ollama server data Persistent Volume Storage Class
  # If defined, storageClassName: <storageClass>
  # If set to "-", storageClassName: "", which disables dynamic provisioning
  # If undefined (the default) or set to null, no storageClassName spec is
  # set, choosing the default provisioner.  (gp2 on AWS, standard on
  # GKE, AWS & OpenStack)
  storageClass: "longhorn"

Is there a way to load the models using an init container into the mountPath before the main pod is spun up? This feature would be useful for preloading models and ensuring they're ready when the main container starts.

To preload models at startup, simply populate the ollama.models array with the list of models you want to pull. If you're using a PVC, models that have already been pulled won't be downloaded again. The chart uses a postStart lifecycle hook to pull models, which are stored in the mountPath.

deployment.yaml

{{- if or .Values.ollama.models .Values.ollama.defaultModel }}
  lifecycle:
    postStart:
      exec:
        command: [ "/bin/sh", "-c", "{{- printf "echo %s | xargs -n1 /bin/ollama pull %s" (include "ollama.modelList" .) (ternary "--insecure" "" .Values.ollama.insecure)}}" ]
{{- end }}
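
For example, a values.yaml along these lines (the model names are only illustrative) makes the chart pull them in that hook:

ollama:
  # -- List of models to pull at container startup
  models:
    - llama3
    - mistral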

Let me know if you need more details!


brokedba commented 2 months ago

@jdetroyes thank you so much for the answers!!

First, if ollama.mountPath is set, it overrides the default mount path of /root/.ollama. In most cases, this value doesn't need to be changed. When persistentVolume is enabled, there are two scenarios:

  • If persistentVolume.existingClaim is set: The volume will be attached to the container.
  • If persistentVolume.existingClaim is not set:
    • If persistentVolume.storageClass is specified (or left empty), a PVC will be created by the provisioner and attached to the container (see pvc.yaml).

You can specify a StorageClass that is already configured in your infrastructure to automatically create a PVC. Alternatively, if you already have a PVC configured, you can set the persistentVolume.existingClaim field. To disable automatic provisioning, set persistentVolume.storageClass: "-".

If I understand correctly, ollama.mountPath is backed by either:

  1. The existing claim, if one is specified
  2. Otherwise, a new PVC created by the chart through dynamic provisioning via the storageClass

    But if the StorageClass doesn't exist, option 2 will not really work, am I right? For now I tried creating a local PV and PVC and specified the existing claim, but I got the error below.

EDIT: it only worked after I hardcoded storageClassName: "oci-bv" (the default in Oracle Cloud).
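
For anyone hitting the same thing on OKE: a minimal PVC manifest using that StorageClass would look roughly like the sketch below (the claim name ollama-pvc and the 50Gi size are placeholders), and persistentVolume.existingClaim can then point at it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: "oci-bv"
  resources:
    requests:
      storage: 50Gi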

To preload models at startup, simply populate the ollama.models array with the list of models you want to pull.

Here's the thing: my K8s cluster is CPU only, so I need GGUF models to be loaded, not GPU ones. Hence my initContainers section (see the gist below).

The purpose of the init container is to:

  1. install the huggingface cli,
  2. download a GGUF model into the mountPath using the hf cli:

huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
  --local-dir ai_models \
  --local-dir-use-symlinks False

  3. edit a modelfile,
  4. load the model: run ollama create llama3 -f llama3.loc

jdetroyes commented 2 months ago

Hello @CloudDude

Based on your scenario, here is an example with initContainers and a custom lifecycle to download a model from Hugging Face and create it in Ollama.


initContainers:
  - name: install-and-setup-model
    image: python:3.9  # Use an image with Python and pip pre-installed
    command: [sh, -c]
    args:
      - |
        pip install -U "huggingface_hub[cli]";
        mkdir -p /root/.ollama/download;
        huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
          --local-dir /root/.ollama/download \
          --local-dir-use-symlinks False;
        echo 'FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf' > /root/.ollama/download/llama3.loc;
    volumeMounts:
      - name: ollama-data # Use the same name defined in the volumes section of deployment.yaml
        mountPath: /root/.ollama  # Use same as default

# -- Lifecycle for pod assignment (override ollama.models startup pulling)
lifecycle:
  postStart:
    exec:
      command: [ "/bin/sh", "-c", "ollama create llama3 -f /root/.ollama/download/llama3.loc" ]

persistentVolume:
  # Enable PVC for Ollama
  enabled: true

  # Use default storage class
  storageClass: ""
brokedba commented 2 months ago

I hit Docker Hub rate limits, so I needed to add a Docker Hub secret, but it's complaining.

W0829 17:43:50.092448 9912 warnings.go:70] unknown field "spec.template.spec.initContainers[0].imagePullSecrets"
Release "ollama" has been upgraded. Happy Helming!

Am I missing something? Is it included in the template?

initContainers:
  - name: install-and-setup-model
    image: python:3.9  # Use an image with Python and pip pre-installed
    imagePullSecrets:
      - name: dockerhub-sec
    command: [sh, -c]
    args:
jdetroyes commented 2 months ago

Hey @brokedba

Docker secrets are shared with all containers in the deployment.

You don't have to add a line in the initContainers; you just have to populate imagePullSecrets in values.yaml:

# -- Docker registry secret names as an array

imagePullSecrets: []
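
For example, assuming the secret is named dockerhub-sec, create it once in the release namespace:

kubectl create secret docker-registry dockerhub-sec \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<user> \
  --docker-password=<token>

then reference it in values.yaml:

imagePullSecrets:
  - name: dockerhub-sec
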
brokedba commented 2 months ago

My bad, I had corrected that before I could update the post. The container is still in Pending state after that change.

~Events:~
~Warning FailedScheduling 9m32s default-scheduler 0/3 nodes are available: persistentvolumeclaim "ollama" is being deleted. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.~

Edit: the initContainer phase worked, but only partially. The tasks completed:

  1. install the hf cli
  2. download the model
  3. edit the modelfile

What do you think could have caused the postStart (ollama create) not to work?

# -- Init containers to add to the pod
initContainers:
  - name: install-and-setup-model
    image: python:3.9  # Use an image with Python and pip pre-installed
    command: [sh, -c]
    args:
      - |
        pip install -U "huggingface_hub[cli]";
        mkdir -p /root/.ollama/download;
        huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
          --local-dir /root/.ollama/download \
          --local-dir-use-symlinks False;
        echo 'FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf
        # Set custom parameter values
        PARAMETER temperature 1.0
        PARAMETER stop "<|start_header_id|>"
        PARAMETER stop "<|end_header_id|>"
        PARAMETER stop "<|eot_id|>"

        # Define the model template
         TEMPLATE """
        {{ if .System }}<|start_header_id|>system<|end_header_id|>
        {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
        {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
        {{ .Response }}<|eot_id|>
        """
        # Set the system message
        SYSTEM You are a helpful AI assistant named e-llmo Assistant.' > /root/.ollama/download/llama3.loc;
    volumeMounts:
      - name: ollama-data
        mountPath: /root/.ollama  # Use same as default
# -- Lifecycle for pod assignment (override ollama.models startup pulling)
lifecycle:
  postStart:
    exec:
      command: [ "/bin/sh", "-c", "ollama create llama3 -f /root/.ollama/download/llama3.loc" ]
brokedba commented 2 months ago

I also noted that the resulting modelfile is truncated: any line containing curly braces {{ }} was ignored, although the same echo/printf commands work fine when run manually after the pod is ready.

Below is the final version after I logged in to the container. I think it might be why the create command isn't working, who knows. Any idea how to escape curly braces in YAML?

FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf

# Set custom parameter values
PARAMETER temperature 1.0
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"

# Define the model template
TEMPLATE """<|start_header_id|>assistant<|end_header_id|>       <------lines below were all ignored
<|eot_id|>
"""

# Set the system message
SYSTEM You are a helpful AI assistant named e-llmo Assistant. 

I found online that {{ "{{" }} ... {{ "}}" }} could be the fix. Will try.
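
For reference, if the chart does render these values through Helm templating (an assumption, but it would explain the {{ }} lines disappearing), the escaped TEMPLATE block would look roughly like this, so that Helm emits literal braces into the modelfile instead of evaluating them:

TEMPLATE """
{{ "{{" }} if .System {{ "}}" }}<|start_header_id|>system<|end_header_id|>
{{ "{{" }} .System {{ "}}" }}<|eot_id|>{{ "{{" }} end {{ "}}" }}{{ "{{" }} if .Prompt {{ "}}" }}<|start_header_id|>user<|end_header_id|>
{{ "{{" }} .Prompt {{ "}}" }}<|eot_id|>{{ "{{" }} end {{ "}}" }}<|start_header_id|>assistant<|end_header_id|>
{{ "{{" }} .Response {{ "}}" }}<|eot_id|>
"""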