otwld / ollama-helm

Helm chart for Ollama on Kubernetes
https://artifacthub.io/packages/helm/ollama-helm/ollama
MIT License

How to Deploy Ollama LLM on Cloud-Managed Kubernetes (OCI) (pvc/init container) #90

Closed brokedba closed 2 months ago

brokedba commented 2 months ago

How to Deploy Ollama LLM on Cloud-Managed Kubernetes (OCI)

I'm looking for guidance on deploying Ollama LLM using Helm charts on a cloud-managed Kubernetes service, specifically Oracle Cloud Infrastructure (OCI). I have a few questions regarding the deployment process:

  1. Persistent Volume and Data Volume Mounting:

    • From the values in the Helm chart, how does the ollama-data volume mountPath: "" match the persistentVolume if it's enabled? It's unclear how these values are connected.
    • Do we need to create the storage class or PersistentVolumeClaim (PVC) manually for the persistentVolume.values to be effective? There isn't much clarity on this in the documentation, and it would be helpful to have an example.
  2. Loading Models with Init Containers:

    • Is there a way to load the models using an init container into the mountPath before the main pod is spun up? This feature would be useful for preloading models and ensuring they're ready when the main container starts.

The documentation seems limited, making it challenging to proceed. Any examples or additional guidance would be greatly appreciated. Thank you!

jdetroyes commented 2 months ago

Hello @brokedba,

Here's an explanation:

From the values in the Helm chart, how does the ollama-data volume mountPath: "" match the persistentVolume if it's enabled? It's unclear how these values are connected.

First, if ollama.mountPath is set, it overrides the default mount path of /root/.ollama. In most cases, this value doesn't need to be changed. When persistentVolume is enabled, there are two scenarios:

  • If persistentVolume.existingClaim is set: The volume will be attached to the container.
  • If persistentVolume.existingClaim is not set:
    • If persistentVolume.storageClass is specified (or left empty), a PVC will be created by the provisioner and attached to the container (see pvc.yaml).

deployment.yaml

volumes:
  - name: ollama-data
    {{- if .Values.persistentVolume.enabled }}
    persistentVolumeClaim:
      claimName: {{ .Values.persistentVolume.existingClaim | default (printf "%s" (include "ollama.fullname" .)) }}
    {{- else }}
    emptyDir: { }
    {{- end }}

Do we need to create the storage class or PersistentVolumeClaim (PVC) manually for the persistentVolume.values to be effective? There isn't much clarity on this in the documentation, and it would be helpful to have an example.

You can specify a StorageClass that is already configured in your infrastructure to automatically create a PVC. Alternatively, if you already have a PVC configured, you can set the persistentVolume.existingClaim field. To disable automatic provisioning, set persistentVolume.storageClass: "-".

Example using Longhorn as the provisioner:

# Enable persistence using Persistent Volume Claims
# ref: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
persistentVolume:
  # -- Enable persistence using PVC
  enabled: true

  # -- Ollama server data Persistent Volume Storage Class
  # If defined, storageClassName: <storageClass>
  # If set to "-", storageClassName: "", which disables dynamic provisioning
  # If undefined (the default) or set to null, no storageClassName spec is
  # set, choosing the default provisioner.  (gp2 on AWS, standard on
  # GKE, AWS & OpenStack)
  storageClass: "longhorn"

Is there a way to load the models using an init container into the mountPath before the main pod is spun up? This feature would be useful for preloading models and ensuring they're ready when the main container starts.

To preload models at startup, simply populate the ollama.models array with the list of models you want to pull. If you're using a PVC, models that have already been pulled won't be downloaded again. The chart uses a postStart lifecycle hook to pull models, which are stored in the mountPath.

deployment.yaml

{{- if or .Values.ollama.models .Values.ollama.defaultModel }}
  lifecycle:
    postStart:
      exec:
        command: [ "/bin/sh", "-c", "{{- printf "echo %s | xargs -n1 /bin/ollama pull %s" (include "ollama.modelList" .) (ternary "--insecure" "" .Values.ollama.insecure)}}" ]
{{- end }}
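
For example, a values.yaml along these lines (the model names are only illustrative) makes the chart pull them in that hook:

ollama:
  # -- List of models to pull at container startup
  models:
    - llama3
    - mistral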

Let me know if you need more details!


brokedba commented 2 months ago

@jdetroyes thank you so much for the answers!!

First, if ollama.mountPath is set, it overrides the default mount path of /root/.ollama. In most cases, this value doesn't need to be changed. When persistentVolume is enabled, there are two scenarios:

  • If persistentVolume.existingClaim is set: The volume will be attached to the container.
  • If persistentVolume.existingClaim is not set:
    • If persistentVolume.storageClass is specified (or left empty), a PVC will be created by the provisioner and attached to the container (see pvc.yaml).

You can specify a StorageClass that is already configured in your infrastructure to automatically create a PVC. Alternatively, if you already have a PVC configured, you can set the persistentVolume.existingClaim field. To disable automatic provisioning, set persistentVolume.storageClass: "-".

If I understand correctly, ollama.mountPath is backed by either:

  1. The existing claim, if one is specified
  2. Otherwise, a new PVC created by the chart through dynamic provisioning via the storageClass

    But if the StorageClass doesn't exist, option 2 will not really work, am I right? For now I tried creating a local PV and PVC and specified the existing claim, but I got the error below.

EDIT: it only worked after I hardcoded storageClassName: "oci-bv" (the default in Oracle Cloud).
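
For anyone hitting the same thing on OKE: a minimal PVC manifest using that StorageClass would look roughly like the sketch below (the claim name ollama-pvc and the 50Gi size are placeholders), and persistentVolume.existingClaim can then point at it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: "oci-bv"
  resources:
    requests:
      storage: 50Gi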

To preload models at startup, simply populate the ollama.models array with the list of models you want to pull.

Here's the thing: my K8s cluster is CPU only, so I need GGUF models to be loaded, not GPU ones. Hence my initContainers section (see the gist below).

The purpose of the init container is to:

  1. install the huggingface cli,
  2. download a GGUF model into the mountPath using the hf cli:

huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
  --local-dir ai_models \
  --local-dir-use-symlinks False

  3. edit a modelfile,
  4. load the model: run ollama create llama3 -f llama3.loc

jdetroyes commented 2 months ago

Hello @CloudDude

Based on your scenario, here is an example with initContainers and a custom lifecycle to download a model from Hugging Face and create it in Ollama.


initContainers:
  - name: install-and-setup-model
    image: python:3.9  # Use an image with Python and pip pre-installed
    command: [sh, -c]
    args:
      - |
        pip install -U "huggingface_hub[cli]";
        mkdir -p /root/.ollama/download;
        huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
          --local-dir /root/.ollama/download \
          --local-dir-use-symlinks False;
        echo 'FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf' > /root/.ollama/download/llama3.loc;
    volumeMounts:
      - name: ollama-data # Use the same name defined in the volumes section of deployment.yaml
        mountPath: /root/.ollama  # Use same as default

# -- Lifecycle for pod assignment (override ollama.models startup pulling)
lifecycle:
  postStart:
    exec:
      command: [ "/bin/sh", "-c", "ollama create llama3 -f /root/.ollama/download/llama3.loc" ]

persistentVolume:
  # Enable PVC for Ollama
  enabled: true

  # Use default storage class
  storageClass: ""
brokedba commented 2 months ago

I hit Docker Hub rate limits, so I needed to add a Docker Hub secret, but it's complaining.

W0829 17:43:50.092448 9912 warnings.go:70] unknown field "spec.template.spec.initContainers[0].imagePullSecrets"
Release "ollama" has been upgraded. Happy Helming!

Am I missing something? Is it included in the template?

initContainers:
  - name: install-and-setup-model
    image: python:3.9  # Use an image with Python and pip pre-installed
    imagePullSecrets:
      - name: dockerhub-sec
    command: [sh, -c]
    args:
jdetroyes commented 2 months ago

Hey @brokedba

Docker secrets are shared with all containers in the deployment.

You don't have to add a line in the initContainers; you just have to populate imagePullSecrets in values.yaml:

# -- Docker registry secret names as an array

imagePullSecrets: []
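
For example, assuming the secret is named dockerhub-sec, create it once in the release namespace:

kubectl create secret docker-registry dockerhub-sec \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<user> \
  --docker-password=<token>

then reference it in values.yaml:

imagePullSecrets:
  - name: dockerhub-sec
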
brokedba commented 2 months ago

My bad, I had corrected that before I could update the post. The container is still in Pending state after that change.

~Events:~
~Warning FailedScheduling 9m32s default-scheduler 0/3 nodes are available: persistentvolumeclaim "ollama" is being deleted. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.~

Edit: the initContainer phase worked, but only partially. The tasks completed:

  1. install the hf cli
  2. download the model
  3. edit the modelfile

What do you think could have caused the postStart (ollama create) not to work?

# -- Init containers to add to the pod
initContainers:
  - name: install-and-setup-model
    image: python:3.9  # Use an image with Python and pip pre-installed
    command: [sh, -c]
    args:
      - |
        pip install -U "huggingface_hub[cli]";
        mkdir -p /root/.ollama/download;
        huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
          --local-dir /root/.ollama/download \
          --local-dir-use-symlinks False;
        echo 'FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf
        # Set custom parameter values
        PARAMETER temperature 1.0
        PARAMETER stop "<|start_header_id|>"
        PARAMETER stop "<|end_header_id|>"
        PARAMETER stop "<|eot_id|>"

        # Define the model template
         TEMPLATE """
        {{ if .System }}<|start_header_id|>system<|end_header_id|>
        {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
        {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
        {{ .Response }}<|eot_id|>
        """
        # Set the system message
        SYSTEM You are a helpful AI assistant named e-llmo Assistant.' > /root/.ollama/download/llama3.loc;
    volumeMounts:
      - name: ollama-data
        mountPath: /root/.ollama  # Use same as default
# -- Lifecycle for pod assignment (override ollama.models startup pulling)
lifecycle:
  postStart:
    exec:
      command: [ "/bin/sh", "-c", "ollama create llama3 -f /root/.ollama/download/llama3.loc" ]
brokedba commented 2 months ago

I also noted that the resulting modelfile is truncated: any line containing curly braces {{ }} was ignored, although the same echo/printf commands work fine when run manually after the pod is ready.

Below is the final version after I logged in to the container. I think it might be why the create command isn't working, who knows. Any idea how to escape curly braces in YAML?

FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf

# Set custom parameter values
PARAMETER temperature 1.0
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"

# Define the model template
TEMPLATE """<|start_header_id|>assistant<|end_header_id|>       <------lines below were all ignored
<|eot_id|>
"""

# Set the system message
SYSTEM You are a helpful AI assistant named e-llmo Assistant. 

I found online that {{ "{{" }} ... {{ "}}" }} could be the fix. Will try.
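
For reference, if the chart does render these values through Helm templating (an assumption, but it would explain the {{ }} lines disappearing), the escaped TEMPLATE block would look roughly like this, so that Helm emits literal braces into the modelfile instead of evaluating them:

TEMPLATE """
{{ "{{" }} if .System {{ "}}" }}<|start_header_id|>system<|end_header_id|>
{{ "{{" }} .System {{ "}}" }}<|eot_id|>{{ "{{" }} end {{ "}}" }}{{ "{{" }} if .Prompt {{ "}}" }}<|start_header_id|>user<|end_header_id|>
{{ "{{" }} .Prompt {{ "}}" }}<|eot_id|>{{ "{{" }} end {{ "}}" }}<|start_header_id|>assistant<|end_header_id|>
{{ "{{" }} .Response {{ "}}" }}<|eot_id|>
"""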