Hello @brokedba,
Here's an explanation:
From the values in the Helm chart, how does the ollama-data volume mountPath: "" match the persistentVolume if it's enabled? It's unclear how these values are connected.
First, if ollama.mountPath is set, it overrides the default mount path of /root/.ollama. In most cases, this value doesn't need to be changed. When persistentVolume is enabled, there are two scenarios:
- If persistentVolume.existingClaim is set: the existing PVC will be attached to the container.
- If persistentVolume.existingClaim is not set: as long as persistentVolume.storageClass is specified (or left empty), a PVC will be created by the provisioner and attached to the container (see pvc.yaml):

volumes:
  - name: ollama-data
    {{- if .Values.persistentVolume.enabled }}
    persistentVolumeClaim:
      claimName: {{ .Values.persistentVolume.existingClaim | default (printf "%s" (include "ollama.fullname" .)) }}
    {{- else }}
    emptyDir: { }
    {{- end }}
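For instance, a minimal values.yaml sketch for the existingClaim scenario could look like this (the claim name my-ollama-pvc is just an illustration):

ollama:
  # Leave empty to keep the default mount path of /root/.ollama
  mountPath: ""
persistentVolume:
  enabled: true
  # Attach a PVC you created yourself; omit this to let the chart create one
  existingClaim: "my-ollama-pvc"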
Do we need to create the storage class or PersistentVolumeClaim (PVC) manually for the persistentVolume values to be effective? There isn't much clarity on this in the documentation, and it would be helpful to have an example.

You can specify a StorageClass that is already configured in your infrastructure to automatically create a PVC. Alternatively, if you already have a PVC configured, you can set the persistentVolume.existingClaim field. To disable automatic provisioning, set persistentVolume.storageClass: "-".
Example using longhorn as the provisioner:

# Enable persistence using Persistent Volume Claims
# ref: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
persistentVolume:
  # -- Enable persistence using PVC
  enabled: true
  # -- Ollama server data Persistent Volume Storage Class
  # If defined, storageClassName: <storageClass>
  # If set to "-", storageClassName: "", which disables dynamic provisioning
  # If undefined (the default) or set to null, no storageClassName spec is
  #   set, choosing the default provisioner. (gp2 on AWS, standard on
  #   GKE, AWS & OpenStack)
  storageClass: "longhorn"
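Conversely, a sketch of disabling dynamic provisioning (you would then bind a manually created PV/PVC via existingClaim):

persistentVolume:
  enabled: true
  # "-" renders storageClassName: "", which disables dynamic provisioning
  storageClass: "-"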
Is there a way to load the models using an init container into the mountPath before the main pod is spun up? This feature would be useful for preloading models and ensuring they're ready when the main container starts.
To preload models at startup, simply populate the ollama.models array with the list of models you want to pull. If you're using a PVC, models that have already been pulled won't be downloaded again. The chart uses a postStart lifecycle hook to pull models, which are stored in the mountPath:
{{- if or .Values.ollama.models .Values.ollama.defaultModel }}
lifecycle:
  postStart:
    exec:
      command: [ "/bin/sh", "-c", "{{- printf "echo %s | xargs -n1 /bin/ollama pull %s" (include "ollama.modelList" .) (ternary "--insecure" "" .Values.ollama.insecure) }}" ]
{{- end }}
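For instance, a values.yaml sketch (the model names here are just illustrations):

ollama:
  models:
    - llama3
    - mistral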
Let me know if you need more details!
@jdetroyes thank you so much for the answers!!
First, if ollama.mountPath is set, it overrides the default mount path of /root/.ollama. In most cases, this value doesn't need to be changed. When persistentVolume is enabled, there are two scenarios:
- If persistentVolume.existingClaim is set: The volume will be attached to the container.
- If persistentVolume.existingClaim is not set:
- If persistentVolume.storageClass is specified (or left empty), a PVC will be created by the provisioner and attached to the container (See pvc.yaml).
You can specify a StorageClass that is already configured in your infrastructure to automatically create a PVC. Alternatively, if you already have a PVC configured, you can set the persistentVolume.existingClaim field. To disable automatic provisioning, set persistentVolume.storageClass: "-".
If I understand well, ollama.mountPath matches either:
- an existing PVC (via persistentVolume.existingClaim), or
- a new one created by the chart through dynamic provisioning via the storageClass.

But if the StorageClass doesn't exist, option 2 will not really work, am I right? For now I tried creating a local PV and PVC and specified the existing claim, but I got the error below:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ollama-pv
spec:
  capacity:
    storage: 15Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data/ollama
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
spec:
  volumeName: ollama-pv
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 15Gi
Stream closed EOF for ollama/ollama-685b9f59df-qw98v (ollama)
Stream closed EOF for ollama/ollama-685b9f59df-qw98v (install-and-setup-model)
EDIT: it only worked after I hardcoded storageClassName: "oci-bv" (the default in Oracle Cloud).
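For reference, a sketch of a PVC that should bind on OKE with that default block-volume class (keeping in mind that OCI block volumes have a 50GB minimum size, so smaller requests are not honored as-is):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pvc
spec:
  # OCI default block-volume StorageClass; dynamic provisioning creates the PV
  storageClassName: "oci-bv"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi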
To preload models at startup, simply populate the ollama.models array with the list of models you want to pull.
Here's the thing: my K8s cluster is CPU-only, so I need GGUF models (the CPU-friendly quantized format) to be loaded, not GPU ones. Hence my initContainers section (see the gist below). The purpose of the init container:
huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
  --local-dir ai_models \
  --local-dir-use-symlinks False
ollama create llama3 -f llama3.loc
Hello @CloudDude
Based on your scenario, here is an example with initContainers and a custom lifecycle to download and create a model from Hugging Face:
initContainers:
  - name: install-and-setup-model
    image: python:3.9 # Use an image with Python and pip pre-installed
    command: [sh, -c]
    args:
      - |
        pip install -U "huggingface_hub[cli]";
        mkdir -p /root/.ollama/download;
        huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
          --local-dir /root/.ollama/download \
          --local-dir-use-symlinks False;
        echo 'FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf' > /root/.ollama/download/llama3.loc;
    volumeMounts:
      - name: ollama-data # Use the same name as defined in the volumes section of deployment.yaml
        mountPath: /root/.ollama # Use same as default

# -- Lifecycle for pod assignment (overrides ollama.models startup pulling)
lifecycle:
  postStart:
    exec:
      command: [ "/bin/sh", "-c", "ollama create llama3 -f /root/.ollama/download/llama3.loc" ]

persistentVolume:
  # Enable PVC for Ollama
  enabled: true
  # Use the default storage class
  storageClass: ""
I hit Docker Hub rate limits, so I needed to add a Docker Hub secret, but it's complaining:

W0829 17:43:50.092448 9912 warnings.go:70] unknown field "spec.template.spec.initContainers[0].imagePullSecrets"
Release "ollama" has been upgraded. Happy Helming!

Am I missing something? Is it included in the template?
initContainers:
  - name: install-and-setup-model
    image: python:3.9 # Use an image with Python and pip pre-installed
    imagePullSecrets:
      - name: dockerhub-sec
    command: [sh, -c]
    args:
Hey @brokedba
Docker secrets are shared with all containers in the deployment. You don't have to add a line in the initContainers; you just have to populate it in values.yaml:

# -- Docker registry secret names as an array
imagePullSecrets: []
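For example, after creating the registry secret with kubectl create secret docker-registry dockerhub-sec --docker-server=https://index.docker.io/v1/ --docker-username=<your-user> --docker-password=<your-token>, you would reference it in values.yaml (the secret name dockerhub-sec is taken from your snippet):

# values.yaml
imagePullSecrets:
  - name: dockerhub-sec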
My bad, I had corrected it before I could update the post. The container is still in Pending state now after that change.
~Events:~
~Warning FailedScheduling 9m32s default-scheduler 0/3 nodes are available: persistentvolumeclaim "ollama" is being deleted. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.~
Edit: the init container phase worked, but only partially; its download tasks were done. What do you think could have caused the postStart (ollama create) not to work?
# -- Init containers to add to the pod
initContainers:
  - name: install-and-setup-model
    image: python:3.9 # Use an image with Python and pip pre-installed
    command: [sh, -c]
    args:
      - |
        pip install -U "huggingface_hub[cli]";
        mkdir -p /root/.ollama/download;
        huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf \
          --local-dir /root/.ollama/download \
          --local-dir-use-symlinks False;
        echo 'FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf
        # Set custom parameter values
        PARAMETER temperature 1.0
        PARAMETER stop "<|start_header_id|>"
        PARAMETER stop "<|end_header_id|>"
        PARAMETER stop "<|eot_id|>"
        # Define the model template
        TEMPLATE """
        {{ if .System }}<|start_header_id|>system<|end_header_id|>
        {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
        {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
        {{ .Response }}<|eot_id|>
        """
        # Set the system message
        SYSTEM You are a helpful AI assistant named e-llmo Assistant.' > /root/.ollama/download/llama3.loc;
    volumeMounts:
      - name: ollama-data
        mountPath: /root/.ollama # Use same as default

# -- Lifecycle for pod assignment (overrides ollama.models startup pulling)
lifecycle:
  postStart:
    exec:
      command: [ "/bin/sh", "-c", "ollama create llama3 -f /root/.ollama/download/llama3.loc" ]
I also noted that the resulting Modelfile is truncated: any line containing curly brackets {{ }} was ignored, although the same echo or printf command works manually once the pod is ready. Below is the final version after I logged in to the container. I think this might be what's preventing the create command from working, who knows. Any idea how to escape curly braces in YAML?
FROM /root/.ollama/download/Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf
# Set custom parameter values
PARAMETER temperature 1.0
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
# Define the model template
TEMPLATE """<|start_header_id|>assistant<|end_header_id|> <------lines below were all ignored
<|eot_id|>
"""
# Set the system message
SYSTEM You are a helpful AI assistant named e-llmo Assistant.
I found online that {{ "{{" }} ... {{ "}}" }} could be the fix. Will try.
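A sketch of what that escaping could look like inside the echo, assuming the chart pipes these values through Helm's tpl function (otherwise the braces would pass through untouched and no escaping would be needed):

TEMPLATE """
{{ "{{" }} if .System {{ "}}" }}<|start_header_id|>system<|end_header_id|>
{{ "{{" }} .System {{ "}}" }}<|eot_id|>{{ "{{" }} end {{ "}}" }}
"""

Each literal {{ is written as {{ "{{" }} and each }} as {{ "}}" }}, so Helm emits the braces verbatim instead of trying to evaluate them as template actions.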
How to Deploy Ollama LLM on Cloud-Managed Kubernetes (OCI)
I'm looking for guidance on deploying Ollama LLM using Helm charts on a cloud-managed Kubernetes service, specifically Oracle Cloud Infrastructure (OCI). I have a few questions regarding the deployment process:
Persistent Volume and Data Volume Mounting:
- From the values in the Helm chart, how does the ollama-data volume mountPath: "" match the persistentVolume if it's enabled? It's unclear how these values are connected.
- Do we need to create the storage class or PersistentVolumeClaim (PVC) manually for the persistentVolume values to be effective? There isn't much clarity on this in the documentation, and it would be helpful to have an example.

Loading Models with Init Containers:
- Is there a way to load the models using an init container into the mountPath before the main pod is spun up? This feature would be useful for preloading models and ensuring they're ready when the main container starts.

The documentation seems limited, making it challenging to proceed. Any examples or additional guidance would be greatly appreciated. Thank you