otwld / ollama-helm

Helm chart for Ollama on Kubernetes
https://artifacthub.io/packages/helm/ollama-helm/ollama
MIT License

PostHookStartError when downloading a model #68

Closed: tringler closed this issue 3 months ago

tringler commented 3 months ago

Hello,

I'm unable to download any model via the Helm chart. The pod crashes at the post-start hook step while the models are being downloaded. Pod logs:

2024/07/14 17:18:34 routes.go:940: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-07-14T17:18:34.632Z level=INFO source=images.go:760 msg="total blobs: 0"
time=2024-07-14T17:18:34.632Z level=INFO source=images.go:767 msg="total unused blobs removed: 0"
time=2024-07-14T17:18:34.633Z level=INFO source=routes.go:987 msg="Listening on [::]:11434 (version 0.2.2)"
time=2024-07-14T17:18:34.634Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama114622503/runners
time=2024-07-14T17:18:39.137Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries []"
time=2024-07-14T17:18:39.138Z level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-07-14T17:18:39.141Z level=INFO source=gpu.go:346 msg="no compatible GPUs were discovered"
time=2024-07-14T17:18:39.141Z level=INFO source=types.go:105 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="62.3 GiB" available="38.8 GiB"

From a shell in the main pod I cannot use the ollama CLI, but I think that is expected. ollama ps just gives me: Error: something went wrong, please see the ollama server logs for details

Any idea how to proceed? Thanks in advance!

These are my values:

# Default values for ollama-helm.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# -- Number of replicas
replicaCount: 1

# Knative configuration
knative:
  # -- Enable Knative integration
  enabled: false
  # -- Knative service container concurrency
  containerConcurrency: 0
  # -- Knative service timeout seconds
  timeoutSeconds: 300
  # -- Knative service response start timeout seconds
  responseStartTimeoutSeconds: 300
  # -- Knative service idle timeout seconds
  idleTimeoutSeconds: 300

# Docker image
image:
  # -- Docker image repository
  repository: ollama/ollama

  # -- Docker pull policy
  pullPolicy: IfNotPresent

  # -- Docker image tag, overrides the image tag whose default is the chart appVersion.
  tag: ""

# -- Docker registry secret names as an array
imagePullSecrets: []

# -- String to partially override template (will maintain the release name)
nameOverride: ""

# -- String to fully override template
fullnameOverride: ""

# Ollama parameters
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: false

    # -- GPU type: 'nvidia' or 'amd'
    # If 'ollama.gpu.enabled' is true, the default value is 'nvidia'
    # If set to 'amd', this will add the 'rocm' suffix to the image tag if 'image.tag' is not overridden
    # This is because AMD and CPU/CUDA are shipped as different images
    type: 'nvidia'

    # -- Specify the number of GPUs
    number: 1

  # -- List of models to pull at container startup
  # The more you add, the longer the container will take to start if models are not present
  # models:
  #  - llama2
  #  - mistral
  models:
  - llama2

  # -- Add insecure flag for pulling at container startup
  insecure: false

# Service account
# ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccount:
  # -- Specifies whether a service account should be created
  create: true

  # -- Automatically mount a ServiceAccount's API credentials?
  automount: true

  # -- Annotations to add to the service account
  annotations: {}

  # -- The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""

# -- Map of annotations to add to the pods
podAnnotations: {}

# -- Map of labels to add to the pods
podLabels: {}

# -- Pod Security Context
podSecurityContext: {}
# fsGroup: 2000

# -- Container Security Context
securityContext: {}
# capabilities:
#  drop:
#   - ALL
# readOnlyRootFilesystem: true
# runAsNonRoot: true
# runAsUser: 1000

# -- Specify runtime class
runtimeClassName: ""

# Configure Service
service:

  # -- Service type
  type: ClusterIP

  # -- Service port
  port: 11434

  # -- Service node port when service type is 'NodePort'
  nodePort: 31434

  # -- Annotations to add to the service
  annotations: {}

# Configure the ingress resource that allows you to access the Ollama installation
ingress:
  # -- Enable ingress controller resource
  enabled: false

  # -- IngressClass that will be used to implement the Ingress (Kubernetes 1.18+)
  className: ""

  # -- Additional annotations for the Ingress resource.
  annotations: {}
  # kubernetes.io/ingress.class: traefik
  # kubernetes.io/ingress.class: nginx
  # kubernetes.io/tls-acme: "true"

  # The list of hostnames to be covered with this ingress record.
  hosts:
  - host: ollama.local
    paths:
    - path: /
      pathType: Prefix

  # -- The tls configuration for hostnames to be covered with this ingress record.
  tls: []
  # - secretName: chart-example-tls
  #   hosts:
  #     - chart-example.local

# Configure resource requests and limits
# ref: http://kubernetes.io/docs/user-guide/compute-resources/
resources:
  # -- Pod requests
  requests: {}
  # Memory request
  # memory: 4096Mi

  # CPU request
  # cpu: 2000m

  # -- Pod limit
  limits: {}
  # Memory limit
  # memory: 8192Mi

  # CPU limit
  # cpu: 4000m

# Configure extra options for liveness probe
# ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes
livenessProbe:
  # -- Enable livenessProbe
  enabled: true

  # -- Request path for livenessProbe
  path: /

  # -- Initial delay seconds for livenessProbe
  initialDelaySeconds: 60

  # -- Period seconds for livenessProbe
  periodSeconds: 10

  # -- Timeout seconds for livenessProbe
  timeoutSeconds: 5

  # -- Failure threshold for livenessProbe
  failureThreshold: 6

  # -- Success threshold for livenessProbe
  successThreshold: 1

# Configure extra options for readiness probe
# ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes
readinessProbe:
  # -- Enable readinessProbe
  enabled: true

  # -- Request path for readinessProbe
  path: /

  # -- Initial delay seconds for readinessProbe
  initialDelaySeconds: 30

  # -- Period seconds for readinessProbe
  periodSeconds: 5

  # -- Timeout seconds for readinessProbe
  timeoutSeconds: 3

  # -- Failure threshold for readinessProbe
  failureThreshold: 6

  # -- Success threshold for readinessProbe
  successThreshold: 1

# Configure autoscaling
autoscaling:
  # -- Enable autoscaling
  enabled: false

  # -- Number of minimum replicas
  minReplicas: 1

  # -- Number of maximum replicas
  maxReplicas: 100

  # -- CPU usage to target replica
  targetCPUUtilizationPercentage: 80
  # targetMemoryUtilizationPercentage: 80

# -- Additional volumes on the output Deployment definition.
volumes: []
# - name: foo
#   secret:
#     secretName: mysecret
#     optional: false

# -- Additional volumeMounts on the output Deployment definition.
volumeMounts: []
# - name: foo
#   mountPath: "/etc/foo"
#   readOnly: true

# -- Additional arguments on the output Deployment definition.
extraArgs: []

# -- Additional environment variables on the output Deployment definition.
extraEnv:
- name: HTTP_PROXY
  value: "http://<proxy>:8080"
- name: HTTPS_PROXY
  value: "http://<proxy>:8080"
- name: NO_PROXY
  value: "<ip-ranges>"

# Enable persistence using Persistent Volume Claims
# ref: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
persistentVolume:
  # -- Enable persistence using PVC
  enabled: true

  # -- Ollama server data Persistent Volume access modes
  # Must match those of existing PV or dynamic provisioner
  # Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
  accessModes:
  - ReadWriteOnce

  # -- Ollama server data Persistent Volume annotations
  annotations: {}

  # -- If you'd like to bring your own PVC for persisting Ollama state, pass the name of the
  # created + ready PVC here. If set, this Chart will not create the default PVC.
  # Requires persistentVolume.enabled: true
  existingClaim: ""

  # -- Ollama server data Persistent Volume size
  size: 30Gi

  # -- Ollama server data Persistent Volume Storage Class
  # If defined, storageClassName: <storageClass>
  # If set to "-", storageClassName: "", which disables dynamic provisioning
  # If undefined (the default) or set to null, no storageClassName spec is
  # set, choosing the default provisioner.  (gp2 on AWS, standard on
  # GKE, AWS & OpenStack)
  storageClass: "rook-ceph-block"

  # -- Ollama server data Persistent Volume Binding Mode
  # If defined, volumeMode: <volumeMode>
  # If empty (the default) or set to null, no volumeBindingMode spec is
  # set, choosing the default mode.
  volumeMode: ""

  # -- Subdirectory of Ollama server data Persistent Volume to mount
  # Useful if the volume's root directory is not empty
  subPath: ""

# -- Node labels for pod assignment.
nodeSelector: {}

# -- Tolerations for pod assignment
tolerations: []

# -- Affinity for pod assignment
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role.kubernetes.io/backend-worker
          operator: Exists

# -- How to replace existing pods
updateStrategy:
  # -- Can be "Recreate" or "RollingUpdate". Default is RollingUpdate
  type: ""

# -- Init containers to add to the pod
initContainers: []
# - name: startup-tool
#   image: alpine:3
#   command: [sh, -c]
#   args:
#     - echo init

jdetroyes commented 3 months ago

Hello @tringler

It looks like this is related to the dynamic LLM libraries; none were automatically detected: time=2024-07-14T17:18:39.137Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries []"

You can try to set one manually according to this troubleshooting page
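For example, a minimal sketch of the values change (untested; OLLAMA_LLM_LIBRARY is the variable visible in your server log above, and cpu_avx2 is only a placeholder, pick the library that matches your hardware from that page):

extraEnv:
# The existing proxy entries have to stay in this list, since Helm
# replaces array values from overrides wholesale rather than merging them.
- name: HTTP_PROXY
  value: "http://<proxy>:8080"
- name: HTTPS_PROXY
  value: "http://<proxy>:8080"
- name: NO_PROXY
  value: "<ip-ranges>"
- name: OLLAMA_LLM_LIBRARY
  value: "cpu_avx2"  # placeholder; use the library matching your hardware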