otwld / ollama-helm

Helm chart for Ollama on Kubernetes
https://artifacthub.io/packages/helm/ollama-helm/ollama
MIT License

Ollama does not use the resources stated in the requests and limits #93

Closed JBleher closed 2 days ago

JBleher commented 2 weeks ago

What happened?

Suppose you have split an A100 with 80GB memory into 1 x "3g.40gb" slice and 4 x "1g.10gb" slices. The ollama pod then loads the model onto the 3g.40gb slice, even though the resource request and the limit both state 1g.10gb.
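
For context: with the NVIDIA device plugin in mixed MIG strategy, the node advertises the slices as separate extended resources, roughly like this (illustrative, not copied from my cluster):

kubectl describe node <gpu-node> | grep nvidia.com
  nvidia.com/mig-3g.40gb:  1
  nvidia.com/mig-1g.10gb:  4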

Applying a ResourceQuota to the same namespace (here: ollamatest) does not help either:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ollamatest
spec:
  hard:
    nvidia.com/mig-1g.10gb: 2
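
For reference, ResourceQuota only tracks extended resources such as MIG slices under the "requests." prefix, so presumably the quota would need to be written like this (a sketch, not yet verified on this cluster):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ollamatest
spec:
  hard:
    requests.nvidia.com/mig-1g.10gb: 2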

How can I dedicate one MIG slice to ollama?
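
(For context, based on the comments in the chart's values.yaml, this is what I would expect to pin the pod to exactly one 1g.10gb slice; a sketch of the relevant part of my values, the full file is below:)

ollama:
  gpu:
    enabled: true
    type: 'nvidia'
    number: 1
    nvidiaResource: "nvidia.com/mig-1g.10gb"  # request one 1g.10gb MIG slice instead of a full GPU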

Chart version

0.55.0

Kubernetes version

1.27.16

Kubernetes distribution

microk8s

Relevant log output

No response

Your values.yaml

# Default values for ollama-helm.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# -- Number of replicas
replicaCount: 1

# Docker image
image:
  # -- Docker image registry
  repository: ollama/ollama

  # -- Docker pull policy
  pullPolicy: IfNotPresent

  # -- Docker image tag, overrides the image tag whose default is the chart appVersion.
  tag: ""

# -- Docker registry secret names as an array
imagePullSecrets: []

# -- String to partially override template  (will maintain the release name)
nameOverride: ""

# -- String to fully override template
fullnameOverride: ""

# Ollama parameters
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true

    # -- GPU type: 'nvidia' or 'amd'
    # If 'ollama.gpu.enabled' is set, the default value is 'nvidia'
    # If set to 'amd', this will add the 'rocm' suffix to the image tag if 'image.tag' is not overridden
    # This is because AMD and CPU/CUDA use different images
    type: 'nvidia'
    # Specify the number of GPUs
    number: 1
    # -- only for nvidia cards; change to (example) 'nvidia.com/mig-1g.10gb' to use one MIG slice
    nvidiaResource: "nvidia.com/gpu" 
    # nvidiaResource: "nvidia.com/mig-1g.10gb" # example
    #
    # If you want to use more than one MIG slice, you can use the following syntax (nvidiaResource above is then ignored)
    mig:
      # If MIG is set to false this section is ignored
      enabled: true
      devices:
        # Specify the MIG devices and the corresponding number
        1g.10gb: 1
  # -- List of models to pull at container startup
  # The more you add, the longer the container will take to start if models are not present
  models:
    - mistral

  # -- Add insecure flag for pulling at container startup
  insecure: true

# Service account
# ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccount:
  # -- Specifies whether a service account should be created
  create: true

  # -- Automatically mount a ServiceAccount's API credentials?
  automount: true

  # -- Annotations to add to the service account
  annotations: {}

  # -- The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""

# -- Map of annotations to add to the pods
podAnnotations: {}

# -- Map of labels to add to the pods
podLabels: {}

# -- Pod Security Context
podSecurityContext: {}
  # fsGroup: 2000

# -- Container Security Context
securityContext: {}
  # capabilities:
  #  drop:
  #   - ALL
  # readOnlyRootFilesystem: true
  # runAsNonRoot: true
  # runAsUser: 1000

# -- Specify runtime class
runtimeClassName: ""

# Configure Service
service:

  # -- Service type
  type: NodePort

  # -- Service port
  port: 11434

  # -- Service node port when service type is 'NodePort'
  nodePort: 11434

# Configure the ingress resource that allows you to access the Ollama installation
ingress:
  # -- Enable ingress controller resource
  enabled: false

  # -- IngressClass that will be used to implement the Ingress (Kubernetes 1.18+)
  className: ""

  # -- Additional annotations for the Ingress resource.
  annotations: {}
    # kubernetes.io/ingress.class: traefik
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"

  # The list of hostnames to be covered with this ingress record.
  hosts:
    - host: localhost
      paths:
        - path: /
          pathType: Prefix

  # --  The tls configuration for hostnames to be covered with this ingress record.
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

# Configure resource requests and limits
# ref: http://kubernetes.io/docs/user-guide/compute-resources/
resources:
  # Pod requests
  requests: {}
    # -- Memory request
    # memory: 4096Mi

    # -- CPU request
    # cpu: 2000m

  # Pod limit
  limits: {}
    # -- Memory limit
    # memory: 8192Mi

    # -- CPU limit
    # cpu: 4000m

# Configure extra options for liveness probe
# ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes
livenessProbe:
  # -- Enable livenessProbe
  enabled: true

  # -- Request path for livenessProbe
  path: /

  # -- Initial delay seconds for livenessProbe
  initialDelaySeconds: 60

  # -- Period seconds for livenessProbe
  periodSeconds: 10

  # -- Timeout seconds for livenessProbe
  timeoutSeconds: 5

  # -- Failure threshold for livenessProbe
  failureThreshold: 6

  # -- Success threshold for livenessProbe
  successThreshold: 1

# Configure extra options for readiness probe
# ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#configure-probes
readinessProbe:
  # -- Enable readinessProbe
  enabled: true

  # -- Request path for readinessProbe
  path: /

  # -- Initial delay seconds for readinessProbe
  initialDelaySeconds: 30

  # -- Period seconds for readinessProbe
  periodSeconds: 5

  # -- Timeout seconds for readinessProbe
  timeoutSeconds: 3

  # -- Failure threshold for readinessProbe
  failureThreshold: 6

  # -- Success threshold for readinessProbe
  successThreshold: 1

# Configure autoscaling
autoscaling:
  # -- Enable autoscaling
  enabled: false 

  # -- Number of minimum replicas
  minReplicas: 1

  # -- Number of maximum replicas
  maxReplicas: 4

  # -- CPU usage to target replica
  targetCPUUtilizationPercentage: 80

  # -- targetMemoryUtilizationPercentage: 80

# -- Additional volumes on the output Deployment definition.
volumes: []
# -- - name: foo
#   secret:
#     secretName: mysecret
#     optional: false

# -- Additional volumeMounts on the output Deployment definition.
volumeMounts: []
# -- - name: foo
#   mountPath: "/etc/foo"
#   readOnly: true

# -- Additional arguments on the output Deployment definition.
extraArgs: []

# -- Additional environments variables on the output Deployment definition.
extraEnv: []
#  - name: OLLAMA_DEBUG
#    value: "1"

# Enable persistence using Persistent Volume Claims
# ref: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
persistentVolume:
  # -- Enable persistence using PVC
  enabled: false

  # -- Ollama server data Persistent Volume access modes
  # Must match those of existing PV or dynamic provisioner
  # Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
  accessModes:
    - ReadWriteOnce

  # -- Ollama server data Persistent Volume annotations
  annotations: {}

  # -- If you'd like to bring your own PVC for persisting Ollama state, pass the name of the
  # created + ready PVC here. If set, this Chart will not create the default PVC.
  # Requires server.persistentVolume.enabled: true
  existingClaim: ""

  # -- Ollama server data Persistent Volume size
  size: 30Gi

  # -- Ollama server data Persistent Volume Storage Class
  # If defined, storageClassName: <storageClass>
  # If set to "-", storageClassName: "", which disables dynamic provisioning
  # If undefined (the default) or set to null, no storageClassName spec is
  # set, choosing the default provisioner.  (gp2 on AWS, standard on
  # GKE, AWS & OpenStack)
  storageClass: ""

  # -- Ollama server data Persistent Volume Binding Mode
  # If defined, volumeMode: <volumeMode>
  # If empty (the default) or set to null, no volumeBindingMode spec is
  # set, choosing the default mode.
  volumeMode: ""

  # -- Subdirectory of Ollama server data Persistent Volume to mount
  # Useful if the volume's root directory is not empty
  subPath: ""

# -- Node labels for pod assignment.
nodeSelector: {}

# -- Tolerations for pod assignment
tolerations: []

# -- Affinity for pod assignment
affinity: {}

Additional context

No response

jdetroyes commented 2 weeks ago

Hello @JBleher

Ollama auto-discovers devices; can you check this?

And if needed, you can set an environment variable:

# -- Additional environments variables on the output Deployment definition.
extraEnv: []
#  - name: OLLAMA_DEBUG
#    value: "1"
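
For example (untested sketch, assuming you have the MIG UUID from 'nvidia-smi -L' on the node), something like this could restrict the container to a single MIG device:

extraEnv:
  # Hypothetical example: pin Ollama/CUDA to one specific MIG device.
  # Replace the placeholder with the MIG UUID reported by 'nvidia-smi -L'.
  - name: CUDA_VISIBLE_DEVICES
    value: "MIG-<uuid from nvidia-smi -L>"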