immich-app / immich-charts

Helm chart implementation of Immich
https://immich.app
GNU Affero General Public License v3.0

immich-machine-learning pod CrashLoopBackOff without logs #9

Closed: x-real-ip closed this issue 1 year ago

x-real-ip commented 1 year ago

Hi, I've deployed Immich on my Kubernetes (k3s) cluster. Everything is running except the immich-machine-learning pod, which keeps crashing without producing any logs, so I don't know what's happening or how to fix it. Any suggestions for debugging this?

[Screenshot from 2023-02-20 21-13-11]

Command:

kubectl logs immich-machine-learning-8885d64cb-l2rvk

Output:

~$

Command:

kubectl describe pod immich-machine-learning-8885d64cb-l2rvk

Output:

Name:             immich-machine-learning-8885d64cb-l2rvk
Namespace:        tools
Priority:         0
Service Account:  default
Node:             k3s-master-02/10.0.100.102
Start Time:       Mon, 20 Feb 2023 20:50:26 +0100
Labels:           app=immich-machine-learning
                  pod-template-hash=8885d64cb
Annotations:      <none>
Status:           Running
IP:               10.42.2.161
IPs:
  IP:           10.42.2.161
Controlled By:  ReplicaSet/immich-machine-learning-8885d64cb
Init Containers:
  postgresql-isready:
    Container ID:  containerd://f3de44384d61f7743b8e7da3398feedfcecd3df0e9a4dc52d7efd1f9d76f4cc5
    Image:         harbor.k8s.lan/dockerhub-proxy/bitnami/postgresql:14.5.0-debian-11-r6
    Image ID:      harbor.k8s.lan/dockerhub-proxy/bitnami/postgresql@sha256:4355265e33e9c2a786aa145884d4b36ffd4c41c516b35d60df0b7495141ec738
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      until pg_isready -U "${POSTGRESQL_USERNAME}" -d "dbname=${DB_DATABASE_NAME}" -h immich-postgresql-hl -p 5432 ; do sleep 2 ; done
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 20 Feb 2023 20:50:28 +0100
      Finished:     Mon, 20 Feb 2023 20:50:34 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      POSTGRESQL_USERNAME:  <set to the key 'DB_USERNAME' in secret 'immich-secret-env'>    Optional: false
      POSTGRESQL_DATABASE:  <set to the key 'DB_DATABASE_NAME' of config map 'immich-env'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d5qj9 (ro)
Containers:
  immich-machine-learning:
    Container ID:  containerd://877582bf1c4c88ab58a94a49a47505d11c10f9e0a2b8cb780c3b12c1a46b06ce
    Image:         harbor.k8s.lan/dockerhub-proxy/altran1502/immich-machine-learning:v1.43.0
    Image ID:      harbor.k8s.lan/dockerhub-proxy/altran1502/immich-machine-learning@sha256:3373962c8d64b264b42751614be88590e279cc6442db0d55615a8daa9cead8f9
    Port:          3003/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
    Args:
      ./entrypoint.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    132
      Started:      Mon, 20 Feb 2023 21:16:41 +0100
      Finished:     Mon, 20 Feb 2023 21:16:41 +0100
    Ready:          False
    Restart Count:  10
    Liveness:       tcp-socket :3003 delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      tcp-socket :3003 delay=0s timeout=1s period=10s #success=1 #failure=3
    Startup:        tcp-socket :3003 delay=0s timeout=1s period=5s #success=1 #failure=30
    Environment Variables from:
      immich-env  ConfigMap  Optional: false
    Environment:
      DB_PASSWORD:  <set to the key 'DB_PASSWORD' in secret 'immich-secret-env'>  Optional: false
      DB_USERNAME:  <set to the key 'DB_USERNAME' in secret 'immich-secret-env'>  Optional: false
      MAPBOX_KEY:   <set to the key 'MAPBOX_KEY' in secret 'immich-secret-env'>   Optional: false
    Mounts:
      /usr/src/app/.reverse-geocoding-dump from geocoding-dump (rw)
      /usr/src/app/upload from library (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-d5qj9 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  geocoding-dump:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  library:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-nfs-immich-library
    ReadOnly:   false
  kube-api-access-d5qj9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  29m                    default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  29m                    default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
  Normal   Scheduled         29m                    default-scheduler  Successfully assigned tools/immich-machine-learning-8885d64cb-l2rvk to k3s-master-02
  Normal   Pulled            29m                    kubelet            Container image "harbor.k8s.lan/dockerhub-proxy/bitnami/postgresql:14.5.0-debian-11-r6" already present on machine
  Normal   Created           29m                    kubelet            Created container postgresql-isready
  Normal   Started           29m                    kubelet            Started container postgresql-isready
  Warning  Unhealthy         28m                    kubelet            Startup probe failed: dial tcp 10.42.2.161:3003: connect: connection refused
  Normal   Pulled            28m (x4 over 28m)      kubelet            Container image "harbor.k8s.lan/dockerhub-proxy/altran1502/immich-machine-learning:v1.43.0" already present on machine
  Normal   Created           28m (x4 over 28m)      kubelet            Created container immich-machine-learning
  Normal   Started           28m (x4 over 28m)      kubelet            Started container immich-machine-learning
  Warning  BackOff           3m59s (x138 over 28m)  kubelet            Back-off restarting failed container
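One detail in the describe output above worth reading closely is `Exit Code: 132` under `Last State`. Assuming the common shell and container-runtime convention that an exit status above 128 means the process was terminated by signal number (status minus 128), 132 corresponds to signal 4, SIGILL (illegal instruction). A process that dies on its first unsupported CPU instruction may never get to write a log line, which would be consistent with the empty `kubectl logs` output. The small helper below is a sketch; `describe_exit_code` is a hypothetical name, not part of Kubernetes or Immich.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code, assuming the 128 + N convention
    where N is the number of the signal that terminated the process."""
    if code > 128:
        sig_num = code - 128
        try:
            # signal.Signals maps a number to the platform's signal name.
            name = signal.Signals(sig_num).name
        except ValueError:
            name = f"unknown signal {sig_num}"
        return f"killed by signal {sig_num} ({name})"
    return f"exited with status {code}"


print(describe_exit_code(132))  # on Linux: killed by signal 4 (SIGILL)
```

The same reading gives the more familiar 137 = 128 + 9 (SIGKILL), which is what an OOM-killed container usually reports.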
bo0tzz commented 1 year ago

Does your machine support AVX instructions? See https://github.com/immich-app/immich#immich-machine-learning-fails-to-start
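One way to answer that question on a Linux node is to look for the `avx` flag in `/proc/cpuinfo`. The sketch below assumes an x86 Linux guest, where the kernel lists CPU features on `flags` lines; the function names are illustrative only. On a Proxmox VM that uses a generic emulated CPU type, the flag may be missing even though the physical host supports AVX.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags from /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels list features on "flags" lines, repeated per logical CPU.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text: str) -> bool:
    """True if the exact 'avx' feature flag is present."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AVX supported:", has_avx(path.read_text()))
    else:
        print("/proc/cpuinfo not found (not a Linux host?)")
```

Run it inside the VM or node the pod is scheduled on; `False` points to the problem described in the linked README section.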

x-real-ip commented 1 year ago

Thanks! I set the CPU type to "host" for the Proxmox VMs and now it works. Sorry, I missed that in the documentation.