immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0
44.98k stars · 2.18k forks

Unable to run Jobs/Edit Server Settings on 1.106.2 #10165

Closed · MegaShinySnivy closed this 3 months ago

MegaShinySnivy commented 3 months ago

The bug

Attempting to run any jobs via the admin screen or edit settings causes the server to return a 405. My session also no longer seems to persist properly: if I return to the base URL (immich.mydomain.com instead of immich.mydomain.com/photos) it kicks me back out. In an attempt to debug further, I turned up the log level. However, this yielded nothing relevant other than ping messages and websocket connects/disconnects.

The OS that Immich Server is running on

Debian 12

Version of Immich Server

v1.106.2

Version of Immich Mobile App

N/A

Platform with the issue

Your docker-compose.yml content

---
# Yeah, this is running on K3s, so you get a helmrelease instead.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: immich
  namespace: immich
spec:
  interval: 15m
  chart:
    spec:
      chart: immich
      version: 0.7.0
      interval: 30m
      sourceRef:
        kind: HelmRepository
        name: immich
        namespace: flux-system
  install:
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    controller:
      annotations:
        reloader.stakater.com/auto: "true"
    image:
      tag: &appVersion "v1.106.2"
    postgres:
      enabled: false
    redis:
      image:
        tag: 7.0.11-debian-11-r18
      enabled: true
      architecture: standalone
      auth:
        enabled: false
      persistence:
        enabled: false
        medium: ""  # Specify the medium for emptyDir (can be "" or "Memory")
        sizeLimit: "8Gi"
        path: /data
      resources:
        requests:
          cpu: 15m
          memory: 10Mi
        limits:
          memory: 10Mi
    immich:
      metrics:
        enabled: true
      persistence:
        library:
          existingClaim: immich-nfs
    server:
      enabled: true
      controller:
        strategy: RollingUpdate
        rollingUpdate:
          unavailable: "1"
      image:
        repository: ghcr.io/immich-app/immich-server
        tag: *appVersion
        pullPolicy: IfNotPresent
      ingress:
        main:
          enabled: true
          ingressClassName: external
          annotations:
            hajimari.io/icon: mdi:camera-iris
            nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
            nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
            external-dns.alpha.kubernetes.io/target: "external.${SECRET_DOMAIN}"
            # Disabled OWASP as I was wondering if it was messing with things.
            nginx.ingress.kubernetes.io/enable-owasp-core-rules: "false"
            nginx.ingress.kubernetes.io/proxy-body-size: "0"
          hosts:
            - host: &host "immich.${SECRET_DOMAIN}"
              paths:
                - path: /
                  pathType: Prefix
          tls:
            - hosts:
                - *host
      resources:
        requests:
          cpu: 30m
          memory: 600Mi
    machine-learning:
      enabled: true
      controller:
        strategy: RollingUpdate
        rollingUpdate:
          unavailable: "1"
      image:
        repository: ghcr.io/immich-app/immich-machine-learning
        tag: *appVersion
      persistence:
        geodata-cache:
          enabled: true
          size: 8Gi
          type: emptyDir
          accessMode: ReadWriteMany
    env:
      DB_DATABASE_NAME: immich
      DB_HOSTNAME:
        valueFrom:
          secretKeyRef:
            name: postgres-immich-app
            key: host
      DB_USERNAME:
        valueFrom:
          secretKeyRef:
            name: postgres-immich-app
            key: user
      DB_PASSWORD:
        valueFrom:
          secretKeyRef:
            name: postgres-immich-app
            key: password
      DB_URL:
        valueFrom:
          secretKeyRef:
            name: postgres-immich-app
            key: uri
      IMMICH_LOG_LEVEL: verbose
      IMMICH_MACHINE_LEARNING_URL: '{{ printf "http://%s-machine-learning:3003" .Release.Name }}'
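The `IMMICH_MACHINE_LEARNING_URL` value above is a Helm template string: `printf` splices the release name into the service URL at render time. A minimal Python sketch of that substitution (the helper name is illustrative, not part of Immich or Helm):

```python
def render_ml_url(release_name: str) -> str:
    """Mimic Helm's `printf "http://%s-machine-learning:3003" .Release.Name`."""
    return "http://%s-machine-learning:3003" % release_name

# With a release named "immich", this yields the value seen in the rendered pod spec:
print(render_ml_url("immich"))  # http://immich-machine-learning:3003
```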

Your .env content

env:
      DB_DATABASE_NAME: immich
      DB_HOSTNAME:
        valueFrom:
          secretKeyRef:
            name: postgres-immich-app
            key: host
      DB_USERNAME:
        valueFrom:
          secretKeyRef:
            name: postgres-immich-app
            key: user
      DB_PASSWORD:
        valueFrom:
          secretKeyRef:
            name: postgres-immich-app
            key: password
      DB_URL:
        valueFrom:
          secretKeyRef:
            name: postgres-immich-app
            key: uri
      IMMICH_LOG_LEVEL: verbose
      IMMICH_MACHINE_LEARNING_URL: '{{ printf "http://%s-machine-learning:3003" .Release.Name }}'

Reproduction steps

1. Upgrade from 1.105.1 to 1.106.2
2. Log in as an admin
3. Try and run a job
4. Notice it gives a 405 and nothing happens
5. Go back to the baseURL
6. Also notice you are no longer logged in
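Regarding step 4: a 405 means the HTTP method itself was rejected, and that rejection can happen in a layer in front of Immich (ingress, custom error pages, WAF) without the app logging anything, which would be consistent with the quiet verbose logs. A self-contained sketch of that failure mode (the stub server and the endpoint path are hypothetical, not Immich's actual API):

```python
# Stand-in for a misbehaving reverse proxy / WAF: GET passes through,
# PUT is rejected with 405 before the backend would ever see it.
import http.server
import threading
import urllib.error
import urllib.request

class MethodBlockingProxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def do_PUT(self):
        self.send_response(405)  # reject the method outright
        self.end_headers()

    def log_message(self, *args):  # keep the example quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), MethodBlockingProxy)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

req = urllib.request.Request(
    f"http://127.0.0.1:{port}/api/jobs/thumbnailGeneration",  # illustrative path
    data=b"{}",
    method="PUT",
)
try:
    with urllib.request.urlopen(req):
        status = 200
except urllib.error.HTTPError as e:
    status = e.code
server.shutdown()
print(status)  # 405
```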

Relevant log output

Funny thing! The verbose log output has nothing useful, only pings, some access logs, and websocket connects/disconnects.

Additional information

As mentioned, this is running on K3s. If you want more information, see https://github.com/MegaShinySnivy/Scaling-Snakes

bo0tzz commented 3 months ago

I doubt this is it, but just to rule it out, can you try with redis 6.2 like we use in the default setup? I'd also be interested in the redis logs.

MegaShinySnivy commented 3 months ago

Here are the logs pre-revert to 6.2...

1:C 08 Jun 2024 07:08:16.359 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 08 Jun 2024 07:08:16.359 # Redis version=7.0.11, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 08 Jun 2024 07:08:16.359 # Configuration loaded
1:M 08 Jun 2024 07:08:16.359 * monotonic clock: POSIX clock_gettime
1:M 08 Jun 2024 07:08:16.360 * Running mode=standalone, port=6379.
1:M 08 Jun 2024 07:08:16.360 # Server initialized
1:M 08 Jun 2024 07:08:16.370 * Reading RDB base file on AOF loading...
1:M 08 Jun 2024 07:08:16.370 * Loading RDB produced by version 7.0.11
1:M 08 Jun 2024 07:08:16.370 * RDB age 379632 seconds
1:M 08 Jun 2024 07:08:16.370 * RDB memory usage when created 7.94 Mb
1:M 08 Jun 2024 07:08:16.370 * RDB is base AOF
1:M 08 Jun 2024 07:08:16.439 * Done loading RDB, keys loaded: 69, keys expired: 0.
1:M 08 Jun 2024 07:08:16.439 * DB loaded from base file appendonly.aof.32.base.rdb: 0.075 seconds
1:M 08 Jun 2024 07:08:17.439 * DB loaded from incr file appendonly.aof.32.incr.aof: 1.000 seconds
1:M 08 Jun 2024 07:08:17.439 * DB loaded from append only file: 1.075 seconds
1:M 08 Jun 2024 07:08:17.439 * Opening AOF incr file appendonly.aof.32.incr.aof on server start
1:M 08 Jun 2024 07:08:17.439 * Ready to accept connections
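One small sanity check on the log above: the AOF load looks healthy, and the per-file load times add up to the reported total. A sketch that parses those lines (excerpt copied verbatim from the log):

```python
import re

log = """\
1:M 08 Jun 2024 07:08:16.439 * DB loaded from base file appendonly.aof.32.base.rdb: 0.075 seconds
1:M 08 Jun 2024 07:08:17.439 * DB loaded from incr file appendonly.aof.32.incr.aof: 1.000 seconds
1:M 08 Jun 2024 07:08:17.439 * DB loaded from append only file: 1.075 seconds
"""

# Pull the trailing "<float> seconds" off each load line.
times = [float(m) for m in re.findall(r": (\d+\.\d+) seconds", log)]
base, incr, total = times
assert abs(base + incr - total) < 1e-9  # 0.075 + 1.000 == 1.075
print(total)  # 1.075
```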

MegaShinySnivy commented 3 months ago

And post downgrade.

1:C 11 Jun 2024 20:52:40.055 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 11 Jun 2024 20:52:40.055 # Redis version=6.2.14, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 11 Jun 2024 20:52:40.055 # Configuration loaded
1:M 11 Jun 2024 20:52:40.056 * monotonic clock: POSIX clock_gettime
1:M 11 Jun 2024 20:52:40.057 * Running mode=standalone, port=6379.
1:M 11 Jun 2024 20:52:40.058 # Server initialized
1:M 11 Jun 2024 20:52:40.058 * Ready to accept connections

MegaShinySnivy commented 3 months ago

Tested, sadly no change

applealias03 commented 3 months ago

I'm experiencing the same issue. Updated the stack with an image pull for the new version. Whenever I queue a job on the server, the 'Waiting' value increments but no action follows. Additionally, the admin panel will freeze at times when attempting to queue a job or change a setting, requiring a full node restart or re-compose.

Actions completed in an attempt to correct the issue, to no avail. Here is the redis log output:

1:C 11 Jun 2024 21:28:11.223 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 11 Jun 2024 21:28:11.223 # Redis version=6.2.14, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 11 Jun 2024 21:28:11.223 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 11 Jun 2024 21:28:11.223 * monotonic clock: POSIX clock_gettime
1:M 11 Jun 2024 21:28:11.223 * Running mode=standalone, port=6379.
1:M 11 Jun 2024 21:28:11.223 # Server initialized
1:M 11 Jun 2024 21:28:11.223 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 11 Jun 2024 21:28:11.223 * Ready to accept connections

Notably different from @MegaShinySnivy's logs, I appear not to have a config file set, and I don't know where to locate said file. I assume the memory overcommit warning is related to the lack of this config file, as that attribute would likely be set by the conf.
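For what it's worth, the overcommit warning in that log carries its own suggested fix, and it is a host (kernel) setting rather than something redis.conf would normally carry. On the Docker host (requires root):

```shell
# Suggested by the Redis warning message itself.
sysctl vm.overcommit_memory=1                        # apply immediately
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf  # persist across reboots
```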

MegaShinySnivy commented 3 months ago

@applealias03, are you running this on K8s or docker compose?

applealias03 commented 3 months ago

> @applealias03, are you running this on K8s or docker compose?

Compose. Here is the script I've been using as of this most recent update:


name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: ['start.sh', 'immich']
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - stack.env
    ports:
      - 2283:3001
    depends_on:
      - redis
      - database
    environment:
      IMMICH_WORKERS_INCLUDE: 'api'
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: ['start.sh', 'immich']
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - stack.env
    depends_on:
      - redis
      - database
    environment:
      IMMICH_WORKERS_EXCLUDE: 'api'
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - stack.env
    restart: always

  redis:
    container_name: immich_redis
    image: registry.hub.docker.com/library/redis:6.2-alpine@sha256:84882e87b54734154586e5f8abd4dce69fe7311315e2fc6d67c29614c8de2672
    restart: always

  database:
    container_name: immich_postgres
    image: registry.hub.docker.com/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    restart: always
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]

volumes:
  model-cache:

MegaShinySnivy commented 3 months ago

SWAG, but it could be that the immich helm chart sets a configuration that isn't included with the compose file.

applealias03 commented 3 months ago

Noticed a difference between the SHA digests for the redis image in the newest docker-compose.yml and my compose.

[screenshot: comparison of the redis image digests]

There's also the new health check. I thought the difference in digest might correspond to a minor update revision, but making this replacement has not affected the output.

srosorcxisto commented 3 months ago

I just upgraded from v1.105.1 and am now having the same issue as @applealias03. Nothing unusual at all in the logs and Redis is ready to accept connections, just an increment of the job waiting count with no work being performed.

srosorcxisto commented 3 months ago

> I just upgraded from v1.105.1 and am now having the same issue as @applealias03. Nothing unusual at all in the logs and Redis is ready to accept connections, just an increment of the job waiting count with no work being performed.

Solved: I forgot to remove `command: ['start.sh', 'immich']` from the immich-server service in the compose file. Removing that and recomposing resolved the issue.

applealias03 commented 3 months ago

I believe I have managed to entirely fix this issue. I now have an active Smart Search job running and all outputs seem to be correct.

Here are the steps I went through in order to correct this:

  1. Compose down
  2. Remove the immich_microservices and immich_redis containers if they are still present.
  3. Remove the images for immich-server and redis
  4. Edit compose file to be similar to the following:
    
    # https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - 2283:3001
    depends_on:
      - redis
      - database
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:d6c2911ac51b289db208767581a5d154544f2b2fe4914ea5056443f62dc6e900
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: registry.hub.docker.com/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data

volumes:
  model-cache:



The most notable changes in this compose file are:
1. Entire removal of the `immich_microservices` section
2. Removal of the start command for the server environment
3. New redis-alpine image `@sha256:d6c2911ac51b289db208767581a5d154544f2b2fe4914ea5056443f62dc6e900`
4. Addition of the healthcheck for `immich_redis`

After this and `docker compose up` everything seems to now be running smoothly.

applealias03 commented 3 months ago

> I just upgraded from v1.105.1 and am now having the same issue as @applealias03. Nothing unusual at all in the logs and Redis is ready to accept connections, just an increment of the job waiting count with no work being performed.
>
> Solved: I forgot to remove `command: ['start.sh', 'immich']` from the immich-server service in the compose file. Removing that and recomposing resolved the issue.

Didn't see you basically fixed it the same way! Glad this worked out for you too.

MegaShinySnivy commented 3 months ago

Ahh, the container runs start.sh by default. What is it recommended to be set to now?

applealias03 commented 3 months ago

> Ahh, the container runs start.sh by default. What is it recommended to be set to now?

The changelog recommends omitting the line entirely. See: https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
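For reference, the shape of the change (sketched from the discussion above, not the exact upstream diff):

```yaml
services:
  immich-server:
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # v1.105.x and earlier: command: ['start.sh', 'immich']
    # v1.106.x: omit `command` entirely so the image's default entrypoint runs,
    # which now covers both the api and microservices workers in one container
```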

MegaShinySnivy commented 3 months ago

Strange. I just checked over my pod yaml. There's no command key. At all.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/restartedAt: "2024-05-09T23:31:09-05:00"
  creationTimestamp: "2024-06-11T20:11:25Z"
  generateName: immich-server-6f946b75d6-
  labels:
    app.kubernetes.io/instance: immich
    app.kubernetes.io/name: server
    pod-template-hash: 6f946b75d6
  name: immich-server-6f946b75d6-5q79z
  namespace: immich
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: immich-server-6f946b75d6
    uid: 2181f8e7-10b8-4250-a2ab-467b3d0420f0
  resourceVersion: "181100313"
  uid: 882598a0-88df-4446-9d4d-923e7aba3842
spec:
  automountServiceAccountToken: true
  containers:
  - env:
    - name: DB_DATABASE_NAME
      value: immich
    - name: DB_HOSTNAME
      valueFrom:
        secretKeyRef:
          key: host
          name: postgres-immich-app
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: postgres-immich-app
    - name: DB_URL
      valueFrom:
        secretKeyRef:
          key: uri
          name: postgres-immich-app
    - name: DB_USERNAME
      valueFrom:
        secretKeyRef:
          key: user
          name: postgres-immich-app
    - name: IMMICH_LOG_LEVEL
      value: verbose
    - name: IMMICH_MACHINE_LEARNING_URL
      value: http://immich-machine-learning:3003
    - name: IMMICH_METRICS
      value: "true"
    - name: REDIS_HOSTNAME
      value: immich-redis-master
    image: ghcr.io/immich-app/immich-server:v1.106.2
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /api/server-info/ping
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: immich-server
    ports:
    - containerPort: 3001
      name: http
      protocol: TCP
    - containerPort: 8081
      name: metrics-api
      protocol: TCP
    - containerPort: 8082
      name: metrics-ms
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /api/server-info/ping
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 30m
        memory: 600Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/src/app/upload
      name: library
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-wx99g
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: k8s-worker-3
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: library
    persistentVolumeClaim:
      claimName: immich-nfs
  - name: kube-api-access-wx99g
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
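A quick way to confirm that the chart sets no explicit command on the server container (pod name taken from the spec above; adjust for your cluster):

```shell
kubectl -n immich get pod immich-server-6f946b75d6-5q79z \
  -o jsonpath='{.spec.containers[?(@.name=="immich-server")].command}'
# Empty output means the container falls back to the image's default ENTRYPOINT/CMD.
```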

MegaShinySnivy commented 3 months ago

Update: hashed some things out over Discord; it was a combination of a custom set of error pages and my WAF interfering.