immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

Library refresh: errors, too many assets, too many jobs #12494

Open blablack opened 3 weeks ago

blablack commented 3 weeks ago

The bug

Hello,

I have been using Immich for the past few months, and until recently everything worked fine. I have one external library holding the legacy pictures from before I started using Immich; since then, I have been uploading pictures directly through Immich.

Lately, when doing a Library refresh, several things go wrong:

This issue only started recently, around the upgrade to v1.113.0, but I cannot tell whether it is caused by the upgrade or by some corruption that happened around the same time.

Let me know if I can provide any other useful information.

Thanks, Aurélien

The OS that Immich Server is running on

kubernetes

Version of Immich Server

v1.114.0

Version of Immich Mobile App

1.114.0 build.158

Platform with the issue

Your docker-compose.yml content

apiVersion: v1
kind: ConfigMap
metadata:
  name: immich-postgres
data:
  create-extensions.sql: |
    CREATE EXTENSION IF NOT EXISTS cube;
    CREATE EXTENSION IF NOT EXISTS earthdistance;
    CREATE EXTENSION IF NOT EXISTS vectors;
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE EXTENSION IF NOT EXISTS unaccent;
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: immich
spec:
  replicas: 1
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: immich
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: immich
    spec:
      volumes:
        - name: immich-pvc
          persistentVolumeClaim:
            claimName: immich-pvc
        - name: nasio-nfs-pvc
          persistentVolumeClaim:
            claimName: nasio-nfs-pvc
        - configMap:
            name: immich-postgres
          name: immich-postgres-vol
      containers:
        - image: tensorchord/pgvecto-rs:pg16-v0.2.1
          imagePullPolicy: IfNotPresent
          name: postgres
          env:
            - name: POSTGRES_USER
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: POSTGRES_PASSWORD
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: POSTGRES_DB
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: POSTGRES_INITDB_ARGS
              value: "--data-checksums"
          volumeMounts:
            - mountPath: "/var/lib/postgresql/data"
              subPath: "postgresql"
              name: immich-pvc
            - name: immich-postgres-vol
              subPath: "create-extensions.sql"
              mountPath: "/docker-entrypoint-initdb.d/create-extensions.sql"
          resources:
            limits:
              cpu: 1500m
              memory: 3000Mi
            requests:
              cpu: 10m
              memory: 300Mi
        - image: redis:latest
          imagePullPolicy: IfNotPresent
          name: redis
          resources:
            limits:
              cpu: 40m
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 10Mi
        - image: ghcr.io/immich-app/immich-server:release
          imagePullPolicy: Always
          name: immich-server
          env:
            - name: DB_HOSTNAME
              value: "localhost"
            - name: DB_USERNAME
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: DB_PASSWORD
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: DB_DATABASE_NAME
              value: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
            - name: REDIS_HOSTNAME
              value: "localhost"
            - name: IMMICH_PORT
              value: "3001"
            - name: IMMICH_MACHINE_LEARNING_URL
              value: "http://localhost:3003"
          ports:
            - containerPort: 3001
              name: http
          livenessProbe:
            httpGet:
              path: /server-info/ping
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 10
            failureThreshold: 5
          readinessProbe:
            httpGet:
              path: /server-info/ping
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 10
            failureThreshold: 5
          volumeMounts:
            - mountPath: "/photos"
              subPath: "Our Pictures"
              name: nasio-nfs-pvc
              readOnly: true
            - mountPath: "/usr/src/app/upload"
              subPath: "Kubernetes/Our Pictures - Immich"
              name: nasio-nfs-pvc
          resources:
            limits:
              cpu: 1500m
              memory: 2500Mi
            requests:
              cpu: 10m
              memory: 100Mi
        - image: ghcr.io/immich-app/immich-machine-learning:release
          imagePullPolicy: Always
          name: immich-machine-learning
          env:
            - name: TRANSFORMERS_CACHE
              value: "/cache"
            - name: IMMICH_PORT
              value: "3003"
          ports:
            - containerPort: 3003
              name: http
          livenessProbe:
            httpGet:
              path: /ping
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 10
            failureThreshold: 5
          readinessProbe:
            httpGet:
              path: /ping
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 10
            failureThreshold: 5
          volumeMounts:
            - mountPath: "/cache"
              subPath: "ml-cache"
              name: immich-pvc
          resources:
            limits:
              cpu: 1500m
              memory: 1500Mi
            requests:
              cpu: 10m
              memory: 1000Mi
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
          - 10.43.0.22
---
apiVersion: v1
kind: Service
metadata:
  name: immich
  annotations:
    metallb.universe.tf/address-pool: default
    metallb.universe.tf/loadBalancerIPs: 192.168.2.210
spec:
  externalTrafficPolicy: Local
  selector:
    app: immich
  ports:
    - name: http-80
      protocol: TCP
      port: 80
      targetPort: 3001
  type: LoadBalancer

Your .env content

See the k8s deployment above.

Reproduction steps

  1. Click Administration
  2. Click Jobs
  3. Click "Refresh" or "All" at the Library section

Relevant log output

[Nest] 7  - 09/09/2024, 9:01:30 AM     LOG [Microservices:LibraryService] Refreshing library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
[Nest] 7  - 09/09/2024, 9:02:21 AM     LOG [Microservices:LibraryService] Refreshing library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
[Nest] 7  - 09/09/2024, 9:03:15 AM     LOG [Microservices:LibraryService] Finished queueing online check of 19620 assets for library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
Error: Missing lock for job 392428. retryJob
    at Scripts.finishedErrors (/usr/src/app/node_modules/bullmq/dist/cjs/classes/scripts.js:266:24)
    at Job.moveToFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/job.js:427:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handleFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:379:21)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
Error: Missing lock for job 392465. retryJob
    at Scripts.finishedErrors (/usr/src/app/node_modules/bullmq/dist/cjs/classes/scripts.js:266:24)
    at Job.moveToFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/job.js:427:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handleFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:379:21)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[Nest] 7  - 09/09/2024, 9:04:53 AM     LOG [Microservices:LibraryService] Finished queueing online check of 19620 assets for library 6aaeca4c-e495-413c-a1ea-8b8e50b750d2
Error: Missing lock for job 392482. retryJob
    at Scripts.finishedErrors (/usr/src/app/node_modules/bullmq/dist/cjs/classes/scripts.js:266:24)
    at Job.moveToFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/job.js:427:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handleFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:379:21)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
Error: Missing lock for job 392428. retryJob
    at Scripts.finishedErrors (/usr/src/app/node_modules/bullmq/dist/cjs/classes/scripts.js:266:24)
    at Job.moveToFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/job.js:427:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handleFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:379:21)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
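
For anyone triaging a longer copy of this log, here is a minimal sketch (not part of Immich or BullMQ) that tallies the "Missing lock" errors per job ID, which makes it easier to see that some jobs, such as 392428 above, fail more than once. The function name `count_missing_locks` and the regular expression are my own assumptions, based only on the line format shown in this excerpt.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt above:
#   "Error: Missing lock for job 392428. retryJob"
MISSING_LOCK = re.compile(r"Missing lock for job (\d+)\.")


def count_missing_locks(log_text: str) -> Counter:
    """Return how many 'Missing lock' errors were logged for each job ID."""
    return Counter(MISSING_LOCK.findall(log_text))


if __name__ == "__main__":
    # The four error lines from the log excerpt, stack traces omitted.
    sample = (
        "Error: Missing lock for job 392428. retryJob\n"
        "Error: Missing lock for job 392465. retryJob\n"
        "Error: Missing lock for job 392482. retryJob\n"
        "Error: Missing lock for job 392428. retryJob\n"
    )
    for job_id, hits in count_missing_locks(sample).most_common():
        print(f"job {job_id}: {hits} missing-lock error(s)")
```

To run it against a real log, read the saved log file into a string and pass it to `count_missing_locks` instead of `sample`.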

Additional information

No response

Luvbe3 commented 5 hours ago

Me too. I'm very afraid that refreshing the library will transcode all the videos and recreate all the thumbnails. It would be better if a refresh only added new files and removed any that are broken or missing, without transcoding videos or recreating thumbnails.