immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

Unexpected app crash (502 response code) #12809

Closed: illmouse closed this issue 1 month ago

illmouse commented 1 month ago

The bug

Apparently the app became unavailable during some scheduled maintenance (it happened at 00:00). According to the logs, the other containers are still up and running.

The failure did not terminate the container's main process, so the container was not restarted automatically.
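For context, Docker's `restart: always` policy acts only when a container's main process exits. A server process that keeps running while requests get a 502 is therefore never restarted. The following is a minimal external probe sketch that detects this state. It assumes Immich is reachable over HTTP, for example through the reverse proxy on the `immich-frontend` network. The function name `is_responsive` is hypothetical, and the probe only reports the failure; it does not restart anything.

```python
import http.client
import urllib.error
import urllib.request


def is_responsive(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx status within `timeout` seconds.

    Hypothetical helper: it only reports the state, it does not restart anything.
    """
    try:
        # urlopen follows redirects; the final response must be 2xx.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        # Non-2xx answers, such as a 502 from a reverse proxy, land here.
        return False
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and malformed responses.
        return False
```

A possible call is `is_responsive("https://photos.example.com/api/server/ping")`. The hostname is a placeholder, and the `/api/server/ping` path is an assumption about the Immich API in this version, so check it against your server before relying on it.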

The OS that Immich Server is running on

ghcr.io/immich-app/immich-server:release

Version of Immich Server

1.115.0

Version of Immich Mobile App

1.115.0

Platform with the issue

Your docker-compose.yml content

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.transcoding.yml
    #   service: cpu # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    networks:
      - immich-frontend
      - immich-backend
    env_file:
      - .env
    #ports:
    #  - 2283:3001
    #expose:
    #  - "3001"
    depends_on:
      - redis
      - database
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    deploy:
      resources:
        limits:
          cpus: '0.5'
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    networks:
      - immich-backend
    restart: always

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:328fe6a5822256d065debb36617a8169dbfbd77b797c525288e465f56c1d392b
    healthcheck:
      test: redis-cli ping || exit 1
    volumes:
      - redis-data:/data
    networks:
      - immich-backend
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    networks:
      - immich-backend
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_>
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "ma>
    restart: always

volumes:
  model-cache:
  redis-data:

networks:
  immich-frontend:
    name: immich-frontend
    external: true
  immich-backend:
    name: immich-backend
    external: true

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=/mnt/immich-data/
# The location where your database files are stored
DB_DATA_LOCATION=/mnt/immich-db/

# To set a timezone, uncomment the next line and change Etc/UTC to a TZ identifier from this list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zon>
# TZ=Etc/UTC
TZ=Europe/Moscow

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secret for postgres. You should change it to a random password
DB_PASSWORD=xxx

# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=xxx
DB_DATABASE_NAME=immich

IMMICH_ENV=production
CPU_CORES=4

Reproduction steps

Not sure; the app became unavailable at 00:00, perhaps during scheduled maintenance.

Relevant log output

[Nest] 6  - 09/20/2024, 12:00:00 AM   ERROR [Microservices:JobService] Unable to run job handler (thumbnailGeneration/generate-preview): Error: VipsJpeg: Corrupt JPEG data: bad Huffman code
VipsJpeg: Corrupt JPEG data: bad Huffman code
VipsJpeg: Premature end of input file
[Nest] 6  - 09/20/2024, 12:00:00 AM   ERROR [Microservices:JobService] Error: VipsJpeg: Corrupt JPEG data: bad Huffman code
VipsJpeg: Corrupt JPEG data: bad Huffman code
VipsJpeg: Premature end of input file
    at Sharp.toFile (/usr/src/app/node_modules/sharp/lib/output.js:90:19)
    at MediaRepository.generateThumbnail (/usr/src/app/dist/repositories/media.repository.js:69:14)
    at MediaService.generateThumbnail (/usr/src/app/dist/services/media.service.js:176:48)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async MediaService.handleGeneratePreview (/usr/src/app/dist/services/media.service.js:141:29)
    at async /usr/src/app/dist/services/job.service.js:148:36
    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[Nest] 6  - 09/20/2024, 12:00:00 AM   ERROR [Microservices:JobService] Object:
{
  "id": "dc4f79d6-e58f-4444-a080-fee8cb5e5f3a"
}

Detected CPU Cores: 4
Starting api worker
Starting microservices worker
[Nest] 7  - 09/20/2024, 12:00:49 AM     LOG [Microservices:EventRepository] Initialized websocket server
[Nest] 15  - 09/20/2024, 12:00:49 AM     LOG [Api:EventRepository] Initialized websocket server

Additional information

No response

illmouse commented 1 month ago

Sorry, while trying to bring the app back up I found a problem with the Postgres container. I will investigate the cause further.

illmouse commented 1 month ago

It looks like the problem was that the mounted LUN storing the database became unresponsive, so there is no real problem with Immich.
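Because the root cause turned out to be unresponsive storage under `DB_DATA_LOCATION`, a filesystem probe with a timeout can help tell "storage hung" apart from "Immich crashed". The following is a minimal sketch under these assumptions: it runs with Python on the Docker host, and `path_responds` is a hypothetical helper. A directory read may be served from the page cache, so this is a heuristic rather than a definitive device check. A probe stuck on a hung mount can stay blocked in the kernel until the device recovers, which is why it runs in a daemon thread.

```python
import os
import threading


def path_responds(path: str, timeout: float = 5.0) -> bool:
    """Return True if os.listdir(path) succeeds within `timeout` seconds.

    Hypothetical helper; a cached directory listing can hide a dead device.
    """
    done = threading.Event()

    def probe() -> None:
        try:
            os.listdir(path)
            done.set()
        except OSError:
            # Errors such as EIO, ENOENT or EACCES leave `done` unset,
            # so the caller sees the path as unhealthy.
            pass

    # daemon=True: a probe stuck on a hung mount cannot keep the
    # interpreter alive after the caller gives up waiting.
    threading.Thread(target=probe, daemon=True).start()
    return done.wait(timeout)
```

A possible call is `path_responds("/mnt/immich-db/")`, using the path from the `.env` above. Run it as a user that can read that directory, since the Postgres data directory is usually not readable by other users and a permission error also returns False.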