immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

Immich stuck in restart loop after changing CLIP model and upgrading to v1.112 #11815

Closed: Quba1 closed this issue 1 month ago

Quba1 commented 1 month ago

The bug

Possibly related to: #11801

I had v1.111 installed and changed the CLIP model in the administration panel to ViT-L-14-quickgelu__dfn2b. After upgrading to v1.112, the immich_server container is stuck in a restart loop with the error shown below.

The issue is that I cannot even change the model, because I cannot access the admin panel.

The OS that Immich Server is running on

Debian 12 CT on Proxmox 8.2

Version of Immich Server

v1.112

Version of Immich Mobile App

n/a

Platform with the issue

Your docker-compose.yml content

#
# WARNING: Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.
#

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.transcoding.yml
    #   service: cpu # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      # Do not edit the next line. If you want to change the media storage location on your system, edit the value of UPLOAD_LOCATION in the .env file
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - 127.63.63.63:2283:3001
    depends_on:
      - redis
      - database
    restart: always
    healthcheck:
      disable: true

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:e3b17ba9479deec4b7d1eeec1548a253acc5374d68d3b27937fcfe4df8d18c7e
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      # Do not edit the next line. If you want to change the database storage location on your system, edit the value of DB_DATA_LOCATION in the .env file
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]
    restart: always

volumes:
  model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=./library
# The location where your database files are stored
DB_DATA_LOCATION=./postgres

# To set a timezone, uncomment the next line and change Etc/UTC to a TZ identifier from this list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List
# TZ=Etc/UTC

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secret for postgres. You should change it to a random password
DB_PASSWORD=<redacted>

# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

1. On 1.111 change CLIP model to `ViT-L-14-quickgelu__dfn2b` in admin panel
2. Upgrade to 1.112 following the recommended method from documentation
3. The immich_server container gets stuck in a restart loop

Relevant log output

[Nest] 17  - 08/15/2024, 1:44:33 PM     LOG [Api:EventRepository] Initialized websocket server
Detected CPU Cores: 8
Starting api worker
Starting microservices worker
[Nest] 7  - 08/15/2024, 1:44:39 PM     LOG [Microservices:EventRepository] Initialized websocket server
[Nest] 7  - 08/15/2024, 1:44:39 PM     LOG [Microservices:SystemConfigService] LogLevel=log (set via system config)
[Nest] 7  - 08/15/2024, 1:44:39 PM     LOG [Microservices:MapRepository] Initializing metadata repository
[Nest] 7  - 08/15/2024, 1:44:39 PM     LOG [Microservices:MetadataService] Initialized local reverse geocoder
[Nest] 7  - 08/15/2024, 1:44:39 PM     LOG [Microservices:ServerService] Feature Flags: {
  "smartSearch": true,
  "facialRecognition": true,
  "duplicateDetection": true,
  "map": true,
  "reverseGeocoding": true,
  "sidecar": true,
  "search": true,
  "trash": true,
  "oauth": false,
  "oauthAutoLaunch": false,
  "passwordLogin": true,
  "configFile": false,
  "email": true
}
Error: Unknown CLIP model:  ViT-L-14-quickgelu__dfn2b
    at getCLIPModelInfo (/usr/src/app/dist/utils/misc.js:70:15)
    at /usr/src/app/dist/services/smart-info.service.js:69:61
    at /usr/src/app/dist/repositories/database.repository.js:186:29
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
[Nest] 17  - 08/15/2024, 1:44:39 PM     LOG [Api:EventRepository] Initialized websocket server
microservices worker exited with code 1

Additional information

No response

Quba1 commented 1 month ago

> You probably can change the version to v1.111.0 then run the server to change the url, then change back the version to release and start it up
>
> Originally posted by @alextran1502 in https://github.com/immich-app/immich/discussions/11806#discussioncomment-10347047

I was able to follow that advice and change to the default model. So the issue is solved for me, but the bug itself persists. I'm keeping this issue open to let you decide if it should be closed or not.

mertalev commented 1 month ago

The model name you entered probably contained whitespace; the error message shows an extra space before the name. The new config validation now catches this before the config can be saved. It might also be worth using a dropdown instead.
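
A minimal sketch of the failure mode discussed here, assuming the server looks the configured name up by exact string match. This is not Immich's actual code: `KNOWN_CLIP_MODELS`, `lookupModel`, and `normalizeModelName` are hypothetical names used only for illustration. It shows how a leading space would produce the doubled space seen in `Unknown CLIP model:  ViT-L-14-quickgelu__dfn2b`, and how trimming the input before saving avoids it.

```typescript
// Hypothetical set of supported model names; the real list lives in Immich itself.
const KNOWN_CLIP_MODELS = new Set<string>(["ViT-L-14-quickgelu__dfn2b"]);

// Exact-match lookup: any stray whitespace makes the name "unknown".
function lookupModel(name: string): string {
  if (!KNOWN_CLIP_MODELS.has(name)) {
    throw new Error(`Unknown CLIP model: ${name}`);
  }
  return name;
}

// Assumed validation step: strip surrounding whitespace before the config is saved.
function normalizeModelName(input: string): string {
  return input.trim();
}

const pasted = " ViT-L-14-quickgelu__dfn2b"; // note the leading space

try {
  lookupModel(pasted);
} catch (err) {
  // The message contains two spaces after the colon, as in the log above.
  console.log((err as Error).message);
}

// After normalization the same input resolves correctly.
console.log(lookupModel(normalizeModelName(pasted)));
```

A dropdown would sidestep the problem entirely, since the value could only ever be one of the known names.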

mmomjian commented 1 month ago

Sounds like this is fixed by the new config validation.