immich-app / immich

High-performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

Failed to Smart Search #13198

Closed: jdicioccio closed this issue 2 weeks ago

jdicioccio commented 2 weeks ago

The bug

When doing a text search with Immich v1.117.0, I'm getting errors loading the model. I tried removing the model cache volume, but it just re-downloads the same non-functioning model.
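(For reference, clearing the model cache volume with this compose file looks roughly like the following; the volume name assumes the compose project name "immich" set at the top of the file, so the named volume is "immich_model-cache".)

docker compose rm -sf immich-machine-learning   # stop and remove the ML container so the volume is unreferenced
docker volume rm immich_model-cache             # delete the cached models
docker compose up -d immich-machine-learning    # recreate the service; models re-download on next use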

The OS that Immich Server is running on

Debian bookworm

Version of Immich Server

v1.117.0

Version of Immich Mobile App

v1.117.0

Platform with the issue

Your docker-compose.yml content

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/hardware-transcoding
      file: hwaccel.transcoding.yml
      service: rkmpp # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /data/photoprism/photos:/photoprism/photos:ro
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - 2283:3001
    depends_on:
      - redis
      - database
    restart: always
    healthcheck:
      disable: false

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-armnn
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
      file: hwaccel.ml.yml
      service: armnn # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:e3b17ba9479deec4b7d1eeec1548a253acc5374d68d3b27937fcfe4df8d18c7e
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: registry.hub.docker.com/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    restart: always
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]

volumes:
  model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=/data/immich/library
# The location where your database files are stored
DB_DATA_LOCATION=/data/immich/db

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secret for postgres. You should change it to a random password
DB_PASSWORD=...

# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

  1. Open the mobile app or web app
  2. Perform a text search

Relevant log output

[10/05/24 01:06:28] INFO     Downloading textual model
                             'ViT-B-16-SigLIP-384__webli'. This may take a
                             while.
Fetching 11 files: 100%|██████████| 11/11 [00:20<00:00,  1.83s/it]
[10/05/24 01:06:49] INFO     Loading textual model 'ViT-B-16-SigLIP-384__webli'
                             to memory
arm_release_ver: g13p0-01eac0, rk_so_ver: 10
[10/05/24 01:06:49] INFO     Loading ANN model
                             /cache/clip/ViT-B-16-SigLIP-384__webli/textual/mode
                             l.armnn ...
Warning: WARNING: Layer of type Cast is not supported on requested backend GpuAcc for input data type Signed32 and output data type Signed64 (reason: in validate_arguments src/gpu/cl/kernels/ClCastKernel.cpp:59: ITensor data type S64 not supported by this kernel), falling back to the next backend.
Warning: ERROR: Layer of type Cast is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Gather is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/core/CL/kernels/CLGatherKernel.cpp:58: ITensor data type S64 not supported by this kernel), falling back to the next backend.
Warning: ERROR: Layer of type Gather is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
[the Transpose warning/error pair above is repeated verbatim 11 more times]
[10/05/24 01:06:50] ERROR    Exception in ASGI application

                             ╭─────── Traceback (most recent call last) ───────╮
                             │ /usr/src/app/main.py:152 in predict             │
                             │                                                 │
                             │   149 │   │   inputs = text                     │
                             │   150 │   else:                                 │
                             │   151 │   │   raise HTTPException(400, "Either  │
                             │ ❱ 152 │   response = await run_inference(inputs │
                             │   153 │   return ORJSONResponse(response)       │
                             │   154                                           │
                             │   155                                           │
                             │                                                 │
                             │ /usr/src/app/main.py:175 in run_inference       │
                             │                                                 │
                             │   172 │   │   response[entry["task"]] = output  │
                             │   173 │                                         │
                             │   174 │   without_deps, with_deps = entries     │
                             │ ❱ 175 │   await asyncio.gather(*[_run_inference │
                             │   176 │   if with_deps:                         │
                             │   177 │   │   await asyncio.gather(*[_run_infer │
                             │   178 │   if isinstance(payload, Image):        │
                             │                                                 │
                             │ /usr/src/app/main.py:169 in _run_inference      │
                             │                                                 │
                             │   166 │   │   │   except KeyError:              │
                             │   167 │   │   │   │   message = f"Task {entry[' │
                             │       output of {dep}"                          │
                             │   168 │   │   │   │   raise HTTPException(400,  │
                             │ ❱ 169 │   │   model = await load(model)         │
                             │   170 │   │   output = await run(model.predict, │
                             │   171 │   │   outputs[model.identity] = output  │
                             │   172 │   │   response[entry["task"]] = output  │
                             │                                                 │
                             │ /usr/src/app/main.py:213 in load                │
                             │                                                 │
                             │   210 │   │   return model                      │
                             │   211 │                                         │
                             │   212 │   try:                                  │
                             │ ❱ 213 │   │   return await run(_load, model)    │
                             │   214 │   except (OSError, InvalidProtobuf, Bad │
                             │   215 │   │   log.warning(f"Failed to load {mod │
                             │       '{model.model_name}'. Clearing cache.")   │
                             │   216 │   │   model.clear_cache()               │
                             │                                                 │
                             │ /usr/src/app/main.py:188 in run                 │
                             │                                                 │
                             │   185 │   if thread_pool is None:               │
                             │   186 │   │   return func(*args, **kwargs)      │
                             │   187 │   partial_func = partial(func, *args, * │
                             │ ❱ 188 │   return await asyncio.get_running_loop │
                             │   189                                           │
                             │   190                                           │
                             │   191 async def load(model: InferenceModel) ->  │
                             │                                                 │
                             │ /usr/local/lib/python3.11/concurrent/futures/th │
                             │ read.py:58 in run                               │
                             │                                                 │
                             │ /usr/src/app/main.py:200 in _load               │
                             │                                                 │
                             │   197 │   │   │   raise HTTPException(500, f"Fa │
                             │   198 │   │   with lock:                        │
                             │   199 │   │   │   try:                          │
                             │ ❱ 200 │   │   │   │   model.load()              │
                             │   201 │   │   │   except FileNotFoundError as e │
                             │   202 │   │   │   │   if model.model_format ==  │
                             │   203 │   │   │   │   │   raise e               │
                             │                                                 │
                             │ /usr/src/app/models/base.py:53 in load          │
                             │                                                 │
                             │    50 │   │   self.download()                   │
                             │    51 │   │   attempt = f"Attempt #{self.load_a │
                             │       else "Loading"                            │
                             │    52 │   │   log.info(f"{attempt} {self.model_ │
                             │       '{self.model_name}' to memory")           │
                             │ ❱  53 │   │   self.session = self._load()       │
                             │    54 │   │   self.loaded = True                │
                             │    55 │                                         │
                             │    56 │   def predict(self, *inputs: Any, **mod │
                             │                                                 │
                             │ /usr/src/app/models/clip/textual.py:26 in _load │
                             │                                                 │
                             │    23 │   │   return res                        │
                             │    24 │                                         │
                             │    25 │   def _load(self) -> ModelSession:      │
                             │ ❱  26 │   │   session = super()._load()         │
                             │    27 │   │   log.debug(f"Loading tokenizer for │
                             │    28 │   │   self.tokenizer = self._load_token │
                             │    29 │   │   tokenizer_kwargs: dict[str, Any]  │
                             │                                                 │
                             │ /usr/src/app/models/base.py:78 in _load         │
                             │                                                 │
                             │    75 │   │   )                                 │
                             │    76 │                                         │
                             │    77 │   def _load(self) -> ModelSession:      │
                             │ ❱  78 │   │   return self._make_session(self.mo │
                             │    79 │                                         │
                             │    80 │   def clear_cache(self) -> None:        │
                             │    81 │   │   if not self.cache_dir.exists():   │
                             │                                                 │
                             │ /usr/src/app/models/base.py:108 in              │
                             │ _make_session                                   │
                             │                                                 │
                             │   105 │   │                                     │
                             │   106 │   │   match model_path.suffix:          │
                             │   107 │   │   │   case ".armnn":                │
                             │ ❱ 108 │   │   │   │   session: ModelSession = A │
                             │   109 │   │   │   case ".onnx":                 │
                             │   110 │   │   │   │   session = OrtSession(mode │
                             │   111 │   │   │   case _:                       │
                             │                                                 │
                             │ /usr/src/app/sessions/ann.py:26 in __init__     │
                             │                                                 │
                             │   23 │   │   self.ann = Ann(tuning_level=settin │
                             │      "gpu-tuning.ann").as_posix())              │
                             │   24 │   │                                      │
                             │   25 │   │   log.info("Loading ANN model %s ... │
                             │ ❱ 26 │   │   self.model = self.ann.load(        │
                             │   27 │   │   │   model_path.as_posix(),         │
                             │   28 │   │   │   cached_network_path=model_path │
                             │   29 │   │   │   fp16=settings.ann_fp16_turbo,  │
                             │                                                 │
                             │ /usr/src/ann/ann.py:124 in load                 │
                             │                                                 │
                             │   121 │   │   │   cached_network_path.encode()  │
                             │   122 │   │   )                                 │
                             │   123 │   │   if net_id < 0:                    │
                             │ ❱ 124 │   │   │   raise ValueError("Cannot load │
                             │   125 │   │                                     │
                             │   126 │   │   self.input_shapes[net_id] = tuple │
                             │   127 │   │   │   self.shape(net_id, input=True │
                             │       input=True))                              │
                             ╰─────────────────────────────────────────────────╯
                             ValueError: Cannot load model!

Additional information

RK3588 CPU

bo0tzz commented 2 weeks ago

@mertalev I was under the impression that you were still working on RK3588 support?

mertalev commented 2 weeks ago

RK3588 is already supported for many models, but the SigLIP models are still a work in progress and apparently don't work.

jdicioccio commented 2 weeks ago

This used to work... maybe it was falling back to running on the CPU before?

mertalev commented 2 weeks ago

The ARM NN models just didn't exist before, so it fell back to the CPU; now they do exist, but they're broken. For now, you can use the CPU image for machine learning to get the same CPU behavior as before.
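For anyone hitting this, a minimal sketch of that workaround against the compose file above: drop the -armnn suffix from the image tag and remove the extends: block, so the service runs the default CPU image (per the file's own comment, the -armnn suffix is what opts into hardware-accelerated inference).

  immich-machine-learning:
    container_name: immich_machine_learning
    # default (CPU) image: no -armnn tag suffix, no hwaccel.ml.yml extends
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false

Then recreate the service (docker compose up -d immich-machine-learning) and smart search should run on the CPU as it did before the ARM NN models were published.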