immich-app / immich

High-performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

Failed to Smart Search #13198

Closed: jdicioccio closed this issue 2 weeks ago

jdicioccio commented 2 weeks ago

The bug

When doing a text search with Immich v1.117.0, I'm getting errors loading the model. I tried removing the model cache volume, but it just re-downloads the same non-functioning model.
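(For reference, clearing the model cache volume with this compose file looks roughly like the following; the volume name assumes the compose project name "immich" set at the top of the file, so the named volume is "immich_model-cache".)

docker compose rm -sf immich-machine-learning   # stop and remove the ML container so the volume is unreferenced
docker volume rm immich_model-cache             # delete the cached models
docker compose up -d immich-machine-learning    # recreate the service; models re-download on next use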

The OS that Immich Server is running on

Debian bookworm

Version of Immich Server

v1.117.0

Version of Immich Mobile App

v1.117.0

Platform with the issue

Your docker-compose.yml content

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/hardware-transcoding
      file: hwaccel.transcoding.yml
      service: rkmpp # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /data/photoprism/photos:/photoprism/photos:ro
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - 2283:3001
    depends_on:
      - redis
      - database
    restart: always
    healthcheck:
      disable: false

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-armnn
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
      file: hwaccel.ml.yml
      service: armnn # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:e3b17ba9479deec4b7d1eeec1548a253acc5374d68d3b27937fcfe4df8d18c7e
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: registry.hub.docker.com/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    restart: always
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]

volumes:
  model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=/data/immich/library
# The location where your database files are stored
DB_DATA_LOCATION=/data/immich/db

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secret for postgres. You should change it to a random password
DB_PASSWORD=...

# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

  1. Open the mobile app or web app
  2. Perform a text search

Relevant log output

[10/05/24 01:06:28] INFO     Downloading textual model
                             'ViT-B-16-SigLIP-384__webli'. This may take a
                             while.
Fetching 11 files: 100%|██████████| 11/11 [00:20<00:00,  1.83s/it]
[10/05/24 01:06:49] INFO     Loading textual model 'ViT-B-16-SigLIP-384__webli'
                             to memory
arm_release_ver: g13p0-01eac0, rk_so_ver: 10
[10/05/24 01:06:49] INFO     Loading ANN model
                             /cache/clip/ViT-B-16-SigLIP-384__webli/textual/mode
                             l.armnn ...
Warning: WARNING: Layer of type Cast is not supported on requested backend GpuAcc for input data type Signed32 and output data type Signed64 (reason: in validate_arguments src/gpu/cl/kernels/ClCastKernel.cpp:59: ITensor data type S64 not supported by this kernel), falling back to the next backend.
Warning: ERROR: Layer of type Cast is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Gather is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/core/CL/kernels/CLGatherKernel.cpp:58: ITensor data type S64 not supported by this kernel), falling back to the next backend.
Warning: ERROR: Layer of type Gather is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
[the Transpose warning/error pair above is repeated verbatim 11 more times]
[10/05/24 01:06:50] ERROR    Exception in ASGI application

                             ╭─────── Traceback (most recent call last) ───────╮
                             │ /usr/src/app/main.py:152 in predict             │
                             │                                                 │
                             │   149 │   │   inputs = text                     │
                             │   150 │   else:                                 │
                             │   151 │   │   raise HTTPException(400, "Either  │
                             │ ❱ 152 │   response = await run_inference(inputs │
                             │   153 │   return ORJSONResponse(response)       │
                             │   154                                           │
                             │   155                                           │
                             │                                                 │
                             │ /usr/src/app/main.py:175 in run_inference       │
                             │                                                 │
                             │   172 │   │   response[entry["task"]] = output  │
                             │   173 │                                         │
                             │   174 │   without_deps, with_deps = entries     │
                             │ ❱ 175 │   await asyncio.gather(*[_run_inference │
                             │   176 │   if with_deps:                         │
                             │   177 │   │   await asyncio.gather(*[_run_infer │
                             │   178 │   if isinstance(payload, Image):        │
                             │                                                 │
                             │ /usr/src/app/main.py:169 in _run_inference      │
                             │                                                 │
                             │   166 │   │   │   except KeyError:              │
                             │   167 │   │   │   │   message = f"Task {entry[' │
                             │       output of {dep}"                          │
                             │   168 │   │   │   │   raise HTTPException(400,  │
                             │ ❱ 169 │   │   model = await load(model)         │
                             │   170 │   │   output = await run(model.predict, │
                             │   171 │   │   outputs[model.identity] = output  │
                             │   172 │   │   response[entry["task"]] = output  │
                             │                                                 │
                             │ /usr/src/app/main.py:213 in load                │
                             │                                                 │
                             │   210 │   │   return model                      │
                             │   211 │                                         │
                             │   212 │   try:                                  │
                             │ ❱ 213 │   │   return await run(_load, model)    │
                             │   214 │   except (OSError, InvalidProtobuf, Bad │
                             │   215 │   │   log.warning(f"Failed to load {mod │
                             │       '{model.model_name}'. Clearing cache.")   │
                             │   216 │   │   model.clear_cache()               │
                             │                                                 │
                             │ /usr/src/app/main.py:188 in run                 │
                             │                                                 │
                             │   185 │   if thread_pool is None:               │
                             │   186 │   │   return func(*args, **kwargs)      │
                             │   187 │   partial_func = partial(func, *args, * │
                             │ ❱ 188 │   return await asyncio.get_running_loop │
                             │   189                                           │
                             │   190                                           │
                             │   191 async def load(model: InferenceModel) ->  │
                             │                                                 │
                             │ /usr/local/lib/python3.11/concurrent/futures/th │
                             │ read.py:58 in run                               │
                             │                                                 │
                             │ /usr/src/app/main.py:200 in _load               │
                             │                                                 │
                             │   197 │   │   │   raise HTTPException(500, f"Fa │
                             │   198 │   │   with lock:                        │
                             │   199 │   │   │   try:                          │
                             │ ❱ 200 │   │   │   │   model.load()              │
                             │   201 │   │   │   except FileNotFoundError as e │
                             │   202 │   │   │   │   if model.model_format ==  │
                             │   203 │   │   │   │   │   raise e               │
                             │                                                 │
                             │ /usr/src/app/models/base.py:53 in load          │
                             │                                                 │
                             │    50 │   │   self.download()                   │
                             │    51 │   │   attempt = f"Attempt #{self.load_a │
                             │       else "Loading"                            │
                             │    52 │   │   log.info(f"{attempt} {self.model_ │
                             │       '{self.model_name}' to memory")           │
                             │ ❱  53 │   │   self.session = self._load()       │
                             │    54 │   │   self.loaded = True                │
                             │    55 │                                         │
                             │    56 │   def predict(self, *inputs: Any, **mod │
                             │                                                 │
                             │ /usr/src/app/models/clip/textual.py:26 in _load │
                             │                                                 │
                             │    23 │   │   return res                        │
                             │    24 │                                         │
                             │    25 │   def _load(self) -> ModelSession:      │
                             │ ❱  26 │   │   session = super()._load()         │
                             │    27 │   │   log.debug(f"Loading tokenizer for │
                             │    28 │   │   self.tokenizer = self._load_token │
                             │    29 │   │   tokenizer_kwargs: dict[str, Any]  │
                             │                                                 │
                             │ /usr/src/app/models/base.py:78 in _load         │
                             │                                                 │
                             │    75 │   │   )                                 │
                             │    76 │                                         │
                             │    77 │   def _load(self) -> ModelSession:      │
                             │ ❱  78 │   │   return self._make_session(self.mo │
                             │    79 │                                         │
                             │    80 │   def clear_cache(self) -> None:        │
                             │    81 │   │   if not self.cache_dir.exists():   │
                             │                                                 │
                             │ /usr/src/app/models/base.py:108 in              │
                             │ _make_session                                   │
                             │                                                 │
                             │   105 │   │                                     │
                             │   106 │   │   match model_path.suffix:          │
                             │   107 │   │   │   case ".armnn":                │
                             │ ❱ 108 │   │   │   │   session: ModelSession = A │
                             │   109 │   │   │   case ".onnx":                 │
                             │   110 │   │   │   │   session = OrtSession(mode │
                             │   111 │   │   │   case _:                       │
                             │                                                 │
                             │ /usr/src/app/sessions/ann.py:26 in __init__     │
                             │                                                 │
                             │   23 │   │   self.ann = Ann(tuning_level=settin │
                             │      "gpu-tuning.ann").as_posix())              │
                             │   24 │   │                                      │
                             │   25 │   │   log.info("Loading ANN model %s ... │
                             │ ❱ 26 │   │   self.model = self.ann.load(        │
                             │   27 │   │   │   model_path.as_posix(),         │
                             │   28 │   │   │   cached_network_path=model_path │
                             │   29 │   │   │   fp16=settings.ann_fp16_turbo,  │
                             │                                                 │
                             │ /usr/src/ann/ann.py:124 in load                 │
                             │                                                 │
                             │   121 │   │   │   cached_network_path.encode()  │
                             │   122 │   │   )                                 │
                             │   123 │   │   if net_id < 0:                    │
                             │ ❱ 124 │   │   │   raise ValueError("Cannot load │
                             │   125 │   │                                     │
                             │   126 │   │   self.input_shapes[net_id] = tuple │
                             │   127 │   │   │   self.shape(net_id, input=True │
                             │       input=True))                              │
                             ╰─────────────────────────────────────────────────╯
                             ValueError: Cannot load model!

Additional information

RK3588 CPU

bo0tzz commented 2 weeks ago

@mertalev I was under the impression that you were still working on RK3588 support?

mertalev commented 2 weeks ago

RK3588 is already supported for many models, but the SigLIP models are still a work in progress and apparently don't work.

jdicioccio commented 2 weeks ago

This used to work... maybe it was falling back to running on the CPU before?

mertalev commented 2 weeks ago

The ARM NN models just didn't exist before, so it fell back to the CPU; now they do exist, but they're broken. For now, you can use the CPU image for machine learning to get the same CPU behavior as before.
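For anyone hitting this, a minimal sketch of that workaround against the compose file above: drop the -armnn suffix from the image tag and remove the extends: block, so the service runs the default CPU image (per the file's own comment, the -armnn suffix is what opts into hardware-accelerated inference).

  immich-machine-learning:
    container_name: immich_machine_learning
    # default (CPU) image: no -armnn tag suffix, no hwaccel.ml.yml extends
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false

Then recreate the service (docker compose up -d immich-machine-learning) and smart search should run on the CPU as it did before the ARM NN models were published.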