benwoo1110 commented 1 month ago

The bug

I switched from the default ViT-B-32__openai model to the new ViT-B-16-SigLIP__webli and re-ran the smart search job on all assets. I noticed that when the search string contains uppercase letters, the search returns completely wrong results, as can be seen in the screenshots below:

6197393944636145716_121

6197393944636145715_121

The OS that Immich Server is running on

Raspberry Pi OS Lite (Bookworm)

Version of Immich Server

v1.112.1

Version of Immich Mobile App

v1.112.1

Platform with the issue

[ ] Server
[ ] Web
[ ] Mobile

Your docker-compose.yml content

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:release
    environment:
      DB_HOSTNAME: ${DB_HOSTNAME}
      DB_PASSWORD: ${DB_PASSWORD}
      DB_USERNAME: ${DB_USERNAME}
      DB_DATABASE_NAME: ${DB_DATABASE_NAME}
      REDIS_HOSTNAME: ${REDIS_HOSTNAME}
    volumes:
      - /srv/dev-disk-by-uuid-460edec2-7136-4c17-b8c5-811412cb11ae/immich:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
      - /srv/dev-disk-by-uuid-460edec2-7136-4c17-b8c5-811412cb11ae:/usr/src/app/external
    ports:
      - 2283:3001
    dns:
      - 192.168.1.23
    depends_on:
      - redis
      - database
    restart: always
    healthcheck:
      disable: false
    deploy:
      resources:
        limits:
          cpus: '3.00'

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:release
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    environment:
      DB_HOSTNAME: ${DB_HOSTNAME}
      DB_PASSWORD: ${DB_PASSWORD}
      DB_USERNAME: ${DB_USERNAME}
      DB_DATABASE_NAME: ${DB_DATABASE_NAME}
      REDIS_HOSTNAME: ${REDIS_HOSTNAME}
    volumes:
      - /opt/immich/model-cache:/cache
    dns:
      - 192.168.1.23
    restart: always
    healthcheck:
      disable: false
    deploy:
      resources:
        limits:
          cpus: '1.50'

  redis:
    container_name: immich_redis
    image: registry.hub.docker.com/library/redis:6.2-alpine@sha256:51d6c56749a4243096327e3fb964a48ed92254357108449cb6e23999c37773c5
    dns:
      - 192.168.1.23
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: registry.hub.docker.com/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
    volumes:
      - /opt/immich/pgdata:/var/lib/postgresql/data
    dns:
      - 192.168.1.23
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 10m
      start_interval: 1m
      start_period: 10m
    command: ["postgres", "-c", "shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=1GB", "-c", "shared_buffers=256MB", "-c", "wal_compression=off"]
    restart: always

Your .env content

DB_PASSWORD=<REDACTED>
DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_DATABASE_NAME=immich
REDIS_HOSTNAME=immich_redis

Reproduction steps

1. Change the smart search model from `ViT-B-32__openai` to `ViT-B-16-SigLIP__webli` in the web UI settings page.
2. Re-run the smart search job on all assets.
3. Search with various search strings, with and without capitalisation, and compare the results.
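
To make step 3 repeatable, a small helper can generate the capitalisation variants of a query to try one by one in the search box. This is only a sketch; `case_variants` is a hypothetical name and not part of Immich.

```python
def case_variants(query: str) -> list[str]:
    """Return the distinct capitalisation variants of a search string."""
    variants = [
        query.lower(),       # all lowercase
        query.upper(),       # all uppercase
        query.capitalize(),  # first letter of the string uppercase
        query.title(),       # first letter of every word uppercase
    ]
    # dict.fromkeys removes duplicates while keeping the original order
    return list(dict.fromkeys(variants))


if __name__ == "__main__":
    for variant in case_variants("red car"):
        print(variant)
```

If the model handles case correctly, every variant should return roughly the same set of photos.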

Relevant log output

No response

Additional information

No response

JordyEGNL commented 1 month ago

Can also confirm that this is an issue on the web version.

image

alextran1502 commented 1 month ago

Interesting! @mertalev I assume the fix for this is to convert all search queries to lowercase?
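
A minimal sketch of what that blanket approach would look like, assuming the query is normalised before it is sent for embedding. `normalize_query` is a hypothetical helper, not an existing Immich function.

```python
def normalize_query(query: str) -> str:
    """Lowercase a search query regardless of which model is configured."""
    # Hypothetical: applied to every query before it is embedded.
    return query.lower()


if __name__ == "__main__":
    print(normalize_query("Red Car"))
```

The trade-off is that this changes the input for every model, including any that may treat case as meaningful.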

mertalev commented 1 month ago

No, this should be handled in the tokenizer used in machine learning. Different models want different inputs.
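
As a rough illustration of per-model handling, the sketch below picks a text-cleaning step based on the model name. The mapping and function names are hypothetical, and the assumption that the SigLIP model expects lowercase text while the OpenAI model can receive the raw string comes only from the behaviour described in this report. It is not the actual machine-learning code.

```python
from typing import Callable, Dict


def identity(text: str) -> str:
    """Leave the text untouched."""
    return text


def lowercase(text: str) -> str:
    """Lowercase the text before tokenization."""
    return text.lower()


# Hypothetical mapping from model name to the cleaning step it needs.
# Assumption: the SigLIP model wants lowercase input, the OpenAI one does not need it.
TEXT_CLEANERS: Dict[str, Callable[[str], str]] = {
    "ViT-B-32__openai": identity,
    "ViT-B-16-SigLIP__webli": lowercase,
}


def prepare_query(model_name: str, query: str) -> str:
    """Apply the model-specific cleaning step; unknown models get the raw text."""
    cleaner = TEXT_CLEANERS.get(model_name, identity)
    return cleaner(query)


if __name__ == "__main__":
    print(prepare_query("ViT-B-16-SigLIP__webli", "Red Car"))
    print(prepare_query("ViT-B-32__openai", "Red Car"))
```

Keeping this step next to the tokenizer means the search layer can pass the query through unchanged, and each model gets the input format it expects.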