immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

Facial recognition fails when Minimum detection score is zero #11596

Open CubeOvO opened 1 month ago

CubeOvO commented 1 month ago

The bug

As the title says. It took a while to track down the cause, but when "Minimum detection score" in the facial recognition settings is set to 0, facial recognition fails for all models, with or without CUDA acceleration. The log output is at the end. Different configurations produce different errors; the log below was produced with antelopev2.

The OS that Immich Server is running on

Windows

Version of Immich Server

v1.111.0

Version of Immich Mobile App

v1.0.0

Platform with the issue

Your docker-compose.yml content

#
# WARNING: Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.
#

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.transcoding.yml
    #   service: cpu # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      # Do not edit the next line. If you want to change the media storage location on your system, edit the value of UPLOAD_LOCATION in the .env file
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
      - [redacted some external mounts]

    env_file:
      - .env
    ports:
      - 2283:3001
    depends_on:
      - redis
      - database
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cuda # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:e3b17ba9479deec4b7d1eeec1548a253acc5374d68d3b27937fcfe4df8d18c7e
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      # Do not edit the next line. If you want to change the database storage location on your system, edit the value of DB_DATA_LOCATION in the .env file
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]
    restart: always

volumes:
  model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=./library
# The location where your database files are stored
DB_DATA_LOCATION=./postgres

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secret for postgres. You should change it to a random password
DB_PASSWORD= [redacted]

MACHINE_LEARNING_WORKER_TIMEOUT = 300
# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

1. Launch the Immich server
2. Go to admin - system settings - facial recognition and change the minimum detection score to 0
3. Queue some facial recognition jobs
...

Relevant log output

2024-08-05 13:02:41 [08/05/24 17:02:41] INFO     Starting gunicorn 22.0.0                           
2024-08-05 13:02:41 [08/05/24 17:02:41] INFO     Listening at: http://[::]:3003 (9)                 
2024-08-05 13:02:41 [08/05/24 17:02:41] INFO     Using worker: app.config.CustomUvicornWorker       
2024-08-05 13:02:41 [08/05/24 17:02:41] INFO     Booting worker with pid: 10                        
2024-08-05 13:02:46 [08/05/24 17:02:46] INFO     Started server process [10]                        
2024-08-05 13:02:46 [08/05/24 17:02:46] INFO     Waiting for application startup.                   
2024-08-05 13:02:46 [08/05/24 17:02:46] INFO     Created in-memory cache with unloading after 300s  
2024-08-05 13:02:46                              of inactivity.                                     
2024-08-05 13:02:46 [08/05/24 17:02:46] INFO     Initialized request thread pool with 12 threads.   
2024-08-05 13:02:46 [08/05/24 17:02:46] INFO     Application startup complete.                      
2024-08-05 13:02:50 [08/05/24 17:02:50] INFO     Loading detection model 'antelopev2' to memory     
2024-08-05 13:02:50 [08/05/24 17:02:50] INFO     Setting execution providers to                     
2024-08-05 13:02:50                              ['CPUExecutionProvider'], in descending order of   
2024-08-05 13:02:50                              preference                                         
2024-08-05 13:02:51 [08/05/24 17:02:51] INFO     Loading recognition model 'antelopev2' to memory   
2024-08-05 13:02:51 [08/05/24 17:02:51] INFO     Setting execution providers to                     
2024-08-05 13:02:51                              ['CPUExecutionProvider'], in descending order of   
2024-08-05 13:02:51                              preference      
2024-08-05 13:03:07 [Nest] 7  - 08/05/2024, 5:03:07 PM   ERROR [Microservices:JobService] Unable to run job handler (faceDetection/face-detection): Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
2024-08-05 13:03:07 [Nest] 7  - 08/05/2024, 5:03:07 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learning:3003" failed with SocketError: other side closed
2024-08-05 13:03:07     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
2024-08-05 13:03:07     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
2024-08-05 13:03:07     at async MachineLearningRepository.detectFaces (/usr/src/app/dist/repositories/machine-learning.repository.js:33:26)
2024-08-05 13:03:07     at async PersonService.handleDetectFaces (/usr/src/app/dist/services/person.service.js:284:52)
2024-08-05 13:03:07     at async /usr/src/app/dist/services/job.service.js:148:36
2024-08-05 13:03:07     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
2024-08-05 13:03:07     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
2024-08-05 13:03:07 [Nest] 7  - 08/05/2024, 5:03:07 PM   ERROR [Microservices:JobService] Object:
2024-08-05 13:03:07 {
2024-08-05 13:03:07   "id": "80c1b1fd-b80b-48fc-8713-c398a35643d6"
2024-08-05 13:03:07 }
2024-08-05 13:03:07                                   
2024-08-05 13:03:07 [08/05/24 17:03:07] ERROR    Worker (pid:10) was sent SIGKILL! Perhaps out of   
2024-08-05 13:03:07                              memory?                                            
2024-08-05 13:03:07 [08/05/24 17:03:07] INFO     Booting worker with pid: 52                        
2024-08-05 13:03:14 [08/05/24 17:03:14] INFO     Started server process [52]                        
2024-08-05 13:03:14 [08/05/24 17:03:14] INFO     Waiting for application startup.
(and this cycle repeats)
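
The thread does not confirm why the worker is killed, but the gunicorn message above hints at memory. One plausible, unverified explanation is that a threshold of 0 lets every candidate box through the score filter, so later stages have far more data to process. The sketch below only illustrates that effect with made-up data: `filter_detections`, the 10,000 candidates and the 0.7 comparison threshold are hypothetical and not taken from Immich's code.

```python
import random


def filter_detections(scores, min_score):
    """Return the indices of candidates whose confidence meets the threshold."""
    return [i for i, score in enumerate(scores) if score >= min_score]


random.seed(0)
# Hypothetical stand-in for a detector's raw output: 10,000 candidate
# boxes, each with a confidence score in [0, 1).
scores = [random.random() for _ in range(10_000)]

kept_strict = filter_detections(scores, 0.7)  # a typical non-zero threshold
kept_zero = filter_detections(scores, 0.0)    # the setting from this report

# With a threshold of 0 nothing is filtered out, so every candidate survives.
print(f"threshold 0.7 keeps {len(kept_strict)} of {len(scores)} candidates")
print(f"threshold 0.0 keeps {len(kept_zero)} of {len(scores)} candidates")
```

Whether this is what actually happens inside the machine learning container would need to be checked against the model code and the container's memory usage.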

Additional information

No response

danieldietzler commented 5 days ago

Why is this a bug? A minimum detection score of zero does not make any sense.
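
If zero is not a meaningful value, one option is to reject it before it reaches the model instead of letting the job crash. The following is a minimal sketch of such a guard; `validate_min_score` and the accepted range (0, 1] are assumptions made for illustration, not Immich's actual validation.

```python
def validate_min_score(value: float) -> float:
    """Return the score unchanged if it lies in (0, 1], else raise ValueError.

    The accepted range is an assumption made for this sketch.
    """
    # The chained comparison is also False for NaN, so NaN is rejected too.
    if not (0.0 < value <= 1.0):
        raise ValueError(
            f"minimum detection score must be in (0, 1], got {value!r}"
        )
    return value


if __name__ == "__main__":
    print(validate_min_score(0.7))
    try:
        validate_min_score(0.0)
    except ValueError as err:
        print(f"rejected: {err}")
```

A check like this turns the setting from a crash into a clear configuration error.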