immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

[BUG] Machine learning keeps crashing with 'Worker exited with code 3' #6183

Closed by marsara9 9 months ago

marsara9 commented 10 months ago

The bug

Facial recognition was initially working but after uploading all of my initial photos, the machine learning container keeps crashing with the following error:

[01/05/24 04:12:47] ERROR    Exception in worker process
Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/arbiter.py", line 609, in spawn_worker
    worker.init_process()
  File "/opt/venv/lib/python3.11/site-packages/uvicorn/workers.py", line 66, in init_process
    super(UvicornWorker, self).init_process()
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
                ^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
                    ^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/gunicorn/util.py", line 371, in import_app
    mod = importlib.import_module(module)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/src/app/main.py", line 18, in <module>
    from app.models.base import InferenceModel
  File "/usr/src/app/models/__init__.py", line 8, in <module>
    from .facial_recognition import FaceRecognizer
  File "/usr/src/app/models/facial_recognition.py", line 7, in <module>
    from insightface.model_zoo import ArcFaceONNX, RetinaFace
  File "/opt/venv/lib/python3.11/site-packages/insightface/__init__.py", line 18, in <module>
    from . import app
  File "/opt/venv/lib/python3.11/site-packages/insightface/app/__init__.py", line 2, in <module>
    from .mask_renderer import *
  File "/opt/venv/lib/python3.11/site-packages/insightface/app/mask_renderer.py", line 8, in <module>
    from ..thirdparty import face3d
  File "/opt/venv/lib/python3.11/site-packages/insightface/thirdparty/face3d/__init__.py", line 3, in <module>
    from . import mesh
  File "/opt/venv/lib/python3.11/site-packages/insightface/thirdparty/face3d/mesh/__init__.py", line 11, in <module>
    from . import vis
  File "/opt/venv/lib/python3.11/site-packages/insightface/thirdparty/face3d/mesh/vis.py", line 6, in <module>
    import matplotlib.pyplot as plt
  File "/opt/venv/lib/python3.11/site-packages/matplotlib/__init__.py", line 161, in <module>
    from . import _api, _version, cbook, _docstring, rcsetup
  File "/opt/venv/lib/python3.11/site-packages/matplotlib/rcsetup.py", line 25, in <module>
    from matplotlib import _api, cbook
ImportError: cannot import name 'cbook' from partially initialized module 'matplotlib' (most likely due to a circular import) (/opt/venv/lib/python3.11/site-packages/matplotlib/__init__.py)
[01/05/24 04:12:47] INFO     Worker exiting (pid: 14)
[01/05/24 04:12:48] ERROR    Worker (pid:14) exited with code 3
[01/05/24 04:12:48] ERROR    Shutting down: Master
[01/05/24 04:12:48] ERROR    Reason: Worker failed to boot.
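
The traceback ends inside matplotlib itself, so one way to narrow this down is to import the modules it names one at a time and see which fails and where each is loaded from. The following is only a rough sketch, assuming Python can be run inside the machine-learning container; the helper names `try_import` and `check` and the `MODULES` list are made up for illustration and are not part of Immich.

```python
import importlib

# Modules taken from the traceback above; the list itself is an assumption
# about what is worth checking, not something Immich provides.
MODULES = ["matplotlib", "matplotlib.pyplot", "insightface"]


def try_import(name: str) -> tuple[bool, str]:
    """Import `name` and return (ok, detail).

    On success, detail is the file the module was loaded from, which helps
    spot a module shadowed by a stray file. On failure, detail is the
    exception type and message.
    """
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # the log shows ImportError, but report anything
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__file__", None) or "<no __file__>"


def check(modules=MODULES) -> dict[str, tuple[bool, str]]:
    """Try each module in order and collect the results."""
    return {name: try_import(name) for name in modules}


if __name__ == "__main__":
    for name, (ok, detail) in check().items():
        print(f"{'OK  ' if ok else 'FAIL'} {name}: {detail}")
```

If `matplotlib` fails on its own, the problem is probably in the installed package files rather than in Immich's code. If it imports but from a path outside `/opt/venv`, something may be shadowing it.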

The OS that Immich Server is running on

Ubuntu 23.10 (Raspberry Pi 4b 4G)

Version of Immich Server

v1.91.4

Version of Immich Mobile App

N/A

Platform with the issue

Your docker-compose.yml content

version: "3.8"
name: immich

services:
  webui:
    image: nginx:latest
    container_name: webui
    networks:
      - internal
    volumes:
      - /docker/nginx/:/etc/nginx/
      - /docker/certs:/certs
    ports:
      - 80:80
      - 443:443
    depends_on:
      - immich-server
    restart: unless-stopped
  portainer:
    image: portainer/agent:latest
    container_name: portainer
    networks:
      - internal
    ports:
      - 9001:9001
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /var/lib/docker/volumes:/var/lib/docker/volumes
    restart: always
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    networks:
      - internal
    command: [ "start.sh", "immich" ]
    volumes:
      - media:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
    restart: unless-stopped

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    networks:
      - internal
    command: [ "start.sh", "microservices" ]
    volumes:
      - media:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
    restart: unless-stopped

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    networks:
      - internal
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: unless-stopped

  redis:
    container_name: immich_redis
    image: redis:6.2-alpine@sha256:b6124ab2e45cc332e16398022a411d7e37181f21ff7874835e0180f56a09e82a
    networks:
      - internal
    restart: unless-stopped

  database:
    container_name: immich_postgres
    image: tensorchord/pgvecto-rs:pg14-v0.1.11@sha256:0335a1a22f8c5dd1b697f14f079934f5152eaaa216c09b61e293be285491f8ee
    networks:
      - internal
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: unless-stopped

volumes:
  pgdata:
  model-cache:
  media:
    driver_opts:
      type: cifs
      o: *redacted*
      device: *redacted*
networks:
  internal:
    name: internal
    driver: bridge

Your .env content

IMMICH_VERSION=release

DB_PASSWORD=*redacted*

DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

REDIS_HOSTNAME=immich_redis
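
As a sanity check on the configuration above, the hostnames in `.env` can be compared with the `container_name` values in the compose file. This is a minimal sketch with the relevant values copied inline; `parse_env` and `unresolved_hostnames` are hypothetical helpers, and it assumes containers on the shared `internal` network reach each other by container name.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


# Copied from the .env above (redacted password omitted).
ENV_TEXT = """
IMMICH_VERSION=release
DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_DATABASE_NAME=immich
REDIS_HOSTNAME=immich_redis
"""

# container_name values copied from the compose file above.
CONTAINER_NAMES = {
    "webui",
    "portainer",
    "immich_server",
    "immich_microservices",
    "immich_machine_learning",
    "immich_redis",
    "immich_postgres",
}


def unresolved_hostnames(env: dict[str, str], names: set[str]) -> dict[str, str]:
    """Return *_HOSTNAME entries whose value matches no container name."""
    return {
        key: value
        for key, value in env.items()
        if key.endswith("_HOSTNAME") and value not in names
    }


if __name__ == "__main__":
    print(unresolved_hostnames(parse_env(ENV_TEXT), CONTAINER_NAMES))
```

An empty result means both hostnames point at containers defined in the compose file, which suggests the crash is not a naming mismatch between the two files.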

Reproduction steps

Unknown.  

Additional information

Outside of the initial setup / configuration, nothing has been changed on the server.

The first backup from my phone was successful and the face recognition worked (~280 photos).

I then had a 2nd user begin to upload their photos from their phone and that also appeared to be successful (~200 photos).

I then began moving my photos from Google to Immich (~800 photos), and that is where it appears to have stopped. These photos were uploaded with the CLI tool, mostly following the discussion at https://github.com/immich-app/immich/discussions/1340, with modifications to use the new version of the CLI tool instead.

Running the facial recognition job from the admin page queues about 500 images every time. That number quickly drops to 0, but restarting the job queues approximately 500 again. Checking the logs shows the original error from above.

marsara9 commented 9 months ago

I ended up rebuilding the server from scratch and haven't run into the issue since, so I'm closing this on the assumption that it was fixed by the latest update.