immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0
43.38k stars 2.12k forks source link

[BUG] Batch import hangs "randomly" #2367

Closed Chacsam closed 1 year ago

Chacsam commented 1 year ago

The bug

I have set up Immich twice on Virtual Containers and got the issue each time after a few thousand succesful import. It Imports and at some point the jobs just hang while the "Metadata" Jobs automatically pauses itself. image

The OS that Immich Server is running on

Ubuntu 22.04 (container), 6 Gb RAM, 4 CPU

Version of Immich Server

v1.54.1

Version of Immich Mobile App

1.54, build 77

Platform with the issue

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:release
    entrypoint: ["/bin/sh", "./start-server.sh"]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:release
    environment:
      - TZ=Europe/Brussels
    entrypoint: ["/bin/sh", "./start-microservices.sh"]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    depends_on:
      - redis
      - database
      - typesense
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:release
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:release
    entrypoint: ["/bin/sh", "./entrypoint.sh"]
    env_file:
      - .env
    restart: always

  typesense:
    container_name: immich_typesense
    image: typesense/typesense:0.24.0
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
    logging:
      driver: none
    volumes:
      - tsdata:/data
    restart: always

  redis:
    container_name: immich_redis
    image: redis:6.2
    restart: always

  database:
    container_name: immich_postgres
    image: postgres:14
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      PG_DATA: /var/lib/postgresql/data
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always

  immich-proxy:
    container_name: immich_proxy
    image: ghcr.io/immich-app/immich-proxy:release
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    ports:
      - 2283:8080
    logging:
      driver: none
    depends_on:
      - immich-server
    restart: always

volumes:
  pgdata:
  model-cache:
  tsdata:

Your .env content

###################################################################################
# Database
###################################################################################

DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_PASSWORD=postgres
DB_DATABASE_NAME=immich

# Optional Database settings:
# DB_PORT=5432

###################################################################################
# Redis
###################################################################################

REDIS_HOSTNAME=immich_redis

# REDIS_URL will be used to pass custom options to ioredis.
# Example for Sentinel
# {"sentinels":[{"host":"redis-sentinel-node-0","port":26379},{"host":"redis-sentinel-node-1","port":26379},{"host":"redis-sentinel-node-2","port":26379}],"name":"redis-sentinel"}
# REDIS_URL=ioredis://eyJzZW50aW5lbHMiOlt7Imhvc3QiOiJyZWRpcy1zZW50aW5lbDEiLCJwb3J0IjoyNjM3OX0seyJob3N0IjoicmVkaXMtc2VudGluZWwyIiwicG9ydCI6MjYzNzl9XSwibmFtZSI6Im15bWFzdGVyIn0=

# Optional Redis settings:

# Note: these parameters are not automatically passed to the Redis Container
# to do so, please edit the docker-compose.yml file as well. Redis is not configured
# via environment variables, only redis.conf or the command line

# REDIS_PORT=6379
# REDIS_DBINDEX=0
# REDIS_USERNAME=
# REDIS_PASSWORD=
# REDIS_SOCKET=

###################################################################################
# Upload File Location
#
# This is the location where uploaded files are stored.
###################################################################################

UPLOAD_LOCATION=/mnt/immich

###################################################################################
# Typesense
###################################################################################
TYPESENSE_API_KEY=some-random-text-modified-by-fred
# TYPESENSE_ENABLED=false
# TYPESENSE_URL uses base64 encoding for the nodes json.
# Example JSON that was used:
# [
#      { 'host': 'typesense-1.example.net', 'port': '443', 'protocol': 'https' },
#      { 'host': 'typesense-2.example.net', 'port': '443', 'protocol': 'https' },
#      { 'host': 'typesense-3.example.net', 'port': '443', 'protocol': 'https' },
#  ]
# TYPESENSE_URL=ha://WwogICAgeyAnaG9zdCc6ICd0eXBlc2Vuc2UtMS5leGFtcGxlLm5ldCcsICdwb3J0JzogJzQ0MycsICdwcm90b2NvbCc6ICdodHRwcycgfSwKICAgIHsgJ2hvc3QnOiAndHlwZXNlbnNlLTIuZXhhbXBsZS5uZXQnLCAncG9ydCc6ICc0NDMnLCAncHJvdG9jb2wnOiAnaHR0cHMnIH0sCiAgICB7ICdob3N0JzogJ3R5cGVzZW5zZS0zLmV4YW1wbGUubmV0JywgJ3BvcnQnOiAnNDQzJywgJ3Byb3RvY29sJzogJ2h0dHBzJyB9LApd

###################################################################################
# Reverse Geocoding
#
# Reverse geocoding is done locally which has a small impact on memory usage
# This memory usage can be altered by changing the REVERSE_GEOCODING_PRECISION variable
# This ranges from 0-3 with 3 being the most precise
# 3 - Cities > 500 population: ~200MB RAM
# 2 - Cities > 1000 population: ~150MB RAM
# 1 - Cities > 5000 population: ~80MB RAM
# 0 - Cities > 15000 population: ~40MB RAM
####################################################################################

DISABLE_REVERSE_GEOCODING=false
REVERSE_GEOCODING_PRECISION=3

####################################################################################
# WEB - Optional
#
# Custom message on the login page, should be written in HTML form.
# For example:
# PUBLIC_LOGIN_PAGE_MESSAGE="This is a demo instance of Immich.<br><br>Email: <i>demo@demo.de</i><br>Password: <i>demo</i>"
####################################################################################

PUBLIC_LOGIN_PAGE_MESSAGE="Photos privées. Bienvenue... Sur invitation ;-)."

####################################################################################
# Alternative Service Addresses - Optional
#
# This is an advanced feature for users who may be running their immich services on different hosts.
# It will not change which address or port that services bind to within their containers, but it will change where other services look for their peers.
# Note: immich-microservices is bound to 3002, but no references are made
####################################################################################

IMMICH_WEB_URL=http://immich-web:3000
IMMICH_SERVER_URL=http://immich-server:3001
IMMICH_MACHINE_LEARNING_URL=http://immich-machine-learning:3003

####################################################################################
# Alternative API's External Address - Optional
#
# This is an advanced feature used to control the public server endpoint returned to clients during Well-known discovery.
# You should only use this if you want mobile apps to access the immich API over a custom URL. Do not include trailing slash.
# NOTE: At this time, the web app will not be affected by this setting and will continue to use the relative path: /api
# Examples: http://localhost:3001, http://immich-api.example.com, etc
####################################################################################

#IMMICH_API_URL_EXTERNAL=http://localhost:3001

Reproduction steps

1. Run Batch Import on host
`sudo docker run -it --rm -v "$(pwd):/import" ghcr.io/immich-app/immich-cli:latest upload --key SEt4JAuvIOKfCBRiaCZCJGeurjNRn2QbHdY --recursive --server http://192.168.11.212:2283/api`
2. Wait

Additional information

Immich-Server Log: (Rebooting didn't help)

[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RoutesResolver] TagController {/tag}: +1ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/tag, POST} route +0ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/tag, GET} route +1ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/tag/:id, GET} route +0ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/tag/:id, PATCH} route +1ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/tag/:id, DELETE} route +0ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RoutesResolver] AlbumController {/album}: +1ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/album/count-by-user-id, GET} route +0ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/album, POST} route +1ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/album/:albumId/users, PUT} route +1ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/album/:albumId/assets, PUT} route +0ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/album/:albumId, GET} route +1ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/album/:albumId/assets, DELETE} route +0ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/album/:albumId, DELETE} route +1ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/album/:albumId/user/:userId, DELETE} route +0ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/album/:albumId, PATCH} route +1ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/album/:albumId/download, GET} route +0ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [RouterExplorer] Mapped {/album/create-shared-link, POST} route +1ms
[Nest] 1  - 05/02/2023, 6:12:41 AM     LOG [SearchService] Running bootstrap
Request #1683007961473: Request to Node 0 failed due to "undefined Request failed with HTTP code 503 | Server said: Not Ready or Lagging"
Request #1683007961473: Sleeping for 4s and then retrying request...
Request #1683007961473: Request to Node 0 failed due to "undefined Request failed with HTTP code 503 | Server said: Not Ready or Lagging"
Request #1683007961473: Sleeping for 4s and then retrying request...
[Nest] 1  - 05/02/2023, 6:12:49 AM     LOG [TypesenseRepository] Schema up to date: assets/assets-v5
[Nest] 1  - 05/02/2023, 6:12:49 AM     LOG [TypesenseRepository] Schema up to date: albums/albums-v1
[Nest] 1  - 05/02/2023, 6:12:49 AM     LOG [TypesenseRepository] Alias mapping: [{"collection_name":"albums-v1","name":"albums"},{"collection_name":"assets-v5","name":"assets"}]
[Nest] 1  - 05/02/2023, 6:12:49 AM     LOG [TypesenseRepository] Collections needing migration: {"assets":false,"albums":false}
[Nest] 1  - 05/02/2023, 6:12:49 AM     LOG [NestApplication] Nest application successfully started +7ms
[Nest] 1  - 05/02/2023, 6:12:49 AM     LOG [ImmichServer] Running Immich Server in PRODUCTION environment - version 1.54.1 - Listening on port: 3001
[Nest] 1  - 05/02/2023, 6:12:49 AM    WARN [ImmichServer] Machine learning is enabled
[Nest] 1  - 05/02/2023, 6:12:49 AM    WARN [ImmichServer] Search is enabled
sakowicz commented 1 year ago

I've got a same problem. More logs:

[Nest] 1  - 05/04/2023, 12:12:13 PM     LOG [NestFactory] Starting Nest application...
[Nest] 1  - 05/04/2023, 12:12:13 PM     LOG [InstanceLoader] TypeOrmModule dependencies initialized +107ms
[Nest] 1  - 05/04/2023, 12:12:13 PM     LOG [InstanceLoader] BullModule dependencies initialized +1ms
[Nest] 1  - 05/04/2023, 12:12:13 PM     LOG [InstanceLoader] ConfigHostModule dependencies initialized +2ms
[Nest] 1  - 05/04/2023, 12:12:13 PM     LOG [InstanceLoader] DiscoveryModule dependencies initialized +1ms
[Nest] 1  - 05/04/2023, 12:12:13 PM     LOG [InstanceLoader] ConfigModule dependencies initialized +15ms
[Nest] 1  - 05/04/2023, 12:12:13 PM     LOG [InstanceLoader] BullModule dependencies initialized +1ms
[Nest] 1  - 05/04/2023, 12:12:13 PM     LOG [InstanceLoader] BullModule dependencies initialized +1ms
[Nest] 1  - 05/04/2023, 12:12:14 PM     LOG [InstanceLoader] TypeOrmCoreModule dependencies initialized +316ms
[Nest] 1  - 05/04/2023, 12:12:14 PM     LOG [InstanceLoader] TypeOrmModule dependencies initialized +1ms
[Nest] 1  - 05/04/2023, 12:12:14 PM     LOG [InstanceLoader] TypeOrmModule dependencies initialized +1ms
[Nest] 1  - 05/04/2023, 12:12:14 PM    WARN [MetadataExtractionProcessor] Reverse geocoding is enabled
[Nest] 1  - 05/04/2023, 12:12:14 PM     LOG [MetadataExtractionProcessor] Initializing Reverse Geocoding
/usr/src/app/node_modules/local-reverse-geocoder/index.js:746
            throw err;
            ^
CsvError: Invalid Record Length: expect 19, got 16 on line 133063
    at Object.__onRecord (/usr/src/app/node_modules/csv-parse/dist/cjs/index.cjs:940:11)
    at Object.parse (/usr/src/app/node_modules/csv-parse/dist/cjs/index.cjs:903:36)
    at Parser._flush (/usr/src/app/node_modules/csv-parse/dist/cjs/index.cjs:1336:26)
    at Parser.final [as _final] (node:internal/streams/transform:112:25)
    at callFinal (node:internal/streams/writable:694:27)
    at prefinish (node:internal/streams/writable:723:7)
    at finishMaybe (node:internal/streams/writable:733:5)
    at afterWrite (node:internal/streams/writable:504:3)
    at onwrite (node:internal/streams/writable:477:7)
    at Parser.Transform._read (node:internal/streams/transform:245:5) {
  code: 'CSV_RECORD_INCONSISTENT_FIELDS_LENGTH',
  bytes: 22581640,
  comment_lines: 0,
  empty_lines: 0,
  invalid_field_length: 0,
  lines: 133063,
  records: 133062,
  columns: false,
  error: undefined,
  header: false,

  index: 16,
  raw: undefined,
  column: 16,
  quoting: false,
  record: [
    '5145072',
    'Woodstock',
    'Woodstock',
    'Vudstok,uddosutokku,udeuseutog,wwdastak  nywywrk,Вудсток,ووداستاک، نیویورک,ウッドストック,우드스톡',
    '42.04092',
    '-74.1182',
    'P',
    'PPL',

,
    '',

,
    '111',

    '83052',
    '',
    '2088',
    ''
  ]
}

There is an unhandled csv exception that keeps restarting Next.js in the microservices container.

If @Chacsam would provide that container logs, I think it should be something similar.

alextran1502 commented 1 year ago

This is an issue with the upstream local reverse geocoding library. The current work around is to shutdown all the containers, remove the microservices container and then bring up the whole stack again to recreate the microservices container

sakowicz commented 1 year ago

@alextran1502 That is true. I'm restarting the entire docker-compose stack, and it is back to work, but I need to pause the EXTRACT METADATA job because it is the same after a few more processed messages. I'm migrating ~100k files, so there are a lot of problematic csvs..

alextran1502 commented 1 year ago

Yeah we want to fork the upstream library and rewrite it to handle this situation