immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

JS heap out of memory - library job #11168

Open aks-cadesign opened 1 month ago

aks-cadesign commented 1 month ago

The bug

The library job failed after adding an external library.

The OS that Immich Server is running on

Ubuntu 24.04 LTS

Version of Immich Server

v1.108.0

Version of Immich Mobile App

-

Platform with the issue

Your docker-compose.yml content

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    #extends:
      #file: hwaccel.transcoding.yml
      #service: quicksync # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
      - ${EXTERNAL_PATH_CLIENTS}:/usr/src/app/external:ro
    env_file:
      - stack.env
    ports:
      - 2283:3001
    depends_on:
      - redis
      - database
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - stack.env
    restart: always

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:328fe6a5822256d065debb36617a8169dbfbd77b797c525288e465f56c1d392b
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]
    restart: always

volumes:
  model-cache:

Your .env content

UPLOAD_LOCATION=./library
DB_DATA_LOCATION=./postgres
IMMICH_VERSION=release
DB_PASSWORD=password
DB_USERNAME=postgres
DB_DATABASE_NAME=immich
EXTERNAL_PATH_CLIENTS=/mnt/media
NODE_OPTIONS=--max-old-space-size=8196

Reproduction steps

1. Followed the guide to add an external library: https://immich.app/docs/guides/external-library/
2. The library job failed.

Relevant log output

[Nest] 7  - 07/17/2024, 12:36:34 PM     LOG [Microservices:LibraryService] Refreshing library: 3cb44023-78b5-4ee1-97eb-e77ae17a6955
[Nest] 17  - 07/17/2024, 12:39:01 PM     LOG [Api:AuditService~ydncl54g] Found 0 original files, 0 thumbnails, 0 encoded videos, 0 profile files
[Nest] 17  - 07/17/2024, 12:39:01 PM     LOG [Api:AuditService~ydncl54g] Found 0 assets, 1 users, 0 people
node:events:497
      throw er; // Unhandled 'error' event
      ^
Error [ERR_WORKER_OUT_OF_MEMORY]: Worker terminated due to reaching memory limit: JS heap out of memory
    at [kOnExit] (node:internal/worker:313:26)
    at Worker.<computed>.onexit (node:internal/worker:229:20)
Emitted 'error' event on Worker instance at:
    at [kOnExit] (node:internal/worker:313:12)
    at Worker.<computed>.onexit (node:internal/worker:229:20) {
  code: 'ERR_WORKER_OUT_OF_MEMORY'
}

Additional information

At first it crashed at around 4.3GB of RAM usage, so I added the 'NODE_OPTIONS=--max-old-space-size=8196' env variable, but I get the same outcome with higher memory usage (9.4GB).

alextran1502 commented 1 month ago

Any thoughts on this @zackpollard?

zackpollard commented 1 month ago

How big is your external library? Count of assets is the main thing

aks-cadesign commented 1 month ago

19,903 files, 1,450 folders, 145GB

It takes around 6 minutes until it crashes:

[image]

m4ntic0r commented 1 month ago

That's not a lot of files and folders for this error. I can only share my experience: at the moment I have 10 libraries (~600 folders each) and 1.35 million assets. On a system with 8GB RAM I had problems once a single library reached around 250k+ assets. So make sure each library stays at ~200k and limit the concurrent worker threads for the library job to 1! After that I never got the "JS heap out of memory" error again. I have no special variables in .env etc., only the default stuff.

Because your numbers are much smaller than that, I don't have a clue what the problem is. But try setting the concurrent library workers to 1. I don't know why the default setting is 5. In my opinion it's too high, because a lot of the memory issues are not fixed at the moment, and even for my more than a million assets a setting of 1 is enough.
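
To check how close each library would come to the ~200k figure mentioned above, the files under each top-level folder can be counted on the host. This is a minimal sketch, assuming one library per top-level folder; the function name `count_files_per_subdir` is made up for illustration, and it counts every regular file, not only the ones Immich would import.

```shell
# Print "<file count><TAB><folder>" for each top-level folder, largest first.
# Hypothetical helper, not part of Immich.
count_files_per_subdir() {
  root="$1"
  for dir in "$root"/*/; do
    # Skip the unexpanded pattern when there are no subfolders.
    [ -d "$dir" ] || continue
    # Count regular files only; symbolic links are not followed.
    n="$(find "$dir" -type f | wc -l | tr -d ' ')"
    printf '%s\t%s\n' "$n" "${dir%/}"
  done | sort -rn
}
```

For the setup in this issue it could be called as `count_files_per_subdir /mnt/media`.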

aks-cadesign commented 1 month ago

I moved to a different machine with more RAM and changed the NODE_OPTIONS environment variable to --max-old-space-size=49152, and got a new error this time:

node:internal/event_target:1094
  process.nextTick(() => { throw err; });
                           ^
RangeError [Error]: Map maximum size exceeded
    at Map.set ()
    at EntryFilter._createIndexRecord (/usr/src/app/node_modules/fast-glob/out/providers/filters/entry.js:37:20)
    at EntryFilter._filter (/usr/src/app/node_modules/fast-glob/out/providers/filters/entry.js:29:18)
    at /usr/src/app/node_modules/fast-glob/out/providers/filters/entry.js:13:32
    at Object.isAppliedFilter (/usr/src/app/node_modules/@nodelib/fs.walk/out/readers/common.js:12:31)
    at AsyncReader._handleEntry (/usr/src/app/node_modules/@nodelib/fs.walk/out/readers/async.js:86:20)
    at /usr/src/app/node_modules/@nodelib/fs.walk/out/readers/async.js:65:22
    at callSuccessCallback (/usr/src/app/node_modules/@nodelib/fs.scandir/out/providers/async.js:103:5)
    at /usr/src/app/node_modules/@nodelib/fs.scandir/out/providers/async.js:38:13
    at end (/usr/src/app/node_modules/run-parallel/index.js:21:15)
Emitted 'error' event on Worker instance at:
    at [kOnErrorMessage] (node:internal/worker:326:10)
    at [kOnMessage] (node:internal/worker:337:37)
    at MessagePort. (node:internal/worker:232:57)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:820:20)
    at MessagePort. (node:internal/per_context/messageport:23:28)

I'm now running version v1.111.0. RAM usage peaked at around 42GB.

Concurrent library workers had been set to 1 even before I opened this issue.

Another thing I noticed: nothing is being added to the thumbnail generation or other job queues; only the one library job is running.
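
The `Map maximum size exceeded` error is raised while fast-glob records paths in a Map. As far as I know, V8 caps a Map at roughly 16.7 million (2^24) entries, which is far more than the 19,903 files reported earlier. One possible explanation, not confirmed in this thread, is that the directory walk reaches the same files many times, for example through symbolic links. A rough way to test that guess on the host is to compare the number of entries seen with and without following links. The function names below are made up for illustration.

```shell
# Count entries under a directory without following symbolic links.
count_entries() {
  find "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Count entries again, this time following symbolic links (find -L).
# A much larger number than count_entries suggests that links pull in
# extra or repeated trees.
count_entries_following_links() {
  find -L "$1" 2>/dev/null | wc -l | tr -d ' '
}
```

For the setup in this issue, `count_entries /mnt/media` and `count_entries_following_links /mnt/media` could be compared. If both numbers are close to the expected file and folder count, symbolic links are probably not the cause.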

zackpollard commented 1 month ago

Could you please run the following command in the folder your library is in on the host machine and post the result here or DM it to me on discord?

ls -lR /path/to/folder | grep '^l'
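
The command above prints a recursive long listing and keeps only the lines that start with `l`, which are the symbolic links. As a narrower check, the sketch below reports only links whose resolved target is the folder containing the link or one of its ancestors, since such a link would send a link-following walk back into a tree it is already inside. It assumes GNU `readlink -f` (available on Ubuntu); the function name `find_ancestor_links` is made up for illustration.

```shell
# Print every symbolic link under "$1" that points at its own directory
# or at one of that directory's ancestors (a potential loop).
find_ancestor_links() {
  root="$1"
  find "$root" -type l | while IFS= read -r link; do
    # Fully resolved target of the link; skip links that cannot be resolved.
    target="$(readlink -f "$link")" || continue
    # Physical path of the directory that contains the link.
    linkdir="$(cd "$(dirname "$link")" && pwd -P)"
    # The link loops back if its directory lies inside (or is) the target.
    case "$linkdir/" in
      "$target"/*) printf '%s\n' "$link" ;;
    esac
  done
}
```

An empty result from `find_ancestor_links /path/to/folder` means no link of this kind was found. Other link layouts, such as two folders linking to each other, are not covered by this sketch.
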

kdybicz commented 21 hours ago

I'm facing the same issue with immich-server limited to 3GB and an external library of around 150k assets :/

mertalev commented 12 hours ago

@kdybicz Are you on 1.113.0 or later? That release lowered RAM usage when scanning large libraries.

kdybicz commented 7 hours ago

@mertalev I tested it yesterday with 1.113.1.