immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0
49.26k stars 2.59k forks source link

Out of memory when scanning libraries with >1M assets #7373

Closed pltaylor closed 7 months ago

pltaylor commented 8 months ago

The bug

I have added an external library of my photos in the UI and then run a 'Scan New Library Files' command. It runs for a couple minutes, as verified in the jobs menu, and then stops. The logs show the following

[PRODUCTION]

<--- Last few GCs --->

[395:0x4d186320000] 155265 ms: Scavenge 4015.8 (4137.3) -> 4012.8 (4137.3) MB, 15.72 / 0.00 ms (average mu = 0.490, current mu = 0.542) allocation failure; [395:0x4d186320000] 155298 ms: Scavenge 4016.5 (4137.3) -> 4013.4 (4137.3) MB, 14.14 / 0.00 ms (average mu = 0.490, current mu = 0.542) allocation failure; [395:0x4d186320000] 155337 ms: Scavenge 4017.2 (4137.3) -> 4014.0 (4145.3) MB, 17.19 / 0.00 ms (average mu = 0.490, current mu = 0.542) allocation failure;

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory ----- Native stack trace -----

1: 0xca5580 node::Abort() [immich_microservices] 2: 0xb781f9 [immich_microservices] 3: 0xeca4d0 v8::Utils::ReportOOMFailure(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 4: 0xeca7b7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 5: 0x10dc505 [immich_microservices] 6: 0x10f4388 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [immich_microservices] 7: 0x10ca4a1 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 8: 0x10cb635 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 9: 0x10a8c86 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [immich_microservices] 10: 0x1503a16 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long, v8::internal::Isolate) [immich_microservices] 11: 0x150ffea99ef6

The OS that Immich Server is running on

Docker on unraid

Version of Immich Server

v1.95.1

Version of Immich Mobile App

N/A

Platform with the issue

Your docker-compose.yml content

Used the Unraid app along with a redis install and a postgres install.  No errors in logs to indicate nothing is set up incorrectly prior to importing the library.

Your .env content

-e TZ="America/New_York"
  -e HOST_OS="Unraid"
  -e HOST_HOSTNAME="*******"
  -e HOST_CONTAINERNAME="immich"
  -e 'DB_HOSTNAME'='*****'
  -e 'DB_USERNAME'='******'
  -e 'DB_PASSWORD'='******'
  -e 'DB_DATABASE_NAME'='immich'
  -e 'REDIS_HOSTNAME'='******'
  -e 'DB_PORT'='5433'
  -e 'REDIS_PORT'='6379'
  -e 'REDIS_PASSWORD'=''
  -e 'MACHINE_LEARNING_GPU_ACCELERATION'=''
  -e 'MACHINE_LEARNING_WORKERS'='1'
  -e 'MACHINE_LEARNING_WORKER_TIMEOUT'='120'
  -e 'PUID'='99'
  -e 'PGID'='100'
  -e 'UMASK'='022'
  -l net.unraid.docker.managed=dockerman
  -l net.unraid.docker.webui='http://[IP]:[PORT:8080]'
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/imagegenius/templates/main/unraid/img/immich.png'
  -p '8080:8080/tcp'
  -v '/mnt/user/Photos/':'/photos':'rw'
  -v '/mnt/user/Photos/ImmichUpload':'/import':'ro'
  -v '/mnt/user/appdata/immich':'/config':'rw' 'ghcr.io/imagegenius/immich:latest'
8c3dd13fba63d182b6a847a9989510234fa8838c5c953d708129967aebd30fb0

Reproduction steps

1. Fresh Install using Unraid
2. Create Admin user
3. Add External Library
4. Scan New Library
5. Error

Additional information

No response

alextran1502 commented 8 months ago

How much RAM do you have on this machine?

pltaylor commented 8 months ago

32 GB total on the machine. It is sharing with several other docker containers, but doesn't look constrained at all. The import I'm trying to do is a lot of photos.

pltaylor commented 8 months ago

When I import from a much smaller sub-folder it looks like it is importing.

alextran1502 commented 8 months ago

cc @etnoy. Do we have any operation that kind of loads everything into the memory?

etnoy commented 8 months ago

How many files are there in the import paths?

Yes, we do crawl and save the list of paths to be imported into an array, but maybe we need to see if this can be optimized somehow.

What if you decrease the library job concurrency to 1?

etnoy commented 8 months ago

If you run with debug logging enabled, can you post what the logs say about the library scan?

acbaldwi commented 7 months ago

Just to chime in here I see similar results with a large amount of photos to be imported. approximately 1.3m items 1.2m are phtoos. The server in my case has 96gb of ram, I have a second immich unraid instance on another server that was able to parse all of the photos, this 96gb server seems to only be able to se 1/2 of the available photos.

I set concurrancy of library tasks to 1 as you suggetsed and debug on logs, Below are my logs hopefully it can help both of us.

`[Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [CommunicationRepository] Initialized websocket server [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [SystemConfigService] LogLevel=debug (set via system config) [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering thumbnailGeneration with a concurrency of 5 [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering metadataExtraction with a concurrency of 5 [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering videoConversion with a concurrency of 1 [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering faceDetection with a concurrency of 2 [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering facialRecognition with a concurrency of 1 [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering smartSearch with a concurrency of 2 [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering backgroundTask with a concurrency of 5 [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering storageTemplateMigration with a concurrency of 1 [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering migration with a concurrency of 5 [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering search with a concurrency of 5 [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering sidecar with a concurrency of 5 [Nest] 11380 - 03/03/2024, 3:43:50 PM DEBUG [JobService] Registering library with a concurrency of 1 [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [MetadataRepository] Initializing metadata repository [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [MetadataService] Initialized local reverse geocoder [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [NestFactory] Starting Nest application... [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] TypeOrmModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] ConfigHostModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] DiscoveryModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] ScheduleModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] ConfigModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] TypeOrmCoreModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] TypeOrmModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] InfraModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] DomainModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [InstanceLoader] MicroservicesModule dependencies initialized [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [NestApplication] Nest application successfully started [Nest] 11380 - 03/03/2024, 3:43:50 PM LOG [ImmichMicroservice] Immich Microservices is listening on http://[::1]:3002 [v1.97.0] [PRODUCTION] [Nest] 11380 - 03/03/2024, 3:44:27 PM DEBUG [LibraryService] Found 681410 asset(s) when crawling import paths /import [Nest] 11380 - 03/03/2024, 3:44:54 PM DEBUG [LibraryService] Marking 0 assets as offline

<--- Last few GCs --->

[11380:0x2024a320000] 96142 ms: Mark-Compact 4016.2 (4134.4) -> 4000.5 (4134.7) MB, 1133.66 / 0.00 ms (average mu = 0.138, current mu = 0.061) allocation failure; scavenge might not succeed [11380:0x2024a320000] 97393 ms: Mark-Compact 4018.5 (4136.8) -> 4002.2 (4136.5) MB, 1177.41 / 0.00 ms (average mu = 0.099, current mu = 0.059) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory ----- Native stack trace -----

1: 0xca5580 node::Abort() [immich_microservices] 2: 0xb781f9 [immich_microservices] 3: 0xeca4d0 v8::Utils::ReportOOMFailure(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 4: 0xeca7b7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 5: 0x10dc505 [immich_microservices] 6: 0x10dca94 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [immich_microservices] 7: 0x10f3984 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const) [immich_microservices] 8: 0x10f419c v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [immich_microservices] 9: 0x10ca4a1 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 10: 0x10cb635 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 11: 0x10a8c86 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [immich_microservices] 12: 0x1503a16 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long, v8::internal::Isolate*) [immich_microservices] 13: 0x150b20699ef6 [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [CommunicationRepository] Initialized websocket server [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [SystemConfigService] LogLevel=debug (set via system config) [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering thumbnailGeneration with a concurrency of 5 [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering metadataExtraction with a concurrency of 5 [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering videoConversion with a concurrency of 1 [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering faceDetection with a concurrency of 2 [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering facialRecognition with a concurrency of 1 [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering smartSearch with a concurrency of 2 [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering backgroundTask with a concurrency of 5 [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering storageTemplateMigration with a concurrency of 1 [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering migration with a concurrency of 5 [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering search with a concurrency of 5 [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering sidecar with a concurrency of 5 [Nest] 11398 - 03/03/2024, 3:45:31 PM DEBUG [JobService] Registering library with a concurrency of 1 [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [MetadataRepository] Initializing metadata repository [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [MetadataService] Initialized local reverse geocoder [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [NestFactory] Starting Nest application... [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] TypeOrmModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] ConfigHostModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] DiscoveryModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] ScheduleModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] ConfigModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] TypeOrmCoreModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] TypeOrmModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] InfraModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] DomainModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [InstanceLoader] MicroservicesModule dependencies initialized [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [NestApplication] Nest application successfully started [Nest] 11398 - 03/03/2024, 3:45:31 PM LOG [ImmichMicroservice] Immich Microservices is listening on http://[::1]:3002 [v1.97.0] [PRODUCTION] `

etnoy commented 7 months ago

I know I have a huge photo collection, but you're stomping all over it. 1.3M is amazing and I think we have to go to the drawing board a bit to get collections that large imported.

acbaldwi commented 7 months ago

I know I have a huge photo collection, but you're stomping all over it. 1.3M is amazing and I think we have to go to the drawing board a bit to get collections that large imported.

lol wife was a photographer before she semi retired, that being said server #2 which is annoying ly my backup server was able to parse all of them easy peasy. it has 128gb of ram if that makes any difference.

etnoy commented 7 months ago

I know I have a huge photo collection, but you're stomping all over it. 1.3M is amazing and I think we have to go to the drawing board a bit to get collections that large imported.

lol wife was a photographer before she semi retired, that being said server #2 which is annoying ly my backup server was able to parse all of them easy peasy. it has 128gb of ram if that makes any difference.

It's a great data point for us. 1.3M assets can be imported with 128G but not 96G. We haven't tested anything close to that.

Good news is that it seems the crawling step is what actually eats the memory. This is due to us crawling the entire thing at once and storing it in a list. What this means is if we optimize that step some more the subsequent steps (actual import) should work nicely.

etnoy commented 7 months ago

I know I have a huge photo collection, but you're stomping all over it. 1.3M is amazing and I think we have to go to the drawing board a bit to get collections that large imported.

lol wife was a photographer before she semi retired, that being said server #2 which is annoying ly my backup server was able to parse all of them easy peasy. it has 128gb of ram if that makes any difference.

A possible optimization is to reduce the path length of the external assets. The volume mount in docker, make sure the path Immich sees is as short as possible. As in, instead of mounting the photos to /this/very/long/path, try a single letter character such as /a.

acbaldwi commented 7 months ago

I know I have a huge photo collection, but you're stomping all over it. 1.3M is amazing and I think we have to go to the drawing board a bit to get collections that large imported.

lol wife was a photographer before she semi retired, that being said server #2 which is annoying ly my backup server was able to parse all of them easy peasy. it has 128gb of ram if that makes any difference.

A possible optimization is to reduce the path length of the external assets. The volume mount in docker, make sure the path Immich sees is as short as possible. As in, instead of mounting the photos to /this/very/long/path, try a single letter character such as /a.

Thansk for the suggestion

Just to continue on this thread what I did on the server that worked other than ram was to import photos with the subfolders individually, on the non working unit (the one with the java heap memory issue) i had said just import the entire photos structure as one main folder (ex /import).

Whats ive done is to go back and wipe that installation completely and start fresh and told it to do the subfolders within her photo share so /import/2007 , import/2008 etc etc etc. well see if it giets all the photos this time on 96gb of ram.

etnoy commented 7 months ago

I know I have a huge photo collection, but you're stomping all over it. 1.3M is amazing and I think we have to go to the drawing board a bit to get collections that large imported.

lol wife was a photographer before she semi retired, that being said server #2 which is annoying ly my backup server was able to parse all of them easy peasy. it has 128gb of ram if that makes any difference.

A possible optimization is to reduce the path length of the external assets. The volume mount in docker, make sure the path Immich sees is as short as possible. As in, instead of mounting the photos to /this/very/long/path, try a single letter character such as /a.

Thansk for the suggestion

Just to continue on this thread what I did on the server that worked other than ram was to import photos with the subfolders individually, on the non working unit (the one with the java heap memory issue) i had said just import the entire photos structure as one main folder (ex /import).

Whats ive done is to go back and wipe that installation completely and start fresh and told it to do the subfolders within her photo share so /import/2007 , import/2008 etc etc etc. well see if it giets all the photos this time on 96gb of ram.

Do you have the option to try the latest git version of Immich and see if it makes any difference for you? We've done memory and performance optimizations in the affected part of the code.

pltaylor commented 7 months ago

Is there a docker image built of the latest master branch?

pltaylor commented 7 months ago

For insight on my side it turns out that I originally unleashed Immich on a folder that contained 2 copies of my photos library by accident, which caused the initial bug report :)

etnoy commented 7 months ago

Is there a docker image built of the latest master branch?

A new release is on the way soon, you can also just wait

etnoy commented 7 months ago

Is there a docker image built of the latest master branch?

I'd love to hear how the 1.98 release works for you. How much memory does it use on rescan? Can you perhaps try scaling the memory available to see how much less we need?

pltaylor commented 7 months ago

v1.98.0 Tried to Re-Scan library... got the following logs [03/07/24 17:45:11] INFO Started server process [306]
[03/07/24 17:45:11] INFO Waiting for application startup.
[03/07/24 17:45:11] INFO Created in-memory cache with unloading after 300s
of inactivity.
[03/07/24 17:45:11] INFO Initialized request thread pool with 32 threads.
[03/07/24 17:45:11] INFO Application startup complete.
[Nest] 189 - 03/07/2024, 5:49:35 PM LOG [CommunicationRepository] Websocket Connect: Mr3JheDUV8HtsLdPAAAB [Nest] 238 - 03/07/2024, 5:52:41 PM ERROR [JobService] Unable to run job handler (library/library-refresh): Error: timeout exceeded when trying to connect [Nest] 238 - 03/07/2024, 5:52:41 PM ERROR [JobService] Error: timeout exceeded when trying to connect at Timeout._onTimeout (/app/immich/server/node_modules/pg-pool/index.js:205:27) at listOnTimeout (node:internal/timers:573:17) at process.processTimers (node:internal/timers:514:7) [Nest] 238 - 03/07/2024, 5:52:41 PM ERROR [JobService] Object: { "id": "19bcc9f3-45f8-46e0-99fc-6c2d68932f1b", "refreshModifiedFiles": true, "refreshAllFiles": false }

etnoy commented 7 months ago

That's odd...sounds like it times out trying to connect to postgres. Is the rest of Immich functioning?

pltaylor commented 7 months ago

Hmm... maybe not.... p.s. there is space left on the drives.

[Nest] 189 - 03/07/2024, 5:58:11 PM ERROR [QueryFailedError: could not resize shared memory segment "/PostgreSQL.3968589532" to 33554432 bytes: No space left on device at PostgresQueryRunner.query (/app/immich/server/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:219:19) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async SelectQueryBuilder.loadRawResults (/app/immich/server/node_modules/typeorm/query-builder/SelectQueryBuilder.js:2192:25) at async SelectQueryBuilder.executeEntitiesAndRawResults (/app/immich/server/node_modules/typeorm/query-builder/SelectQueryBuilder.js:2040:26) at async SelectQueryBuilder.getRawAndEntities (/app/immich/server/node_modules/typeorm/query-builder/SelectQueryBuilder.js:684:29) at async SelectQueryBuilder.getMany (/app/immich/server/node_modules/typeorm/query-builder/SelectQueryBuilder.js:750:25) at async AssetService.getTimeBucket (/app/immich/server/dist/domain/asset/asset.service.js:174:24)] Failed to get time bucket [Nest] 189 - 03/07/2024, 5:58:11 PM ERROR [QueryFailedError: could not resize shared memory segment "/PostgreSQL.3968589532" to 33554432 bytes: No space left on device at PostgresQueryRunner.query (/app/immich/server/node_modules/typeorm/driver/postgres/PostgresQueryRunner.js:219:19) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async SelectQueryBuilder.loadRawResults (/app/immich/server/node_modules/typeorm/query-builder/SelectQueryBuilder.js:2192:25) at async SelectQueryBuilder.executeEntitiesAndRawResults (/app/immich/server/node_modules/typeorm/query-builder/SelectQueryBuilder.js:2040:26) at async SelectQueryBuilder.getRawAndEntities (/app/immich/server/node_modules/typeorm/query-builder/SelectQueryBuilder.js:684:29) at async SelectQueryBuilder.getMany (/app/immich/server/node_modules/typeorm/query-builder/SelectQueryBuilder.js:750:25) at async AssetService.getTimeBucket (/app/immich/server/dist/domain/asset/asset.service.js:174:24)] QueryFailedError: could not resize shared memory segment "/PostgreSQL.3968589532" to 33554432 bytes: No space left on device

etnoy commented 7 months ago

Do you have inode space?

df -i

etnoy commented 7 months ago

https://stackoverflow.com/questions/56751565/pq-could-not-resize-shared-memory-segment-no-space-left-on-device

pltaylor commented 7 months ago

Ok, adjusted memory for postgres... different error... still v1.98

<--- Last few GCs --->

[328:0x43b7a320000] 174866 ms: Mark-Compact 4013.8 (4129.7) -> 4001.2 (4138.0) MB, 2952.27 / 0.00 ms (average mu = 0.540, current mu = 0.123) allocation failure; scavenge might not succeed [328:0x43b7a320000] 179974 ms: Mark-Compact 4019.3 (4139.2) -> 4012.2 (4145.7) MB, 5017.95 / 0.00 ms (average mu = 0.296, current mu = 0.018) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory ----- Native stack trace -----

1: 0xca5580 node::Abort() [immich_microservices] 2: 0xb781f9 [immich_microservices] 3: 0xeca4d0 v8::Utils::ReportOOMFailure(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 4: 0xeca7b7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 5: 0x10dc505 [immich_microservices] 6: 0x10f4388 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [immich_microservices] 7: 0x10ca4a1 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 8: 0x10cb635 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 9: 0x10a8c86 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [immich_microservices] 10: 0x1503a16 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long, v8::internal::Isolate) [immich_microservices] 11: 0x14e5dfa99ef6

etnoy commented 7 months ago

If you enable debug logging and show the log before the stack trace I can understand where it crashed. Is this during library scan?

pltaylor commented 7 months ago

w/ debug logs.

[Nest] 571 - 03/08/2024, 8:12:23 AM DEBUG [LibraryService] Found 1148871 asset(s) when crawling import paths /photos/2020,/photos/2019,/photos/2018,/photos

<--- Last few GCs --->

[571:0x32782320000] 29292873 ms: Mark-Compact (reduce) 4046.9 (4143.2) -> 4046.1 (4144.0) MB, 5268.84 / 0.00 ms (average mu = 0.170, current mu = 0.012) allocation failure; scavenge might not succeed [571:0x32782320000] 29297903 ms: Mark-Compact (reduce) 4047.2 (4144.0) -> 4046.7 (4144.7) MB, 5023.72 / 0.00 ms (average mu = 0.091, current mu = 0.001) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory ----- Native stack trace -----

1: 0xca5580 node::Abort() [immich_microservices] 2: 0xb781f9 [immich_microservices] 3: 0xeca4d0 v8::Utils::ReportOOMFailure(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 4: 0xeca7b7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 5: 0x10dc505 [immich_microservices] 6: 0x10f4388 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [immich_microservices] 7: 0x10ca4a1 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 8: 0x10cb635 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 9: 0x10a8c86 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [immich_microservices] 10: 0x1503a16 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long, v8::internal::Isolate) [immich_microservices] 11: 0x146ecfe99ef6

etnoy commented 7 months ago

Next attempt in #7759

joobei commented 7 months ago

I am experiencing the same in a raspberry pi with 4GB of memory. and a library size of 20k When the thumbnail generation service starts, free memory starts slowly but steadily decreasing. If I pause the thumbnail generation service, memory allocation does not become freed.

mertalev commented 7 months ago

@joobei That's not related to this issue. Image processing can lead to memory fragmentation over time, but the issue here is specific to external library scanning.

mertalev commented 7 months ago

@pltaylor @acbaldwi Can you try after changing the microservices image to this: ghcr.io/immich-app/immich-server:pr-7759 ? I want to be sure it's actually fixed.

joobei commented 7 months ago

@joobei That's not related to this issue. Image processing can lead to memory fragmentation over time, but the issue here is specific to external library scanning.

Well, my issue also occurs during external library scanning but sure.

pltaylor commented 7 months ago

Using the PR docker image..... [03/11/24 02:06:31] INFO Application startup complete.
[Nest] 212 - 03/11/2024, 2:08:21 AM DEBUG [LibraryService] Found 1148872 asset(s) when crawling import paths /photos

<--- Last few GCs --->

[212:0x3fb16320000] 207391 ms: Mark-Compact (reduce) 4043.8 (4143.7) -> 4043.4 (4143.5) MB, 5272.74 / 0.00 ms (average mu = 0.074, current mu = 0.001) allocation failure; scavenge might not succeed [212:0x3fb16320000] 212685 ms: Mark-Compact (reduce) 4044.5 (4143.5) -> 4044.1 (4144.2) MB, 5284.02 / 0.00 ms (average mu = 0.038, current mu = 0.002) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory ----- Native stack trace -----

1: 0xca5580 node::Abort() [immich_microservices] 2: 0xb781f9 [immich_microservices] 3: 0xeca4d0 v8::Utils::ReportOOMFailure(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 4: 0xeca7b7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 5: 0x10dc505 [immich_microservices] 6: 0x10f4388 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [immich_microservices] 7: 0x10ca4a1 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 8: 0x10cb635 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 9: 0x10a7d56 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [immich_microservices] 10: 0x1099984 v8::internal::FactoryBase::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [immich_microservices] 11: 0x109c166 v8::internal::FactoryBase::NewRawOneByteString(int, v8::internal::AllocationType) [immich_microservices] 12: 0x1559f1e v8::internal::IncrementalStringBuilder::IncrementalStringBuilder(v8::internal::Isolate) [immich_microservices] 13: 0x121f6fd v8::internal::JsonStringify(v8::internal::Isolate, v8::internal::Handle, v8::internal::Handle, v8::internal::Handle) [immich_microservices] 14: 0xf5c5b2 v8::internal::Builtin_JsonStringify(int, unsigned long, v8::internal::Isolate) [immich_microservices] 15: 0x193cdf6 [immich_microservices] [Nest] 318 - 03/11/2024, 2:10:00 AM LOG [CommunicationRepository] Initialized websocket server [Nest] 318 - 03/11/2024, 2:10:00 AM LOG [SystemConfigService] LogLevel=debug (set via system config) [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering thumbnailGeneration with a concurrency of 5 [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering metadataExtraction with a concurrency of 5 [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering videoConversion with a concurrency of 1 [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering faceDetection with a concurrency of 2 [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering facialRecognition with a concurrency of 1 [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering smartSearch with a concurrency of 2 [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering backgroundTask with a concurrency of 5 [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering storageTemplateMigration with a concurrency of 1 [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering migration with a concurrency of 5 [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering search with a concurrency of 5 [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering sidecar with a concurrency of 5 [Nest] 318 - 03/11/2024, 2:10:00 AM DEBUG [JobService] Registering library with a concurrency of 5 [Nest] 318 - 03/11/2024, 2:10:00 AM LOG [MetadataRepository] Initializing metadata repository [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [MetadataService] Initialized local reverse geocoder [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [NestFactory] Starting Nest application... [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] TypeOrmModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] ConfigHostModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] DiscoveryModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] ScheduleModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] ConfigModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] TypeOrmCoreModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] TypeOrmModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] InfraModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] DomainModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [InstanceLoader] MicroservicesModule dependencies initialized [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [NestApplication] Nest application successfully started [Nest] 318 - 03/11/2024, 2:10:45 AM LOG [ImmichMicroservice] Immich Microservices is listening on http://[::1]:3002 [v1.98.1] [PRODUCTION] [Nest] 318 - 03/11/2024, 2:10:51 AM DEBUG [LibraryService] Found 1148872 asset(s) when crawling import paths /photos

<--- Last few GCs --->

[318:0x38e68320000] 120384 ms: Mark-Compact 4018.8 (4133.0) -> 4011.7 (4142.5) MB, 3057.20 / 0.00 ms (average mu = 0.332, current mu = 0.043) allocation failure; scavenge might not succeed [318:0x38e68320000] 125671 ms: Mark-Compact 4029.8 (4144.0) -> 4022.7 (4154.0) MB, 4812.50 / 0.00 ms (average mu = 0.198, current mu = 0.090) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory ----- Native stack trace -----

1: 0xca5580 node::Abort() [immich_microservices] 2: 0xb781f9 [immich_microservices] 3: 0xeca4d0 v8::Utils::ReportOOMFailure(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 4: 0xeca7b7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 5: 0x10dc505 [immich_microservices] 6: 0x10f4388 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [immich_microservices] 7: 0x10ca4a1 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 8: 0x10cb635 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 9: 0x10a8c86 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [immich_microservices] 10: 0x1503a16 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long, v8::internal::Isolate) [immich_microservices] 11: 0x14d10a899ef6 [Nest] 336 - 03/11/2024, 2:12:07 AM LOG [CommunicationRepository] Initialized websocket server [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [SystemConfigService] LogLevel=debug (set via system config) [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering thumbnailGeneration with a concurrency of 5 [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering metadataExtraction with a concurrency of 5 [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering videoConversion with a concurrency of 1 [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering faceDetection with a concurrency of 2 [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering facialRecognition with a concurrency of 1 [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering smartSearch with a concurrency of 2 [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering backgroundTask with a concurrency of 5 [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering storageTemplateMigration with a concurrency of 1 [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering migration with a concurrency of 5 [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering search with a concurrency of 5 [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering sidecar with a concurrency of 5 [Nest] 336 - 03/11/2024, 2:12:08 AM DEBUG [JobService] Registering library with a concurrency of 5 [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [MetadataRepository] Initializing metadata repository [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [MetadataService] Initialized local reverse geocoder [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [NestFactory] Starting Nest application... [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] TypeOrmModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] ConfigHostModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] DiscoveryModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] ScheduleModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] ConfigModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] BullModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] TypeOrmCoreModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] TypeOrmModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] InfraModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] DomainModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [InstanceLoader] MicroservicesModule dependencies initialized [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [NestApplication] Nest application successfully started [Nest] 336 - 03/11/2024, 2:12:08 AM LOG [ImmichMicroservice] Immich Microservices is listening on http://[::1]:3002 [v1.98.1] [PRODUCTION]

mertalev commented 7 months ago

Oh wow, it's still not working? There are a few ways we can reduce memory usage even more, but before that I think we should make a branch with more debug logs so we have better insight into where and how the heap is getting filled up.

What we know is that the crawl completes successfully, that it crashes somewhere between the creation of a set and the paginated queries, and that the allocation error (as of the pr) is related to JSON serialization.

pltaylor commented 7 months ago

Correct, still fails. I'll be glad to run any updates to the PR you want to make.

etnoy commented 7 months ago

Can you try changing your microservices image from

image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}

to

image: ghcr.io/immich-app/immich-server:pr-7914

and try again? This should print lots of debug logs

You need to do

docker compose down
docker compose pull
docker compose up

(it might need a few minutes to build the docker images)

pltaylor commented 7 months ago

Ok, so I rebuilt my Immich install using the recommended docker compose method on my unraid server. I modified the microservices image to the above one. And tried to import my library. It failed, recycled the microservices instance, and then closed my log window (unraid is helpful like that). When I open it back up again I am just left with the tail end of the log here.

<--- Last few GCs --->

[7:0x25f08320000] 427526 ms: Scavenge 4019.4 (4125.7) -> 4018.8 (4133.7) MB, 15.00 / 0.00 ms (average mu = 0.667, current mu = 0.554) allocation failure; [7:0x25f08320000] 427592 ms: Scavenge 4026.3 (4133.7) -> 4024.5 (4133.7) MB, 21.24 / 0.00 ms (average mu = 0.667, current mu = 0.554) allocation failure; [7:0x25f08320000] 429962 ms: Scavenge 4027.8 (4133.7) -> 4026.5 (4153.4) MB, 2341.78 / 0.00 ms (average mu = 0.667, current mu = 0.554) allocation failure;

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory ----- Native stack trace -----

1: 0xca5580 node::Abort() [immich_microservices] 2: 0xb781f9 [immich_microservices] 3: 0xeca4d0 v8::Utils::ReportOOMFailure(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 4: 0xeca7b7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 5: 0x10dc505 [immich_microservices] 6: 0x10f4388 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [immich_microservices] 7: 0x10ca4a1 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 8: 0x10cb635 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 9: 0x10a81ca v8::internal::Factory::AllocateRawWithAllocationSite(v8::internal::Handle, v8::internal::AllocationType, v8::internal::Handle) [immich_microservices] 10: 0x10b5b77 v8::internal::Factory::NewJSArrayWithUnverifiedElements(v8::internal::Handle, v8::internal::Handle, int, v8::internal::AllocationType) [immich_microservices] 11: 0x10b5da5 v8::internal::Factory::NewJSArrayWithElements(v8::internal::Handle, v8::internal::ElementsKind, int, v8::internal::AllocationType) [immich_microservices] 12: 0x13169e7 v8::internal::FastGetOwnValuesOrEntries(v8::internal::Isolate, v8::internal::Handle, bool, v8::internal::Handle) [immich_microservices] 13: 0x1316ca7 v8::internal::GetOwnValuesOrEntries(v8::internal::Isolate, v8::internal::Handle, v8::internal::PropertyFilter, bool, bool) [immich_microservices] 14: 0x150dba7 v8::internal::Runtime_ObjectEntries(int, unsigned long, v8::internal::Isolate*) [immich_microservices] 15: 0x14c72c6d9ef6 [Nest] 7 - 03/13/2024, 10:20:53 AM DEBUG [JobService] Registering migration with a concurrency of 5

do you know where I can find a longer log file?

pltaylor commented 7 months ago

Ok, I ran just that docker logs file for the microservices container (should of thought of that earlier and got the full log

[Nest] 7 - 03/13/2024, 10:28:10 AM LOG [NestApplication] Nest application successfully started [Nest] 7 - 03/13/2024, 10:28:10 AM LOG [ImmichMicroservice] Immich Microservices is listening on http://[::1]:3002 [v1.98.2] [PRODUCTION] [Nest] 7 - 03/13/2024, 10:32:44 AM DEBUG [LibraryService] Found 1148832 asset(s) when crawling import paths ./photos [Nest] 7 - 03/13/2024, 10:32:44 AM DEBUG [LibraryService] Pagination complete [Nest] 7 - 03/13/2024, 10:32:44 AM DEBUG [LibraryService] Checked offline paths [Nest] 7 - 03/13/2024, 10:32:44 AM DEBUG [LibraryService] About to start scan

<--- Last few GCs --->

[7:0x4b86e320000] 330803 ms: Scavenge 4027.7 (4135.2) -> 4027.4 (4139.2) MB, 12.50 / 0.00 ms (average mu = 0.692, current mu = 0.582) allocation failure; [7:0x4b86e320000] 330835 ms: Scavenge 4031.0 (4139.2) -> 4030.1 (4139.2) MB, 14.67 / 0.00 ms (average mu = 0.692, current mu = 0.582) allocation failure; [7:0x4b86e320000] 330865 ms: Scavenge 4031.9 (4139.2) -> 4031.2 (4147.2) MB, 21.07 / 0.00 ms (average mu = 0.692, current mu = 0.582) allocation failure;

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory ----- Native stack trace -----

1: 0xca5580 node::Abort() [immich_microservices] 2: 0xb781f9 [immich_microservices] 3: 0xeca4d0 v8::Utils::ReportOOMFailure(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 4: 0xeca7b7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 5: 0x10dc505 [immich_microservices] 6: 0x10f4388 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [immich_microservices] 7: 0x10ca4a1 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 8: 0x10cb635 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 9: 0x10a8c86 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [immich_microservices] 10: 0x1503a16 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long, v8::internal::Isolate) [immich_microservices] 11: 0x1539f9359ef6 [Nest] 7 - 03/13/2024, 10:33:49 AM LOG [CommunicationRepository] Initialized websocket server [Nest] 7 - 03/13/2024, 10:33:49 AM LOG [SystemConfigService] LogLevel=debug (set via system config) [Nest] 7 - 03/13/2024, 10:33:49 AM DEBUG [JobService] Registering thumbnailGeneration with a concurrency of 5 [Nest] 7 - 03/13/2024, 10:33:49 AM DEBUG [JobService] Registering metadataExtraction with a concurrency of 5 [Nest] 7 - 03/13/2024, 10:33:49 AM DEBUG [JobService] Registering videoConversion with a concurrency of 1

etnoy commented 7 months ago

Thanks, that is very helpful. I think I found the issue

etnoy commented 7 months ago

OK, try to pull the latest version. I've added more logging and a possible fix.

pltaylor commented 7 months ago

The docker compose update happened really quickly, so I'm not 100% it got the change. Here is the logs for your fun.

[Nest] 7 - 03/13/2024, 3:02:24 PM LOG [ImmichMicroservice] Immich Microservices is listening on http://[::1]:3002 [v1.98.2] [PRODUCTION] [Nest] 7 - 03/13/2024, 3:08:48 PM DEBUG [LibraryService] Found 1148832 asset(s) when crawling import paths ./photos [Nest] 7 - 03/13/2024, 3:08:48 PM DEBUG [LibraryService] Pagination complete [Nest] 7 - 03/13/2024, 3:08:48 PM DEBUG [LibraryService] Checked offline paths [Nest] 7 - 03/13/2024, 3:08:48 PM DEBUG [LibraryService] About to start scan

<--- Last few GCs --->

[7:0x4433e320000] 441513 ms: Scavenge 4014.1 (4127.9) -> 4010.1 (4127.9) MB, 27.75 / 0.00 ms (average mu = 0.519, current mu = 0.056) allocation failure; [7:0x4433e320000] 441609 ms: Scavenge 4014.6 (4127.9) -> 4012.3 (4136.7) MB, 67.21 / 0.00 ms (average mu = 0.519, current mu = 0.056) allocation failure; [7:0x4433e320000] 443960 ms: Mark-Compact 4026.9 (4138.4) -> 4017.4 (4146.7) MB, 2279.36 / 0.00 ms (average mu = 0.407, current mu = 0.116) allocation failure; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory ----- Native stack trace -----

1: 0xca5580 node::Abort() [immich_microservices] 2: 0xb781f9 [immich_microservices] 3: 0xeca4d0 v8::Utils::ReportOOMFailure(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 4: 0xeca7b7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate, char const, v8::OOMDetails const&) [immich_microservices] 5: 0x10dc505 [immich_microservices] 6: 0x10f4388 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [immich_microservices] 7: 0x10ca4a1 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 8: 0x10cb635 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [immich_microservices] 9: 0x10a8c86 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [immich_microservices] 10: 0x1503a16 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long, v8::internal::Isolate) [immich_microservices] 11: 0x1487eda59ef6 [Nest] 7 - 03/13/2024, 3:09:54 PM LOG [CommunicationRepository] Initialized websocket server

etnoy commented 7 months ago

It should take a little time to do the pull, can you try again?

pltaylor commented 7 months ago

do you know what the microservices container id should be?

etnoy commented 7 months ago

The latest seems to be sha256:cfad2066cd92f4d716ff56124e276d604fd2c927c6a719f8ca90eae890e41e62

about one hour ago, so this is the right one

https://github.com/immich-app/immich/pkgs/container/immich-server/190492608?tag=pr-7914

pltaylor commented 7 months ago

Progress:
image

etnoy commented 7 months ago

That's awesome!

Thanks for being patient with us.

We'll try to get the fixed code in the next release.

Can you clear the queue and try different levels of memory allocation? This latest version should use significantly less memory for the initial queuing. How low can we go?

pltaylor commented 7 months ago

right now I have a box stock compose file, well, I added the folder that had the pictures as a volume, and I pointed the microservices to the PR branch release.

etnoy commented 7 months ago

Are you using the release or PR version of the microservices container?

pltaylor commented 7 months ago

Yes, I edited my original comment to include that. I was trying to say that I haven't made any changes on the memory allocation front as of now.

etnoy commented 7 months ago

Yes, I edited my original comment to include that. I was trying to say that I haven't made any changes on the memory allocation front as of now.

Would you be willing to try how #7934 works in your case? I'm experimenting with doing all scans 1000 assets at a time, never loading the full list.

pltaylor commented 7 months ago

I’m out of the country till this weekend, I’ll be glad to try it when I get back.

On Wed, Mar 20, 2024 at 7:32 AM Jonathan Jogenfors @.***> wrote:

Yes, I edited my original comment to include that. I was trying to say that I haven't made any changes on the memory allocation front as of now.

Would you be willing to try how #7934 https://github.com/immich-app/immich/pull/7934 works in your case? I'm experimenting with doing all scans 1000 assets at a time, never loading the full list.

— Reply to this email directly, view it on GitHub https://github.com/immich-app/immich/issues/7373#issuecomment-2009573656, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHQZXP6DDCIP7GN4L2ZCKLYZGFVPAVCNFSM6AAAAABDW5OFSGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBZGU3TGNRVGY . You are receiving this because you were mentioned.Message ID: @.***>

pltaylor commented 7 months ago

Testing now with a 'force full re-scan', overall seems like a multi-threading improvement.. i.e. the batches are smaller and then kicking off smaller batches sooner. But I did see this... [Nest] 7 - 03/25/2024, 9:10:38 PM ERROR [JobService] Unable to run job handler (faceDetection/face-detection): Error: Machine learning request to "http://immich-machine-learning:3003" failed with Error: write EPIPE [Nest] 7 - 03/25/2024, 9:10:38 PM ERROR [JobService] Error: Machine learning request to "http://immich-machine-learning:3003" failed with Error: write EPIPE at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19 at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21) at async PersonService.handleDetectFaces (/usr/src/app/dist/domain/person/person.service.js:257:23) at async /usr/src/app/dist/domain/job/job.service.js:141:36 at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28) at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24) [Nest] 7 - 03/25/2024, 9:10:38 PM ERROR [JobService] Object: { "id": "2b5194f5-7b6e-4227-870f-d2232c620ae0", "source": "upload" }