immich-app / immich

High-performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0
51.52k stars · 2.73k forks

[BUG] Microservices container keeps crashing when deploying Immich on TrueNAS SCALE #4930

Closed: hlstwizard closed this issue 11 months ago

hlstwizard commented 1 year ago

The bug

The microservices container keeps crashing.

The OS that Immich Server is running on

TrueNAS SCALE

Version of Immich Server

v1.85.0

Version of Immich Mobile App

v1.82.0

Platform with the issue

Your docker-compose.yml content

https://github.com/truenas/charts/tree/master/community/immich

Your .env content

N/A

Reproduction steps

1. Install Immich using the TrueNAS community chart
2. Restore PostgreSQL from a backup (a sketch of such a restore is shown after this list)
3. Start the service; the microservices container keeps crashing
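
For context, a minimal sketch of the kind of restore step 2 describes, assuming a plain-SQL dump taken from the old docker-compose deployment; the `immich-postgres` deployment name and `postgres` role are assumptions, not verified against the TrueNAS community chart:

```sh
# Hypothetical restore of a plain-SQL dump into the chart's Postgres pod.
# "immich-postgres" and the "postgres" role are assumed names.
kubectl exec -i deploy/immich-postgres -- psql -U postgres < immich-dump.sql
```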

Additional information

Error logs

Defaulted container "immich" out of: immich, immich-init-postgres-wait (init), immich-init-redis-wait (init), immich-init-wait-url (init)
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [NestFactory] Starting Nest application...
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] TypeOrmModule dependencies initialized +41ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] BullModule dependencies initialized +0ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] ConfigHostModule dependencies initialized +1ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] DiscoveryModule dependencies initialized +0ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] ScheduleModule dependencies initialized +0ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] ConfigModule dependencies initialized +6ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] BullModule dependencies initialized +0ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] BullModule dependencies initialized +0ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] TypeOrmCoreModule dependencies initialized +173ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] TypeOrmModule dependencies initialized +0ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] InfraModule dependencies initialized +4ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] DomainModule dependencies initialized +21ms
[Nest] 8  - 11/09/2023, 8:30:46 PM     LOG [InstanceLoader] MicroservicesModule dependencies initialized +0ms

<--- Last few GCs --->

[8:0x274a4310000]   106399 ms: Scavenge 2030.3 (2074.7) -> 2027.4 (2075.2) MB, 5.08 / 0.01 ms  (average mu = 0.267, current mu = 0.225) allocation failure;
[8:0x274a4310000]   106422 ms: Scavenge 2031.3 (2075.7) -> 2028.4 (2076.2) MB, 5.69 / 0.01 ms  (average mu = 0.267, current mu = 0.225) allocation failure;
[8:0x274a4310000]   106449 ms: Scavenge 2032.3 (2076.7) -> 2029.3 (2085.2) MB, 5.46 / 0.01 ms  (average mu = 0.267, current mu = 0.225) allocation failure;

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xc99960 node::Abort() [immich_microservices]
 2: 0xb6ffcb  [immich_microservices]
 3: 0xebe910 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [immich_microservices]
 4: 0xebebf7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [immich_microservices]
 5: 0x10d06a5  [immich_microservices]
 6: 0x10d0c34 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [immich_microservices]
 7: 0x10e7b24 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [immich_microservices]
 8: 0x10e833c v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [immich_microservices]
 9: 0x10ea49a v8::internal::Heap::HandleGCRequest() [immich_microservices]
10: 0x1055907 v8::internal::StackGuard::HandleInterrupts() [immich_microservices]
11: 0x14f7322 v8::internal::Runtime_StackGuardWithGap(int, unsigned long*, v8::internal::Isolate*) [immich_microservices]
12: 0x7f2f10e99ef6
Aborted
hlstwizard commented 1 year ago

After disabling machine-learning and typesense, I still get the same error:

immich-microservices-67b658995c-nbc75   0/1     CrashLoopBackOff   6 (2m18s ago)   19m
alextran1502 commented 1 year ago

Can you somehow set the environment variable NODE_OPTIONS=--max-old-space-size=8192 for the microservices container to increase its Node.js heap limit?
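
If the chart UI does not expose environment variables, one possible way to apply this on the underlying k3s cluster directly is `kubectl set env`; the deployment name here is an assumption inferred from the pod name in the comment above, not confirmed from the chart:

```sh
# Sketch only: patch the microservices deployment with a larger Node.js heap.
# The deployment name "immich-microservices" is an inferred assumption.
kubectl set env deployment/immich-microservices \
  NODE_OPTIONS="--max-old-space-size=8192"
```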

schmitzkr commented 1 year ago

I had a similar issue on SCALE (the TrueCharts chart for Immich). I was not able to set the environment variable because of what I think is a bug, but with the community chart you might be able to. I also noticed that the issue went away for me once I upgraded the server and the mobile app; I think the mobile app was trying to sync the entire library, and that was causing the Node instance to crash.

hlstwizard commented 1 year ago

Thanks for the replies! @alextran1502, Immich is a great piece of software!

After some triage, I suspect the problem comes from migrating a docker-compose deployment to a TrueNAS SCALE (k3s) deployment, because this happens every time I restore the PostgreSQL database from a dump taken on the docker-compose side.

So I tried another way: I started a fresh TrueNAS instance and bulk-uploaded the library using the CLI (see the sketch below). That works, although Immich has to regenerate the thumbnails and encoded videos, which is acceptable.
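
For reference, a bulk upload with the Immich CLI looks roughly like this; the server URL and API key are placeholders, and exact flags vary between CLI versions:

```sh
# Sketch of a CLI bulk upload, assuming a recent @immich/cli.
# The URL and key are placeholders, not values from this thread.
immich login http://truenas.local:2283/api YOUR_API_KEY
immich upload --recursive /path/to/library
```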

zackpollard commented 11 months ago

Out-of-memory exceptions are quite often caused by the library we used for reverse geocoding. In the next release we have completely reworked how reverse geocoding functions: we are replacing the old library with our own implementation, which uses Postgres to do the reverse geocoding. This should result in much lower memory usage overall and should also eliminate the microservices container running out of memory, which we believe was happening due to an issue in the logic of the old geocoding implementation (not written by us). If you still experience these issues after the next release (1.89.0), please open a new issue describing the problem. Cheers!
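
As an illustration of the Postgres-based approach described above (not Immich's actual schema or query), a nearest-place lookup with PostgreSQL's earthdistance extension could look like this; the `geodata_places` table and its columns are assumptions:

```sh
# Illustrative only: reverse geocode a coordinate pair (here: Paris) by
# nearest neighbour using the earthdistance extension. Table and column
# names are assumptions, not Immich's real schema.
psql -d immich -c "
  SELECT name
  FROM geodata_places
  ORDER BY earth_distance(ll_to_earth(latitude, longitude),
                          ll_to_earth(48.8566, 2.3522))
  LIMIT 1;"
```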