immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

[BUG] magick-files filling /tmp on microservices container #4391

Open JaCoB1123 opened 9 months ago

JaCoB1123 commented 9 months ago

The bug

I just ran out of storage on my server because Immich used up all free space as soon as it became available. The culprit turned out to be the microservices container storing loads of magick-.... files in /tmp.

I found "imagemagick - lot of temp magick files created in temporary folder" on Stack Overflow, which might be somewhat relevant. I was low on disk space anyway, so that might have something to do with it as well. Right now I can already see Immich using 16 GB again:

Current output of the /tmp folder:

```
  1.1 GiB [##########] magick-o0d7QVXyyqVGBKkCsprifPsENFer3lmt
  1.1 GiB [##########] magick-g554RVzipD40KPGRsgFNiSTbn9NzX3IN
445.9 MiB [### ] magick-n4THgZV_-CRMN-Qw8I1ynMS8s_JJoVty
445.9 MiB [### ] magick-AVrZO0s9YC-Z7SpHKHQlezvBRzu6652G
445.9 MiB [### ] magick-8Ubfyu1EY6R0owx-S7lvCVdfyqUcr5UE
439.7 MiB [### ] magick-jaQh68ePDj7JHHe3IcMQdpTQ15MT1CTd
439.7 MiB [### ] magick-QddHR_O-Y_6PfSmpftrTAYNfyh7eE06r
439.7 MiB [### ] magick-ECIzr-9aqnTHSUZa7BSH684lBTeK7ldR
434.4 MiB [### ] magick-VqeARdekpodWVUoGWUlfoY-vNfkgfBTg
434.4 MiB [### ] magick-UAkXGw84_0iLVX4qR89zqF7gvgFk2YP7
434.4 MiB [### ] magick-ItXDtvZijxkb46CHfsCskutrwz4vimVW
401.6 MiB [### ] magick-_60oERsp7SB82iAYW4xMDAJkwPw1NC6z
401.1 MiB [### ] magick-lrJ091qvM6V2YUo5fCdf_OqoVJFrEcL9
401.1 MiB [### ] magick-WFhDu0Rdsu49BcGpmqbC0CH0tAQacBN5
401.1 MiB [### ] magick-SicQVyWmN8KxEBmTEfplzq84MSOvmhXo
392.0 MiB [### ] magick-yK8Nh6zaJ4AbEgmy0pglp4PTs3fFND0x
392.0 MiB [### ] magick-wfHzCo3ICOHDdSaUskZEuvysed8Q7J5r
392.0 MiB [### ] magick--UTW7rwFFOxsMSxhM39fLwlcsJ3IjozX
389.6 MiB [### ] magick-R56OJDDwwjXHPEcKL4eveyPiKV0E0t65
389.6 MiB [### ] magick-gaN9V7QOX8vGO5OseUDcV_ea3kCtKDwv
389.6 MiB [### ] magick-em566c1WHjYAPL7-7NduAeQNKa8DwUiC
389.6 MiB [### ] magick-CjZI6GnoGmjTfH0gaczvN3RCbdJRmnsf
389.6 MiB [### ] magick--KRz4i0lZIjcI_WIx7Bq15HB9CC6v3fg
383.1 MiB [### ] magick-FpTK4o8svoEYc0kE8xbIiSoeeS0VOgfH
383.1 MiB [### ] magick-DbchifWK7vJszl04hcqZ3P30mrTJ0zXq
381.7 MiB [### ] magick-9A4xszA_pbCe6lHaWLBHlpvNu7wwwwqW
381.7 MiB [### ] magick-1UrbO1UTJHUb1PPsY6lYCfP6kJfvU6Fp
381.0 MiB [### ] magick-gvOKuuPyFdvUKWHlznmsVUuZdSpVkJ60
381.0 MiB [### ] magick-UK8X6w4Z87o30TVSJGQsO9fNbeehOPFj
381.0 MiB [### ] magick-RKkTayQnR-tyUdJa6EBjrBffzx3LYIC6
378.9 MiB [### ] magick-sBsVLMPGnLVf2k7UcIOqQgoxbywc1Tju
378.9 MiB [### ] magick-rqBmRjbBmpmwvPUzUIENJPBEi2wMmpRB
378.9 MiB [### ] magick-0q8aT34Mw758FlxHPtmCmFU_aM5I1157
306.9 MiB [## ] magick-FHt8KeRN9ABIdi3Cy_-Y-LosC2YZW5GH
305.8 MiB [## ] magick-sxzG9GqRXrnEQrhcIkLe63oB9_c4JI3w
305.8 MiB [## ] magick-qXNBWB3w8FTlbKH1TECq5JoXu-DmP26m
305.8 MiB [## ] magick-qMyut3FS4FL9ZS5w899KR2OKrj9lVCNc
305.8 MiB [## ] magick-KudN3aYAkBH1DoLP_7AP7hCpmpbV0igE
305.8 MiB [## ] magick--zzutKtoDLSHydv0-cxHNKELq98YURGO
```

The OS that Immich Server is running on

Ubuntu 20.04

Version of Immich Server

unsure - just upgraded to 1.81.1

Version of Immich Mobile App

not relevant

Platform with the issue

Your docker-compose.yml content

not accessible right now. Will supply later.

Your .env content

not accessible right now. Will supply later.

Reproduction steps

1. Run immich

Additional information

No response

mertalev commented 9 months ago

Good catch. It sounds like libvips (the library calling ImageMagick) might not be cleaning up after itself. It's worth checking whether there's an existing issue for this and opening one if not.
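For context: sharp is the Node binding to libvips, and for formats libvips has no native loader for (camera RAW files in particular), it falls back to a loader built on ImageMagick, which is where the magick-* files in /tmp come from. A minimal sketch of the call path involved, with hypothetical file names:

```typescript
import sharp from 'sharp';

// Hypothetical RAW input: libvips has no native CR2 loader, so it hands the
// decode to its ImageMagick-backed loader. If ImageMagick's pixel cache
// outgrows its memory limit during the decode, it spills magick-* files
// into /tmp and is supposed to remove them when it finishes.
async function makeThumbnail(input: string, output: string): Promise<void> {
  await sharp(input)
    .resize(1440, 1440, { fit: 'outside' })
    .jpeg({ quality: 80 })
    .toFile(output);
}

makeThumbnail('photo.cr2', 'thumb.jpeg').catch((err) =>
  console.error('thumbnail generation failed:', err),
);
```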

JaCoB1123 commented 9 months ago

I just had to recreate the container again, as it was using 43 GB. I'll keep observing it.

If you need more information I'm happy to provide anything I can.

JaCoB1123 commented 6 months ago

I just had to clean up the folder again. Attached are the logs of the microservices container: _immich-immich-microservices-1_logs.txt

JaCoB1123 commented 4 months ago

I just had the issue again on version 1.95.1. Can this be reopened?

mertalev commented 4 months ago

How much RAM does the server have, by chance?

JaCoB1123 commented 4 months ago

The server has 16 GB and currently 7640 MB available.

mertalev commented 4 months ago

I looked into this a bit more.

I think ImageMagick has an internal limit on how much RAM it's willing to use. Once it goes above this, it starts serializing image data to disk. It normally cleans these files up after finishing, but based on your logs, an error while processing an image may mean the file gets left behind.

I can make an upstream issue for this and make it work in-memory only in the meantime.
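As a possible stopgap on the deployment side, the spillover can be capped or relocated through ImageMagick's standard resource environment variables. This is a sketch only, assuming the ImageMagick delegate inside the container honors these variables; the values are examples, not recommendations:

```yaml
services:
  immich-microservices:
    environment:
      # Standard ImageMagick resource variables (example values only):
      MAGICK_MEMORY_LIMIT: 2GiB   # pixel cache held in RAM before spilling
      MAGICK_MAP_LIMIT: 2GiB      # memory-mapped spill before plain disk files
      MAGICK_DISK_LIMIT: 4GiB     # hard cap on disk used by magick-* temp files
      MAGICK_TMPDIR: /tmp         # directory the spill files are written to
```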

mertalev commented 4 months ago

Oh, and you can probably avoid this for now by lowering your thumbnail generation concurrency. That will lower RAM usage and help keep it below the limit.

mertalev commented 4 months ago

Okay, so it might be a bit harder to fix this on our end based on the discussion here. Lowering your concurrency is probably the best option for now.

JaCoB1123 commented 4 months ago

Thanks for looking into it and creating the upstream discussion. If I understand that correctly, though, there's no way around the temp files for RAW images?

I just set the thumbnail concurrency to 1 (down from 5) and will update here if it happens again.

JaCoB1123 commented 4 months ago

I just manually started the "generate thumbnails" job for missing thumbnails.

It was looking good until the following log snippet. docker-compose also tells me that the container restarted. So it seems there's some fatal error that isn't logged, and I now have a single file in /tmp that's no longer being touched. After restarting manually again, I have three more files in there.

Maybe a simple fix would be to clear /tmp on startup? Or I could mount it as tmpfs instead? I'm not sure whether that's cleared on container restart, though.
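(On the tmpfs question: a Docker tmpfs mount is memory-backed and tied to the container's lifetime, so it starts out empty whenever the container is recreated.) To illustrate the clear-on-startup idea, here's a hypothetical sweep, not anything immich ships, that removes magick-* files nothing has touched for an hour:

```typescript
import { readdir, stat, unlink } from 'node:fs/promises';
import { join } from 'node:path';

// Hypothetical startup sweep: delete ImageMagick spill files that have not
// been modified for an hour, assuming live jobs keep their files fresh.
const MAX_AGE_MS = 60 * 60 * 1000;

async function sweepStaleMagickFiles(tmpDir = '/tmp'): Promise<void> {
  const now = Date.now();
  for (const name of await readdir(tmpDir)) {
    if (!name.startsWith('magick-')) continue;
    const path = join(tmpDir, name);
    try {
      const { mtimeMs } = await stat(path);
      if (now - mtimeMs > MAX_AGE_MS) await unlink(path);
    } catch {
      // File vanished between readdir and stat/unlink; skip it.
    }
  }
}

sweepStaleMagickFiles().catch((err) => console.error('sweep failed:', err));
```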

Is there a way I can provide more logs? Changing the log level to verbose via the WebUI didn't change anything.

```
[Nest] 7  - 02/29/2024, 9:36:42 PM   ERROR [JobService] Unable to run job handler (thumbnailGeneration/generate-jpeg-thumbnail): Error: Input file has corrupt header: VipsJpeg: Premature end of JPEG file
VipsJpeg: Premature end of JPEG file
[Nest] 7  - 02/29/2024, 9:36:42 PM   ERROR [JobService] Error: Input file has corrupt header: VipsJpeg: Premature end of JPEG file
VipsJpeg: Premature end of JPEG file
    at Sharp.toFile (/usr/src/app/node_modules/sharp/lib/output.js:89:19)
    at MediaRepository.resize (/usr/src/app/dist/infra/repositories/media.repository.js:39:14)
    at MediaService.generateThumbnail (/usr/src/app/dist/domain/media/media.service.js:129:44)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async MediaService.handleGenerateJpegThumbnail (/usr/src/app/dist/domain/media/media.service.js:116:28)
    at async /usr/src/app/dist/domain/job/job.service.js:137:37
    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[Nest] 7  - 02/29/2024, 9:36:42 PM   ERROR [JobService] Object:
{
  "id": "a3acd549-bd72-4add-92eb-ed772e453580"
}

[Nest] 7  - 02/29/2024, 9:36:42 PM   ERROR [JobService] Unable to run job handler (thumbnailGeneration/generate-jpeg-thumbnail): Error: Input file has corrupt header: VipsJpeg: Premature end of JPEG file
VipsJpeg: Premature end of JPEG file
[Nest] 7  - 02/29/2024, 9:36:42 PM   ERROR [JobService] Error: Input file has corrupt header: VipsJpeg: Premature end of JPEG file
VipsJpeg: Premature end of JPEG file
    at Sharp.toFile (/usr/src/app/node_modules/sharp/lib/output.js:89:19)
    at MediaRepository.resize (/usr/src/app/dist/infra/repositories/media.repository.js:39:14)
    at MediaService.generateThumbnail (/usr/src/app/dist/domain/media/media.service.js:129:44)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async MediaService.handleGenerateJpegThumbnail (/usr/src/app/dist/domain/media/media.service.js:116:28)
    at async /usr/src/app/dist/domain/job/job.service.js:137:37
    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[Nest] 7  - 02/29/2024, 9:36:42 PM   ERROR [JobService] Object:
{
  "id": "14d547a8-6806-43e3-8865-da9509cff710"
}

[Nest] 7  - 02/29/2024, 9:36:42 PM   ERROR [JobService] Unable to run job handler (thumbnailGeneration/generate-jpeg-thumbnail): Error: Input file has corrupt header: VipsJpeg: Corrupt JPEG data: 728 extraneous bytes before marker 0x19
VipsJpeg: Unsupported marker type 0x19
[Nest] 7  - 02/29/2024, 9:36:42 PM   ERROR [JobService] Error: Input file has corrupt header: VipsJpeg: Corrupt JPEG data: 728 extraneous bytes before marker 0x19
VipsJpeg: Unsupported marker type 0x19
    at Sharp.toFile (/usr/src/app/node_modules/sharp/lib/output.js:89:19)
    at MediaRepository.resize (/usr/src/app/dist/infra/repositories/media.repository.js:39:14)
    at MediaService.generateThumbnail (/usr/src/app/dist/domain/media/media.service.js:129:44)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async MediaService.handleGenerateJpegThumbnail (/usr/src/app/dist/domain/media/media.service.js:116:28)
    at async /usr/src/app/dist/domain/job/job.service.js:137:37
    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
[Nest] 7  - 02/29/2024, 9:36:42 PM   ERROR [JobService] Object:
{
  "id": "501a88b7-0e72-427a-898a-bdd8ba3efc3c"
}

[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [CommunicationRepository] Initialized websocket server
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [CommunicationRepository] Initialized websocket server
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [SystemConfigService] LogLevel=log (set via system config)
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [SystemConfigService] LogLevel=log (set via system config)
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [MetadataRepository] Initializing metadata repository
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [MetadataRepository] Initializing metadata repository
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [MetadataService] Initialized local reverse geocoder
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [MetadataService] Initialized local reverse geocoder
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [NestFactory] Starting Nest application...
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] TypeOrmModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] BullModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] ConfigHostModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] DiscoveryModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [NestFactory] Starting Nest application...
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] TypeOrmModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] BullModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] ConfigHostModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] DiscoveryModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] ScheduleModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] ConfigModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] BullModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] BullModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] ScheduleModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] ConfigModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] BullModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] BullModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] TypeOrmCoreModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] TypeOrmModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] TypeOrmCoreModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] TypeOrmModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] InfraModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] InfraModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] DomainModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] MicroservicesModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [NestApplication] Nest application successfully started
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] DomainModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [InstanceLoader] MicroservicesModule dependencies initialized
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [NestApplication] Nest application successfully started
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [ImmichMicroservice] Immich Microservices is listening on http://[::1]:3002 [v1.97.0] [PRODUCTION]
[Nest] 7  - 02/29/2024, 9:43:13 PM     LOG [ImmichMicroservice] Immich Microservices is listening on http://[::1]:3002 [v1.97.0] [PRODUCTION]
```
JaCoB1123 commented 4 months ago

I just added the following tmpfs volume and will report back if anything happens:

```yaml
type: tmpfs
target: /tmp
tmpfs:
  size: 3000000000
```
JaCoB1123 commented 4 months ago

I haven't had any issues since adding the tmpfs volume. Would it make sense to add that to the default docker-compose?

JaCoB1123 commented 3 weeks ago

I recently migrated to the docker-compose setup without a distinct microservices container and had the problem again today. Adding the tmpfs mount to the server container fixed the problem for me again.