strukturag / libheif

libheif is an HEIF and AVIF file format decoder and encoder.

high RAM usage when reading in 2MB .heic file #1032

Closed sherrardb closed 10 months ago

sherrardb commented 10 months ago

good day. as mentioned in https://github.com/tonycoz/imager/issues/511, i am experiencing what appears to be extreme growth (500-750MB) in VSZ when reading in a relatively small (2MB) file.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 4280 sburton   20   0  752.5m  47.0m  14.5m S 362.5   0.6   0:00.64 /usr/local/bin/heif-info /tmp/00600919-CJg-7HLh.heic

library versions:

~$ locate .so | grep -F -e heif -e heic -e libde265 | xargs -r ls -lrt
lrwxrwxrwx 1 root    root       12 Nov  4 15:16 /usr/local/lib/libheif.so -> libheif.so.1
-rwxr-xr-x 1 root    root  6479808 Nov 10 16:01 /usr/local/lib/libde265.so.0.1.5
lrwxrwxrwx 1 root    root       17 Nov 10 16:01 /usr/local/lib/libde265.so.0 -> libde265.so.0.1.5
lrwxrwxrwx 1 root    root       17 Nov 10 16:01 /usr/local/lib/libde265.so -> libde265.so.0.1.5
lrwxrwxrwx 1 root    root       17 Nov 10 16:14 /usr/local/lib/libheif.so.1 -> libheif.so.1.17.3
-rw-r--r-- 1 root    root  3537432 Nov 11 03:13 /usr/local/lib/libheif.so.1.17.3

i have compiled libheif with the following:

=== Summary of compiled codecs ===
libde265 HEVC decoder        : + separate plugin
FFMPEG HEVC decoder (HW acc) : + separate plugin
x265 HEVC encoder            : + separate plugin
Kvazaar HEVC encoder         : - disabled
AOM AV1 decoder              : - disabled
AOM AV1 encoder              : - disabled
Dav1d AV1 decoder            : - disabled
SVT AV1 encoder              : - disabled
Rav1e AV1 encoder            : - disabled
JPEG decoder                 : - disabled
JPEG encoder                 : - disabled
OpenJPEG J2K decoder         : - disabled
OpenJPEG J2K encoder         : - disabled
libsharpyuv: disabled

for context, i discovered this because i am reading and dynamically manipulating this image within an apache context and this is causing my apache children to consume well more than their usual amount of RAM.

please let me know if there is anything critical that i have omitted.

cheers

00600919-CJg-7HLh.zip

bradh commented 10 months ago

I ran this through valgrind (with massif) and I'm not seeing anything like that kind of usage in actual allocation.

bradh@micrantha:~/libheif/build$ valgrind --tool=massif ./examples/heif-info ~/test/tmp/00600919-CJg-7HLh.heic
==53943== Massif, a heap profiler
==53943== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==53943== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==53943== Command: ./examples/heif-info /home/bradh/test/tmp/00600919-CJg-7HLh.heic
==53943== 
MIME type: image/heic
main brand: heic
compatible brands: mif1, MiPr, miaf, MiHB, heic

image: 4032x3024 (id=49), primary
  colorspace: YCbCr, 4:2:0
  bit depth: 8
  thumbnail: 320x240
  color profile: prof
  alpha channel: no 
  depth channel: no
metadata:
  Exif: 2138 bytes
transformations:
  angle (ccw): 0
region annotations:
  none
properties:
==53943== 
bradh@micrantha:~/libheif/build$ massif-visualizer massif.out.53943 
Qt: Session management error: Could not open network socket
loaded massif file: QUrl("file:///home/bradh/libheif/build/massif.out.53943")
description: "(none)"
command: "./examples/heif-info /home/bradh/test/tmp/00600919-CJg-7HLh.heic"
time unit: "i"
snapshots: 59
peak: snapshot # 28 after 2.90902e+09 "i"
peak cost: "40.8 MiB" heap "14.7 KiB" heap extra "0 B" stacks

When running with heif-convert it gets a bit higher because of the format conversion, but not much.

bradh@micrantha:~/libheif/build$ valgrind --tool=massif ./examples/heif-convert ~/test/tmp/00600919-CJg-7HLh.heic -o 1032.jpg
==54165== Massif, a heap profiler
==54165== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==54165== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==54165== Command: ./examples/heif-convert /home/bradh/test/tmp/00600919-CJg-7HLh.heic -o 1032.jpg
==54165== 
File contains 1 image
Written to 1032.jpg
==54165== 
bradh@micrantha:~/libheif/build$ massif-visualizer massif.out.54165 
Qt: Session management error: Could not open network socket
loaded massif file: QUrl("file:///home/bradh/libheif/build/massif.out.54165")
description: "(none)"
command: "./examples/heif-convert /home/bradh/test/tmp/00600919-CJg-7HLh.heic -o 1032.jpg"
time unit: "i"
snapshots: 57
peak: snapshot # 55 after 8.08601e+09 "i"
peak cost: "52.5 MiB" heap "10.4 KiB" heap extra "0 B" stacks

You do mention VSZ, which is potential (not actual) usage. So maybe this is about how you are measuring, not what you are measuring?

farindk commented 10 months ago

@bradh Yes, that is exactly the expected amount of memory. The image is 4032x3024. In YCbCr 4:2:0 that is 4032x3024*1.5 bytes, and when converted to RGB in the decoder, 4032x3024*3 bytes. That's a total of 52.3 MiB, exactly what you measured.
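
(For illustration, a quick sanity check of that arithmetic in C; the only assumptions are the buffer layouts named above, i.e. planar YCbCr 4:2:0 plus an interleaved RGB copy.)

#include <stdio.h>

int main(void) {
    long long pixels   = 4032LL * 3024;     /* 12,192,768 pixels   */
    long long ycbcr420 = pixels * 3 / 2;    /* 1.5 bytes per pixel */
    long long rgb      = pixels * 3;        /* 3 bytes per pixel   */
    printf("%.1f MiB\n", (ycbcr420 + rgb) / (1024.0 * 1024.0));   /* prints ~52.3 */
    return 0;
}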

farindk commented 10 months ago

@sherrardb It is of course possible to construct HEIF files as memory bombs. It might be wise to check the image size before decoding and to impose a limit on it if you are tight on memory. The built-in security limit in libheif is a maximum image size of 32768x32768 pixels; decoding an image that large may need around 4.5 GB.
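
(A minimal sketch of such a pre-decode size check, assuming the standard libheif C API; the file path and the pixel budget are placeholders, and error handling is kept to a minimum.)

#include <libheif/heif.h>
#include <stdio.h>

int main(void) {
    struct heif_context* ctx = heif_context_alloc();
    struct heif_error err = heif_context_read_from_file(ctx, "/tmp/input.heic", NULL);
    if (err.code != heif_error_Ok) { heif_context_free(ctx); return 1; }

    struct heif_image_handle* handle = NULL;
    err = heif_context_get_primary_image_handle(ctx, &handle);
    if (err.code != heif_error_Ok) { heif_context_free(ctx); return 1; }

    long long w = heif_image_handle_get_width(handle);
    long long h = heif_image_handle_get_height(handle);

    /* Decoding needs roughly 4.5 bytes per pixel (4:2:0 planes plus an RGB
       copy), so reject oversized images before calling heif_decode_image(). */
    const long long max_pixels = 50LL * 1000 * 1000;  /* hypothetical budget */
    if (w * h > max_pixels) {
        fprintf(stderr, "image too large: %lldx%lld\n", w, h);
    } else {
        /* safe to call heif_decode_image(...) here */
    }

    heif_image_handle_release(handle);
    heif_context_free(ctx);
    return 0;
}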

sherrardb commented 10 months ago

@bradh and @farindk my troubleshooting indeed shows a sane amount of actual memory usage, in the same range that you have both confirmed, but what i am wondering is whether the amount of virtual memory being reported is an expected outcome (or a side-effect of something else) or whether i should be hunting for a bug on my systems.

is either of you able to reproduce the virtual memory figures that i am seeing? my chosen method, coarse as it may be, for capturing the above output was to run a loop in one terminal while running the command in another.

~$  while true; do top -b -c -Em -n1 | grep -F -e .heic -e VIRT | grep -vF grep; echo; sleep .25; done

conversely, is there a tool or a profiling context, similar to valgrind, that i can run this under that will give more insight into where/why this amount of potential memory allocation is being requested?

TIA

sherrardb commented 10 months ago

also, not directly related, i began by trying to find another image similar in dimensions and file size so that i could compare the behavior. it turns out that there is a prime candidate included in the libheif repo. but the decoder generates an unspecified error for this particular image:

~$  /usr/local/bin/heif-info /tmp/libheif/fuzzing/data/corpus/clusterfuzz-testcase-minimized-file-fuzzer-5752063708495872.heic
MIME type: image/heic
main brand: heic
compatible brands: mi, heic

image: 3024x4032 (id=49)
  colorspace: YCbCr, 4:2:0
  bit depth: 8
  color profile: prof
  alpha channel: no 
  depth channel: no
metadata:
  Exif: 1728 bytes
transformations:
  angle (ccw): 270
region annotations:
  none
properties:
Could not decode image Decoder plugin generated an error: Unspecified

farindk commented 10 months ago

This is just a guess: could it be that when the C++ memory allocator has to allocate some ~10MB blocks, it moves the allocated memory boundary (sbrk) in big steps (a couple of hundred MB) to prepare for future allocations? I.e., VSZ increases a lot, but this has no practical effect, as the memory space is not used. Maybe this could be alleviated by allocating the memory in smaller chunks (but that makes the image handling more complex), and it would be optimizing for a particular memory allocator. Still, it might be worthwhile to write a dummy program just to understand how the memory allocator behaves.
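
(A rough sketch of such a dummy program: allocate a few blocks of roughly one plane's size and watch VmSize, which is what top reports as VIRT. The block size and count are assumptions made purely for illustration.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void print_vm_size(const char *label) {
    /* VmSize in /proc/self/status corresponds to the VIRT column in top */
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    if (!f) return;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "VmSize:", 7) == 0) {
            printf("%s %s", label, line);
            break;
        }
    }
    fclose(f);
}

int main(void) {
    enum { BLOCK = 12 * 1024 * 1024, COUNT = 4 };  /* ~12 MB, about one Y plane */
    char *blocks[COUNT];
    print_vm_size("start");
    for (int i = 0; i < COUNT; i++) {
        blocks[i] = malloc(BLOCK);
        if (!blocks[i]) return 1;
        memset(blocks[i], 0, BLOCK);   /* touch the pages so RES moves as well */
        print_vm_size("alloc");
    }
    for (int i = 0; i < COUNT; i++) free(blocks[i]);
    print_vm_size("freed");
    return 0;
}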

sherrardb commented 10 months ago

@farindk funny you should say that. i was just running under strace to see if i could get an idea when the mapping begins. if i am interpreting the strace output correctly (not a given), then sometime after "properties:" has been output, we seem to be mapping a few 12MB and 8MB blocks.

~$  grep -F mmap /tmp/strace-heif-info.txt.13962 | tail -n7
mmap(NULL, 12193792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa3c1236000
mmap(NULL, 12193792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa3c0695000
mmap(NULL, 12193792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa3bfaf4000
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa3bf2f3000
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa3beaf2000
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa3b62f1000
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa3be2f1000

of course i may be misinterpreting any of this, and all of it may be a wild goose chase, as it is still early in the morning here :-).

good day.

strace-heif-info.zip

farindk commented 10 months ago

Also when I examine the output of strace heif-info, I cannot see a huge unexpected memory use. It's all below 100MB.

sherrardb commented 10 months ago

well that's interesting. in the strace output attached above, i appear to have several instances of mmap with > 100MB, again if i am interpreting the values correctly.

~$ grep -Fh mmap strace-heif-info.* | cut -d ',' -f2 | cut -c2- | sort -n | xargs -r -n1 printf "%'d\n" | tail
27,181,056
27,193,360
67,108,864
67,108,864
67,108,864
134,217,728
134,217,728
134,217,728
134,217,728
134,217,728

and it appears to happen in several of the child processes.

~$ grep -Fl 134217728 /tmp/strace-heif-info.txt.*
/tmp/strace-heif-info.txt.13963
/tmp/strace-heif-info.txt.13965
/tmp/strace-heif-info.txt.13966
/tmp/strace-heif-info.txt.13968
/tmp/strace-heif-info.txt.13969

bradh commented 10 months ago

@sherrardb Can you show some context around the lines where those are occurring? At least back to the previous openat entry?

sherrardb commented 10 months ago

@bradh the entire contents of the strace output are in the zip file attached about two posts back.

not sure why i didn't also include the original command, but based on my standard approach, it was probably something like

~$ strace -ff -s2048 /usr/local/...

bradh commented 10 months ago

I couldn't make much of your logs without timestamps, but I have reproduced your test scenario, and I think I see part of the picture. It isn't the mmap of a file, but rather that each thread is getting 134217728 bytes (i.e. 0x8000000) of address space.

sherrardb commented 10 months ago

let me know if you would like me to regenerate the strace with timestamps, and if so, whether you would prefer -t, -tt or -ttt, or if there are any other options you would like me to include.

bradh commented 10 months ago

> let me know if you would like me to regenerate the strace with timestamps, and if so, whether you would prefer -t, -tt or -ttt, or if there are any other options you would like me to include.

No, I can reproduce. However, it does look vaguely reasonable in terms of address space allocation (i.e. the parent allocates the buffers, and the child address space is rounded up from that). Is allocation of address space (without actual memory usage) actually a problem, though?

sherrardb commented 10 months ago

> No, I can reproduce. However, it does look vaguely reasonable in terms of address space allocation (i.e. the parent allocates the buffers, and the child address space is rounded up from that). Is allocation of address space (without actual memory usage) actually a problem, though?

good question. :-)

there are a few contexts at play here.

  1. the individual apache child
  2. multiple/all apache children
  3. the entire VM

if this only ever impacted (1), then i would have likely ignored it. but given the nature of the dynamic image transform code that triggers the issue, if a given source image is in current use, it is likely to impact (2). unfortunately, i don't know enough about how the kernel manages memory to make an informed assessment of how that might impact (3). for example when you have several processes allocating .5-.75GB of RAM apiece, even without actually using it, does the kernel start to (unnecessarily) swap out other processes? does it "steal" RAM that was being used for buffers/cache? is there any measurable systemic impact at all?

unfortunately i don't know. :-(

basically i started down this path because i have monitoring alerts that are triggered when processes seem to be doing Bad Things(tm), and based on how the thresholds are currently configured, the system thought this was indeed a bad thing.

since my daily activities are much more focused on the application-development side of things than devops/sysadmin, can you give me a high-level explanation of how address space rounding can take us from ~50MB of RAM which is actually needed, to essentially 10X that being allocated?

thanks for your time and help thus far

farindk commented 10 months ago

Have you tried using libheif without multi-threading? Either by calling heif_context_set_max_decoding_threads(ctx, 0) or by compiling with cmake -DENABLE_MULTITHREADING_SUPPORT=OFF. Does that reduce the allocation? Note that libde265 will still use multiple threads internally, but it should be less.
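
(For reference, a minimal sketch of the first option, using the heif_context_set_max_decoding_threads() call named above; the read and decode calls are elided.)

#include <libheif/heif.h>

int main(void) {
    struct heif_context* ctx = heif_context_alloc();

    /* Per the suggestion above: 0 disables libheif's own decoder worker
       threads, so decoding happens in the calling thread. libde265 may
       still create its own internal threads. */
    heif_context_set_max_decoding_threads(ctx, 0);

    /* ... heif_context_read_from_file(), heif_decode_image(), etc. ... */

    heif_context_free(ctx);
    return 0;
}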

sherrardb commented 10 months ago

i have not. i am using libheif indirectly via the Perl module Imager::File::HEIF, so i suspect that the first suggestion is out completely. but i will recompile using the second suggestion and report back.

thx

sherrardb commented 10 months ago

disabling multi-threading indeed does have a positive effect. it appears that VIRT is consistently reduced to ~250MB.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
30113 sburton   20   0  249.5m  50.1m  14.2m S  93.8   0.6   0:00.78 /usr/local/bin/heif-info /tmp/00600919-CJg-7HLh.heic

bradh commented 10 months ago

> can you give me a high-level explanation of how address space rounding can take us from ~50MB of RAM which is actually needed, to essentially 10X that being allocated?

The strace showed 4 threads @ 128 MB (== 0x8000000 bytes) each, plus the parent address space. So roughly 5 * 128 MB of reserved address space, which already accounts for most of the ~750 MB VIRT you reported.

Opinion: I still don't get the concern. You seem to be focused on a number that is not important. Nothing in the address space drives swap; swapping works on pages, not on processes. If you are worried about performance, track performance metrics for your usage. And maybe Perl wouldn't be the first choice...

I'll leave you with it.

sherrardb commented 10 months ago

@bradh thank you for your time, your patience and your feedback.

i can indeed appreciate that this may be much ado about nothing. based on my (still) rudimentary understanding of why the library would allocate such an inordinate amount of memory (compared to the image file size or the actual amount required for the decoded image) that it did not intend to use, the only perspective from which i could approach this was "something doesn't look quite right". whether a particular "something" is worth looking at is often only determined after investigation.

we do track performance metrics. but sometimes performance issues only present themselves after they have hit a tipping point. and occasionally when you track when the change was made that eventually triggered the problem, you discover that there were other indicators (trends in apparently-unrelated graphs) that may not have seemed important at the time, but that actually presaged the problem. so i assess the data from performance tracking and monitoring tools, through the lens of experience and, for good or bad, apply more than a pinch of intuition.

again, i sincerely appreciate your assistance.

sherrardb commented 10 months ago

just in case it is of interest to either of you two, or anybody else stumbling across this in the future, after disabling the FFMPEG decoder, the value dropped yet again.

=== Summary of compiled codecs ===
libde265 HEVC decoder        : + built-in
FFMPEG HEVC decoder (HW acc) : - disabled
x265 HEVC encoder            : + built-in
Kvazaar HEVC encoder         : - disabled
AOM AV1 decoder              : - not found
AOM AV1 encoder              : - not found
Dav1d AV1 decoder            : - disabled
SVT AV1 encoder              : - disabled
Rav1e AV1 encoder            : - disabled
JPEG decoder                 : - disabled
JPEG encoder                 : - disabled
OpenJPEG J2K decoder         : - disabled
OpenJPEG J2K encoder         : - disabled
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 5221 sburton   20   0  131.3m  39.8m   5.5m S 106.7   0.5   0:00.65 /usr/local/bin/heif-info /tmp/00600919-CJg-7HLh.heic

Stianhn commented 9 months ago

@sherrardb How do you disable the decoder?

sherrardb commented 9 months ago

> @sherrardb How do you disable the decoder?

if memory serves correctly, i simply uninstalled the libffmpeg/libffmpeg-dev packages on my system. you should be able to confirm success by the output of "Summary of compiled codecs" after running cmake.

otherwise, if you cannot uninstall these, there may be a way to disable it explicitly by setting the WITH_FFMPEG_DECODER cmake flag to OFF, but you will have to check the build docs.

HTH