Closed: sherrardb closed this issue 10 months ago
I ran this through valgrind (with massif) and I'm not seeing anything like that kind of usage in actual allocation.
bradh@micrantha:~/libheif/build$ valgrind --tool=massif ./examples/heif-info ~/test/tmp/00600919-CJg-7HLh.heic
==53943== Massif, a heap profiler
==53943== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==53943== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==53943== Command: ./examples/heif-info /home/bradh/test/tmp/00600919-CJg-7HLh.heic
==53943==
MIME type: image/heic
main brand: heic
compatible brands: mif1, MiPr, miaf, MiHB, heic
image: 4032x3024 (id=49), primary
colorspace: YCbCr, 4:2:0
bit depth: 8
thumbnail: 320x240
color profile: prof
alpha channel: no
depth channel: no
metadata:
Exif: 2138 bytes
transformations:
angle (ccw): 0
region annotations:
none
properties:
==53943==
bradh@micrantha:~/libheif/build$ massif-visualizer massif.out.53943
Qt: Session management error: Could not open network socket
loaded massif file: QUrl("file:///home/bradh/libheif/build/massif.out.53943")
description: "(none)"
command: "./examples/heif-info /home/bradh/test/tmp/00600919-CJg-7HLh.heic"
time unit: "i"
snapshots: 59
peak: snapshot # 28 after 2.90902e+09 "i"
peak cost: "40.8 MiB" heap "14.7 KiB" heap extra "0 B" stacks
When running with heif-convert it gets a bit higher because of the format conversion, but not much.
bradh@micrantha:~/libheif/build$ valgrind --tool=massif ./examples/heif-convert ~/test/tmp/00600919-CJg-7HLh.heic -o 1032.jpg
==54165== Massif, a heap profiler
==54165== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==54165== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==54165== Command: ./examples/heif-convert /home/bradh/test/tmp/00600919-CJg-7HLh.heic -o 1032.jpg
==54165==
File contains 1 image
Written to 1032.jpg
==54165==
bradh@micrantha:~/libheif/build$ massif-visualizer massif.out.54165
Qt: Session management error: Could not open network socket
loaded massif file: QUrl("file:///home/bradh/libheif/build/massif.out.54165")
description: "(none)"
command: "./examples/heif-convert /home/bradh/test/tmp/00600919-CJg-7HLh.heic -o 1032.jpg"
time unit: "i"
snapshots: 57
peak: snapshot # 55 after 8.08601e+09 "i"
peak cost: "52.5 MiB" heap "10.4 KiB" heap extra "0 B" stacks
You do mention VSZ, which is potential (not actual) usage. So maybe this is about how you are measuring, not what you are measuring?
@bradh Yes, that is exactly the expected amount of memory. The image is 4032x3024. In YCbCr 4:2:0 that is 4032x3024*1.5 bytes, and when converted to RGB in the decoder, 4032x3024*3 bytes. That's a total of 52.3 MiB. Exactly what you measured.
@sherrardb It's of course possible that HEIF files are constructed as memory bombs. It might be wise to check the image size before decoding and put a limit on it if you are tight on memory.
The built-in security limit in libheif is a maximum image size of 32768x32768 pixels. During decoding, this may need 4.5 GB.
@bradh and @farindk my troubleshooting indeed shows a sane amount of actual memory usage, in the same range that you have both confirmed, but what i am wondering is whether the amount of virtual memory being reported is an expected outcome (or a side-effect of something else) or whether i should be hunting for a bug on my systems.
is either of you able to reproduce the virtual memory figures that i am seeing? my chosen method, coarse as it may be, for capturing the above output was to run a loop in one terminal while running the command in another.
~$ while true; do top -b -c -Em -n1 | grep -F -e .heic -e VIRT | grep -vF grep; echo; sleep .25; done
conversely, is there a tool or a profiling context, similar to valgrind, that i can run this under that will give more insight into where/why this amount of potential memory allocation is being requested?
TIA
also, not directly related, i began by trying to find another image similar in dimensions and file size so that i could compare the behavior. it turns out that there is a prime candidate included in the libheif repo. but the decoder generates an unspecified error for this particular image:
~$ /usr/local/bin/heif-info /tmp/libheif/fuzzing/data/corpus/clusterfuzz-testcase-minimized-file-fuzzer-5752063708495872.heic
MIME type: image/heic
main brand: heic
compatible brands: mi, heic
image: 3024x4032 (id=49)
colorspace: YCbCr, 4:2:0
bit depth: 8
color profile: prof
alpha channel: no
depth channel: no
metadata:
Exif: 1728 bytes
transformations:
angle (ccw): 270
region annotations:
none
properties:
Could not decode image Decoder plugin generated an error: Unspecified
This is just a guess: could it be that when the C++ memory allocator has to alloc some ~10MB blocks, it is moving the allocated memory boundary (sbrk) by big steps (a couple of hundred MBs) to prepare for future allocations? I.e. VSZ increases a lot, but this has no practical effect as the memory space is not used.
Maybe this could be alleviated by allocating the memory in smaller chunks (but that makes the image handling more complex), and it would be optimizing for a particular memory allocator. Still, it might be worthwhile to write a dummy program just to understand how the memory allocator works.
@farindk funny you should say that. i was just running under strace to see if i could get an idea when the mapping begins. if i am interpreting the strace output correctly (not a given), then sometime after "properties:" has been output, we seem to be mapping a few 12MB and 8MB blocks.
~$ grep -F mmap /tmp/strace-heif-info.txt.13962 | tail -n7
mmap(NULL, 12193792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa3c1236000
mmap(NULL, 12193792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa3c0695000
mmap(NULL, 12193792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa3bfaf4000
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa3bf2f3000
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa3beaf2000
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa3b62f1000
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fa3be2f1000
of course i may be misinterpreting any of this, and all of it may be a wild goose chase, as it is still early in the morning here :-).
good day.
Also when I examine the output of strace heif-info, I cannot see a huge unexpected memory use. It's all below 100MB.
well that's interesting. in the strace output attached above, i appear to have several instances of mmap with > 100MB, again if i am interpreting the values correctly.
~$ grep -Fh mmap strace-heif-info.* | cut -d ',' -f2 | cut -c2- | sort -n | xargs -r -n1 printf "%'d\n" | tail
27,181,056
27,193,360
67,108,864
67,108,864
67,108,864
134,217,728
134,217,728
134,217,728
134,217,728
134,217,728
and it appears to happen in multiple child processes.
~$ grep -Fl 134217728 /tmp/strace-heif-info.txt.*
/tmp/strace-heif-info.txt.13963
/tmp/strace-heif-info.txt.13965
/tmp/strace-heif-info.txt.13966
/tmp/strace-heif-info.txt.13968
/tmp/strace-heif-info.txt.13969
@sherrardb Can you show some context around the lines where those are occurring? At least back to the previous openat entry?
@bradh the entire contents of the strace output are in the zip file attached about two posts back.
not sure why i didn't also include the original command, but based on my standard approach, it was probably something like
~$ strace -ff -s2048 /usr/local/...
I couldn't make much of your logs without timestamps, but I have reproduced your test scenario, and I think I see part of the picture. It isn't the mmap of a file, but rather that the thread is getting 134217728 bytes (i.e. 0x8000000) as a memory space.
let me know if you would like me to regenerate the strace with timestamps, and if so, whether you would prefer -t, -tt or -ttt, or if there are any other options you would like me to include.
No, I can reproduce. However it does look vaguely reasonable in terms of address space allocation (i.e. the parent allocates the buffers, and the child address space is rounded up from that). Is allocation of address space (without actual memory usage) actually a problem though?
good question. :-)
there are a few contexts at play here.
if this only ever impacted (1), then i would have likely ignored it. but given the nature of the dynamic image transform code that triggers the issue, if a given source image is in current use, it is likely to impact (2). unfortunately, i don't know enough about how the kernel manages memory to make an informed assessment of how that might impact (3). for example when you have several processes allocating .5-.75GB of RAM apiece, even without actually using it, does the kernel start to (unnecessarily) swap out other processes? does it "steal" RAM that was being used for buffers/cache? is there any measurable systemic impact at all?
unfortunately i don't know. :-(
basically i started down this path because i have monitoring alerts that are triggered when processes seem to be doing Bad Things(tm), and based on how the thresholds are currently configured, the system thought this was indeed a bad thing.
since my daily activities are much more focused on the application-development side of things than devops/sysadmin, can you give me a high-level explanation of how address space rounding can take us from ~50MB of RAM which is actually needed, to essentially 10X that being allocated?
thanks for your time and help thus far
Have you tried to use libheif without multi-threading? Either using heif_context_set_max_decoding_threads(ctx, 0) or by compiling with cmake -DENABLE_MULTITHREADING_SUPPORT=OFF.
Does that reduce the allocation?
Note that it will still use multiple threads within libde265, but it should be less.
i have not. i am using libheif indirectly via the perl module Imager::File::HEIF, so i suspect that the first suggestion is out completely. but i will recompile using the second suggestion and report back.
thx
disabling multi-threading indeed does have a positive effect. it appears that VIRT is consistently reduced to ~250MB.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30113 sburton 20 0 249.5m 50.1m 14.2m S 93.8 0.6 0:00.78 /usr/local/bin/heif-info /tmp/00600919-CJg-7HLh.heic
can you give me a high-level explanation of how address space rounding can take us from ~50MB of RAM which is actually needed, to essentially 10X that being allocated?
The strace showed 4 threads @ 128 MiB (== 0x8000000 bytes), plus the parent address space. So ~5 * 128 MiB.
Opinion: I still don't get the concern. You seem to be focused on a number that is not important. Nothing in the address space drives swap - it works on pages, not on processes. If you are worried about performance, track performance metrics for your usage. And maybe Perl wouldn't be the first choice...
I'll leave you with it.
@bradh thank you for your time, your patience and your feedback.
i can indeed appreciate that this may be much ado about nothing. based on my (still) rudimentary understanding of why the library would allocate such an inordinate amount of RAM (compared to the image file size, or the actual amount required for the decoded image) that it did not intend to use, the only perspective from which i could approach this is "something doesn't look quite right". whether a particular "something" is important to look at is often only determined after investigation.
we do track performance metrics. but sometimes performance issues only present themselves after they have hit a tipping point. and occasionally when you track when the change was made that eventually triggered the problem, you discover that there were other indicators (trends in apparently-unrelated graphs) that may not have seemed important at the time, but that actually presaged the problem. so i assess the data from performance tracking and monitoring tools, through the lens of experience and, for good or bad, apply more than a pinch of intuition.
again, i sincerely appreciate your assistance.
just in case it is of interest to either of you two, or anybody else stumbling across this in the future, after disabling the FFMPEG decoder, the value dropped yet again.
=== Summary of compiled codecs ===
libde265 HEVC decoder : + built-in
FFMPEG HEVC decoder (HW acc) : - disabled
x265 HEVC encoder : + built-in
Kvazaar HEVC encoder : - disabled
AOM AV1 decoder : - not found
AOM AV1 encoder : - not found
Dav1d AV1 decoder : - disabled
SVT AV1 encoder : - disabled
Rav1e AV1 encoder : - disabled
JPEG decoder : - disabled
JPEG encoder : - disabled
OpenJPEG J2K decoder : - disabled
OpenJPEG J2K encoder : - disabled
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5221 sburton 20 0 131.3m 39.8m 5.5m S 106.7 0.5 0:00.65 /usr/local/bin/heif-info /tmp/00600919-CJg-7HLh.heic
@sherrardb How do you disable the decoder?
if memory serves correctly, i simply uninstalled the libffmpeg/libffmpeg-dev packages on my system. you should be able to confirm success in the "Summary of compiled codecs" output after running cmake.
otherwise, if you cannot uninstall these, there may be a way to explicitly disable it by inverting the WITH_FFMPEG_DECODER flag to cmake, but you will have to check the build docs.
HTH
good day. as mentioned here: https://github.com/tonycoz/imager/issues/511, i am experiencing what appears to be extreme growth (500-750MB) in VSZ when reading in a relatively small (2MB) file.
library versions:
i have compiled libheif with the following:
for context, i discovered this because i am reading and dynamically manipulating this image within an apache context and this is causing my apache children to consume well more than their usual amount of RAM.
please let me know if there is anything critical that i have omitted.
cheers 00600919-CJg-7HLh.zip