libvips / pyvips

python binding for libvips using cffi
MIT License

using resource.setrlimit makes pyvips / liborc spit the dummy #449

Closed · AntiSol closed this 10 months ago

AntiSol commented 10 months ago

Hi @jcupitt, thanks again for libvips and pyvips :)

I'm sorry about the slightly vague bug report and the lack of code to reproduce it. I wasn't able to write code to reproduce it reliably (except on my production server with my production code, because of course, ha).

We have been using pyvips to scale down images for a couple of years now. When this was initially developed we found that with very large input images the system could start thrashing and/or become unresponsive due to very high memory usage.

Our solution was to use python's resource.setrlimit to try to limit the memory usage of the process, hoping that it would die somewhat gracefully rather than causing the whole system to thrash or become unresponsive.

This has worked well for >2 years, but in the last week or so, for reasons I don't understand, we've started seeing errors like this when using pyvips to scale images down:

ORC: ERROR: ../orc/orccodemem.c(302): orc_code_region_allocate_codemem(): Failed to create write and exec mmap regions.  This is probably because SELinux execmem check is enabled (good) and $TMPDIR and $HOME are mounted noexec (bad).
ORC: ERROR: ../orc/orccodemem.c(159): orc_code_region_get_free_chunk(): assertion failed: 0

This suddenly seems to be happening any time we try to compress any image using pyvips, not just very large ones, i.e. it's not the normal "I ran out of memory and the process was killed" error I'd expect to see.

Note that SELinux is disabled in the container where this is running, making the error message somewhat misleading.

Unfortunately I can't share the code, but it goes something like this:

import resource, pyvips

# settings, retrieve_image_from_url, save_image_data_as_files_in_cloud,
# desired_scale, mode and jpeg_quality all come from our application code
mem_limit = int(settings.COMPRESSOR_MEM_LIMIT_GB) * (1024 ** 3)
resource.setrlimit(resource.RLIMIT_AS, (mem_limit, mem_limit))  # just set hard and soft limits the same

data = retrieve_image_from_url(url)

image = pyvips.Image.new_from_buffer(data, options="", access='sequential').autorot()

# nasty ORC error thrown here unless we comment out resource.setrlimit above
image = image.resize(desired_scale, kernel=mode)

jpeg_data = image.jpegsave_buffer(Q=jpeg_quality, strip=True)  # strip to remove metadata

# The wise and helpful pyvips author advises here that AVIF is AV1 compression in a HEIF container:
#  https://github.com/libvips/libvips/issues/2324#issuecomment-870490460
avif_data = image.heifsave_buffer(Q=jpeg_quality, compression='av1', strip=True)

save_image_data_as_files_in_cloud(jpeg=jpeg_data, avif=avif_data)

As mentioned, this has recently started happening any time we try to scale an image down using pyvips, regardless of the size of the image.

The strangest thing about all this is that, as I mentioned, this worked fine for ~2 years and only started throwing this error in the past week or so. I suspect some library has been updated (in our Debian bookworm based Python 3.9.13 docker container) and is causing this new error.

I suspect this might not be enough information for you to figure out what's going on or provide useful feedback, but I'm raising a ticket here in the hope that a) you might have some ideas, and b) somebody else who hits this problem can find it: I found very few useful matches for that error message when I searched. I was able to work around it by removing the call to resource.setrlimit (not ideal, but better than none of our images compressing!).

If you have any insights or ideas they would be much appreciated. Thanks for your time, sorry I couldn't give a more useful report.

jcupitt commented 10 months ago

Hi @AntiSol,

I have a couple of suggestions:

Move to libhwy

libvips 8.15 has a new SIMD backend based on libhwy, so I'd move to that, I think. It should be a bit faster, and there's no longer any runtime code generation, so you might be able to enable selinux as well.
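A minimal sketch of checking this at runtime: pyvips.version() reports the libvips version the process is linked against, so you can confirm you are on 8.15+ before relying on the highway backend.

import pyvips

# pyvips.version(0) / version(1) return the major / minor version of the
# libvips this process is linked against; the highway backend needs 8.15+
major, minor = pyvips.version(0), pyvips.version(1)
if (major, minor) < (8, 15):
    print(f"libvips {major}.{minor}: no highway backend, orc may still be used")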

I wouldn't use setrlimit, it's likely to cause crashes.
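If the goal is just to keep memory down, a gentler sketch is to bound libvips' own operation cache and thread pool rather than hard-capping the whole address space (these are soft limits, not a guarantee):

import pyvips

pyvips.cache_set_max_mem(512 * 1024 * 1024)  # hold at most ~512MB of pixels in the operation cache
pyvips.cache_set_max(0)                      # or turn the operation cache off completely
pyvips.concurrency_set(2)                    # fewer worker threads means smaller peak memory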

Disable or adjust imagemagick

One cause of memory spikes with libvips is imagemagick. If libvips sees a format it does not support (eg. PSB, BMP, ICO, NIF etc.) it falls back to loading via imagemagick. This usually gets you an image, but it can cause huge memory spikes. For example:

$ /usr/bin/time -f %M:%e vipsheader bigphoto.psb 
bigphoto.psb: 23847x12799 uchar, 3 bands, srgb, magickload
3325952:4.51
$

So 3.3gb and 4.5s just to get the header of a PSB file. It's 10s to make a thumbnail.

I would suggest disabling magickload for untrusted input, or detecting the fallback and rejecting it, as sketched below.
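A minimal sketch of the detection, assuming the upload is in data: libvips records the loader it picked in the "vips-loader" metadata field, eg. "jpegload_buffer" or "magickload_buffer".

import pyvips

image = pyvips.Image.new_from_buffer(data, "", access="sequential")
# "vips-loader" is the name of the loader libvips chose for this image
if image.get("vips-loader").startswith("magick"):
    raise ValueError("imagemagick fallback format, refusing to process")

Note that for magick formats even reading the header can be expensive (as the timing above shows), so blocking the loader outright, as in the next section, is the stronger fix.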

Block untrusted loaders

We fuzz libvips for security, but only for a subset of the supported loaders. Some loaders are not designed for untrusted data and can be trivially exploited. They should never be exposed to uploads.

Sadly, many libvips binaries (eg. Debian, RHEL, etc.) ship with all of these bad loaders enabled. Since 8.13 we've had a feature to disable them at runtime:

https://www.libvips.org/2022/05/28/What's-new-in-8.13.html

So I would set the VIPS_BLOCK_UNTRUSTED env var and only use the tested loaders.
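A minimal sketch of both routes; the env var has to be set before libvips initialises, ie. before pyvips is first imported, and operation_block_set assumes pyvips 2.2 or later:

import os

os.environ["VIPS_BLOCK_UNTRUSTED"] = "1"  # must happen before pyvips is imported

import pyvips

# or: block every loader, then re-enable only the ones you trust
pyvips.operation_block_set("VipsForeignLoad", True)
pyvips.operation_block_set("VipsForeignLoadJpeg", False)  # allow JPEG
pyvips.operation_block_set("VipsForeignLoadPng", False)   # allow PNG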

Interlaced (or progressive) images

PNG and JPG both support interlaced (also called progressive) images -- these are the ones that appear at a low res first, then slowly fill in detail as it is downloaded.

The downside is that you don't get the final pixels until the whole image is in memory, which prevents any streaming processing, and hugely increases memory use. For example:

$ /usr/bin/time -f %M:%e vipsthumbnail big-progressive.jpg 
3732224:4.23
$ vips copy big-progressive.jpg x.jpg
$ /usr/bin/time -f %M:%e vipsthumbnail x.jpg 
72448:0.26
$ 

So this progressive jpeg takes 4gb of memory and 4.3s to thumbnail, but exactly the same image as a regular jpeg takes 72mb and 0.26s.

I would detect these horrors before processing and either ban them, or if your users insist on uploading in this terrible format, push them to a separate low-priority queue on a special container. Keep them away from your main image path!
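A minimal sketch of the detection: the jpeg and png loaders set an "interlaced" metadata field on the header, so you can route these files before doing any real decoding (send_to_slow_queue and process_normally stand in for your own queueing code):

import pyvips

image = pyvips.Image.new_from_buffer(data, "", access="sequential")
# get_typeof() returns 0 when a field is missing, so this is safe
# for formats that never set "interlaced"
if image.get_typeof("interlaced") != 0 and image.get("interlaced") == 1:
    send_to_slow_queue(data)  # hypothetical low-priority path
else:
    process_normally(data)    # hypothetical normal path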


AntiSol commented 10 months ago

Thanks so much for your thoughtful response @jcupitt. I've had other priorities the last few days, but I totally will get back to this before long, and may respond more fully then.

Your suggestions are super helpful and informative. I've actually wanted to build a separate container for more troublesome images for a while, but time is never on my side.

I think I'll probably try building the latest release to take advantage of libhwy as suggested, and likely implement a bunch of your other recommendations (like using VIPS_BLOCK_UNTRUSTED) too.

I'll close this for now.

Thanks again!