python-pillow / Pillow

Python Imaging Library (Fork)
https://python-pillow.org
Other
12.17k stars 2.22k forks source link

Saving a jpeg throws an "encoder error -2 when writing image file" for some images #5448

Closed lazakoa closed 1 year ago

lazakoa commented 3 years ago

We have a jpeg in our Wagtail app for which we can't generate an image rendition. An OSError is thrown from the _save method in ImageFile.py. We isolated it the cause down to the buffer size, our images have icc profiles and when we remove them the image rendtion can be saved. Unfortunately we weren't able to extract the exact image that is being passed to Pillow as the framework performs it's own transformations on it. We can reproduce it reliably in the context of our app but we can't reproduce using purely Pillow.

Here the authors have a heuristics for the image buffer, which basically means how much RAM to reserve for the JPEG. They assume that the largest JPEG for the CMYK case is 4 bytes for every pixel (one byte for every channel), so 4 width height. This is correct.

    # if we optimize, libjpeg needs a buffer big enough to hold the whole image
    # in a shot. Guessing on the size, at im.size bytes. (raw pixel size is
    # channels*size, this is a value that's been used in a django patch.
    # https://github.com/matthewwithanm/django-imagekit/issues/50
    bufsize = 0
    if optimize or progressive:
        # CMYK can be bigger
        if im.mode == "CMYK":
            bufsize = 4 * im.size[0] * im.size[1]
        # keep sets quality to -1, but the actual value may be high.
        elif quality >= 95 or quality == -1:
            bufsize = 2 * im.size[0] * im.size[1]
        else:
            bufsize = im.size[0] * im.size[1]

Arguably, for non-cmyk images this is potentially too low, it should be 3 width height for RGB or 4 width height for RGBA. But I digress. The smallest image rendition we have is width-360 Keeping the aspect ratio this comes to 360 409 4, so bufsize = 588960 Now this magic line:

    # The EXIF info needs to be written as one block, + APP1, + one spare byte.
    # Ensure that our buffer is big enough. Same with the icc_profile block.
    bufsize = max(ImageFile.MAXBLOCK, bufsize, len(exif) + 5, len(extra) + 1)

They take the maximum of the MAXBLOCK, or bufsize, or the exif data or the extra (ICC) data. So in this case the bufsize remains at 588960. This means that the compressed JPEG data must fit into 588960 bytes. However, the ICC data in the image is appended uncompressed and that is 557168 bytes. So for the compressed JPEG data only 31792 bytes are available. And that's not enough in this case. All the other image renditions succeed because their width/height is larger and thus there is enough space for the compressed JPEG + the ICC profile.

This can be fixed by adding:

   # The EXIF info needs to be written as one block, + APP1, + one spare byte.
    # Ensure that our buffer is big enough. Same with the icc_profile block.

    if len(exif):
        bufsize += len(exif) + 5
    if len(extra):
        bufsize += len(extra) + 1

    bufsize = max(ImageFile.MAXBLOCK, bufsize)

    ImageFile._save(im, fp, [("jpeg", (0, 0) + im.size, 0, rawmode)], bufsize)
radarhere commented 3 years ago

Unfortunately we weren't able to extract the exact image that is being passed to Pillow as the framework performs it's own transformations on it.

So you don't have access to the file from your code. Are you interested in modifying your Pillow installation though, to write to file the image that is passed to it?

lazakoa commented 3 years ago

We already forked Pillow with our fix. I don't think I will be permitted to spend anymore time on this. I tried dumping the Image object with pickle in a attempt to reproduce the error in a controlled manner but that didn't work. We wouldn't be able to share any version of this image anyways, company rules (this may change in the future).

radarhere commented 1 year ago

From your description, you have a CMYK image that is 360 by 409, with 557168 bytes of ICC profile data. Given the code you're referencing, you're trying to save it either using "optimize" or "progressive" (or "progression", but that's the same).

However, the following code doesn't raise an encoder error for me, either with main or with 8.2.0.

from PIL import Image
im = Image.new("CMYK", (360, 409))
im.save("out.jpg", optimize=True, icc_profile=b" "*557168)
im.save("out.jpg", progressive=True, icc_profile=b" "*557168)

You have said that you were using Debian. I tested this on Debian 10 with libjpeg-turbo 1.5.2, but still couldn't produce an error.

Closing, unless someone can demonstrate a way to replicate this.

pks40 commented 1 year ago

I can reproduce the error with the attached image (which is simply a downscaled version of this image by Flickr user Ryan Gsell, CC BY 2.0). To reproduce, rename the image file to test.jpg and run the following:

from PIL import Image
im = Image.open("test.jpg")
im.save("out1.jpg", quality=80, progressive=True, icc_profile=b" "*55274)
im.save("out2.jpg", quality=80, progressive=True, icc_profile=b" "*55275)

For me, the first call of save() works fine; the second throws the "encoder error -2 when writing image file" error.

I'm using Python 3.11.2 with Pillow 9.4.0 on Debian stable (i.e. bookworm). test

radarhere commented 1 year ago

When I run your code, I also get the error "Suspension not allowed here".

Googling, I found https://intellipaat.com/community/16184/how-to-save-progressive-jpeg-using-python-pil-1-1-7, which had an interesting suggestion - increasing the MAXBLOCK size.

The following code runs without a problem.

from PIL import Image, ImageFile
ImageFile.MAXBLOCK = 67501
im = Image.open("test.jpg")
im.save("out1.jpg", quality=80, progressive=True, icc_profile=b" "*55274)
im.save("out2.jpg", quality=80, progressive=True, icc_profile=b" "*55275)

Also, I expect you're aware that your problem only happens when progressive=True.

radarhere commented 1 year ago

Thanks very much for the reproduction, but isn't this just a simple case of hitting the MAXBLOCK limit? Sure, we could increase it dynamically like the original poster suggested, but wouldn't that defeat the purpose of having a MAXBLOCK limit in the first place?

pks40 commented 1 year ago

Also, I expect you're aware that your problem only happens when progressive=True.

Yeah, sure, this only happens if optimize or progressive is true.

Thanks very much for the reproduction, but isn't this just a simple case of hitting the MAXBLOCK limit? Sure, we could increase it dynamically like the original poster suggested, but wouldn't that defeat the purpose of having a MAXBLOCK limit in the first place?

I'm not really sure about this. It's a kind of special case, fair enough; but still a case that can occur in practice and therefore should be dealt with. I'm also not sure (simply because I'm not in any way used to Pillow's internals) if it should be viewed as an issue with the MAXBLOCK limit; to me, it seems that it's more related to setting bufsize in the progressive or optimized case. Therefore, in my opinion what should really happen in the code is that if optimize or progressive is True, then the length of the additional data (i.e. len(exif) + 5 + len(extra) + 1) is added to the (estimated) bufsize:

    # if we optimize, libjpeg needs a buffer big enough to hold the whole image
    # in a shot. Guessing on the size, at im.size bytes. (raw pixel size is
    # channels*size, this is a value that's been used in a django patch.
    # https://github.com/matthewwithanm/django-imagekit/issues/50
    bufsize = 0
    if optimize or progressive:
        # CMYK can be bigger
        if im.mode == "CMYK":
            bufsize = 4 * im.size[0] * im.size[1]
        # keep sets quality to -1, but the actual value may be high.
        elif quality >= 95 or quality == -1:
            bufsize = 2 * im.size[0] * im.size[1]
        else:
            bufsize = im.size[0] * im.size[1]
        bufsize += len(exif) + 5 + len(extra) + 1 # this line added by me

I mean … it's kinda weird to have Pillow suddenly raise an error when writing (progressive and/or optimised) images that are ‘too small’ in size but still contain large amounts of metadata; even more so since the error is not really understandable when one encounters it (yesterday when hitting this, it really confused me). In my opinion, this should be viewed as a bug; Pillow should support writing optimised JPEGs with big ICC profiles without raising an error (and without the user having to somehow hack around this issue).

radarhere commented 1 year ago

I've created PR #7345 to resolve this.