python-pillow / Pillow

Python Imaging Library (Fork)
https://python-pillow.org

cannot identify image file (PNG file from scanner) #7993

Closed: OmlineEditor closed this issue 3 weeks ago

OmlineEditor commented 3 months ago

What did you do?

img = Image.open(path) # I open a file in a script

What did you expect to happen?

The script opens the file and continues execution.

What actually happened?

an error occurs:

 File "/usr/local/lib/python3.9/dist-packages/PIL/Image.py", line 3339, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file '/var/www/python_for_site/bug.png'

What are your OS, Python and Pillow versions?

Please paste here the output of running:
> python3 -m PIL.report
--------------------------------------------------------------------
Pillow 10.3.0
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
       [GCC 10.2.1 20210110]
--------------------------------------------------------------------
Python executable is /usr/bin/python3
System Python files loaded from /usr
--------------------------------------------------------------------
Python Pillow modules loaded from /usr/local/lib/python3.9/dist-packages/PIL
Binary Pillow modules loaded from /usr/local/lib/python3.9/dist-packages/PIL
--------------------------------------------------------------------
--- PIL CORE support ok, compiled for 10.3.0
*** TKINTER support not installed
--- FREETYPE2 support ok, loaded 2.13.2
--- LITTLECMS2 support ok, loaded 2.16
--- WEBP support ok, loaded 1.3.2
--- WEBP Transparency support ok
--- WEBPMUX support ok
--- WEBP Animation support ok
--- JPEG support ok, compiled for libjpeg-turbo 3.0.2
--- OPENJPEG (JPEG2000) support ok, loaded 2.5.2
--- ZLIB (PNG/ZIP) support ok, loaded 1.2.11
--- LIBTIFF support ok, loaded 4.6.0
--- RAQM (Bidirectional Text) support ok, loaded 0.10.1, fribidi 1.0.8, harfbuzz 8.4.0
*** LIBIMAGEQUANT (Quantization method) support not installed
--- XCB (X protocol) support ok
--------------------------------------------------------------------

My Code:

from PIL import Image

image_path = "bug.png"
image = Image.open(image_path)
width, height = image.size

print("width:", width, "px")
print("height:", height, "px")

I have worked with a lot of files, but I can’t open this one even though it looks normal. In fact, I can’t open any file that my scanner saves in PNG format.

cannot_identify_image_file.zip

radarhere commented 3 months ago

If I run pngcheck over your image, I get

CRC error in chunk pHYs (computed eee74573, expected c76fa864)

To skip the check in Pillow, use

from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

image_path = "bug.png"
image = Image.open(image_path)
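The CRC error that pngcheck reports can be reproduced with the Python standard library alone. The sketch below is not part of Pillow or pngcheck, and `find_bad_crcs` is a hypothetical helper name. It assumes the chunk layout defined by the PNG specification (4-byte big-endian length, 4-byte type, data, then a CRC-32 computed over the type and data) and lists every chunk whose stored CRC disagrees with the computed one.

```python
from __future__ import annotations

import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_bad_crcs(data: bytes) -> list[tuple[bytes, int, int]]:
    """Return (chunk_type, computed_crc, stored_crc) for each chunk with a wrong CRC."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    bad = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Chunk header: 4-byte big-endian data length, then the 4-byte chunk type.
        length, chunk_type = struct.unpack(">I4s", data[pos : pos + 8])
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            break  # truncated chunk: stop scanning
        # The CRC covers the chunk type and chunk data, not the length field.
        computed = zlib.crc32(data[pos + 4 : body_end]) & 0xFFFFFFFF
        (stored,) = struct.unpack(">I", data[body_end : body_end + 4])
        if computed != stored:
            bad.append((chunk_type, computed, stored))
        pos = body_end + 4
        if chunk_type == b"IEND":
            break
    return bad
```

For a file like `bug.png`, calling `find_bad_crcs(open("bug.png", "rb").read())` would be expected to list the `pHYs` chunk, mirroring pngcheck's message; an empty list means every stored CRC matches.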

aclark4life commented 3 months ago

Same issue with convert, although macOS Preview opens it.

% convert bug.png bug.png
convert: pHYs: CRC error `bug.png' @ warning/png.c/MagickPNGWarningHandler/1526.

Actually, convert fixes it:

% convert bug.png bug.png
convert: pHYs: CRC error `bug.png' @ warning/png.c/MagickPNGWarningHandler/1526.
% pngcheck bug.png       
OK: bug.png (579x864, 24-bit RGB, non-interlaced, 57.7%).
% convert bug.png bug.png 
%
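ImageMagick fixes the file by decoding and re-encoding it. If the only defect is a miscalculated checksum (which the reporter later suspects of the scanner), a lighter alternative is to recompute each chunk's CRC and leave the chunk data untouched. This is a hypothetical standard-library sketch, not what `convert` does internally. It blindly trusts the chunk contents, so a chunk whose data is genuinely corrupted would end up passing CRC checks while still holding bad data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def fix_png_crcs(data: bytes) -> bytes:
    """Return a copy of data with every chunk CRC recomputed from its type and data."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    out = bytearray(data)
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(out):
        (length,) = struct.unpack(">I", out[pos : pos + 4])
        body_end = pos + 8 + length
        if body_end + 4 > len(out):
            break  # truncated chunk: leave the remaining bytes untouched
        # Overwrite the stored CRC with one computed over chunk type + chunk data.
        crc = zlib.crc32(bytes(out[pos + 4 : body_end]))
        out[body_end : body_end + 4] = struct.pack(">I", crc)
        if out[pos + 4 : pos + 8] == b"IEND":
            break
        pos = body_end + 4
    return bytes(out)
```

A repaired copy could then be written with `Path("bug-fixed.png").write_bytes(fix_png_crcs(Path("bug.png").read_bytes()))` (using `pathlib.Path`), which should open with Pillow's default settings, assuming the bad CRC was the only problem.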

OmlineEditor commented 3 months ago

ImageFile.LOAD_TRUNCATED_IMAGES = True

This code helps solve the issue, but it's crucial to ensure there won't be any issues when processing the image further. Could this code affect the functionality, potentially causing problems down the line?

radarhere commented 3 months ago

Apart from skipping some checks with PNGs, the other behaviour of LOAD_TRUNCATED_IMAGES is to try and load images that end prematurely.

No, Pillow's internal data will not be left in a corrupted state. All operations on the loaded image will be as valid as they ever were; the setting just ignores the fact that the pixels being read from the image may not be what they are supposed to be.
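Because `ImageFile.LOAD_TRUNCATED_IMAGES` is a module-level setting, turning it on affects every image opened later in the process. One way to limit that, sketched here with a hypothetical `temporarily_set` helper (not a Pillow API), is to enable it only around the images known to need it and restore the previous value afterwards. Since `Image.open` is lazy, the pixel data should be loaded inside the block too. The helper still changes a global, so it is not safe if other threads open images at the same time.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporarily_set(obj: Any, name: str, value: Any) -> Iterator[None]:
    """Set obj.<name> to value for the duration of a with-block, then restore it."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Restore the old value even if the block raised an exception.
        setattr(obj, name, previous)


# Example usage (requires Pillow):
#   from PIL import Image, ImageFile
#   with temporarily_set(ImageFile, "LOAD_TRUNCATED_IMAGES", True):
#       image = Image.open("bug.png")
#       image.load()  # decode while the relaxed setting is still active
```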

OmlineEditor commented 3 months ago

Okay, thanks for the help. The problem is in the scanner, which cannot correctly calculate the checksum for the file. Could you change the code so that, instead of an error, a message is shown saying that the file is damaged and has an incorrect CRC? A message about the CRC problem, rather than an error, would be better and more understandable.

radarhere commented 3 months ago

You're requesting that we only raise a warning in this situation?

If the image is corrupted or ends prematurely, I think we both agree that users should know there is something wrong. Whether the user would want to continue using a flawed image anyway is a matter of personal preference, and so there is a setting for it. I'd like there to be a stronger argument before changing Pillow's default setting.

The meaning behind UnidentifiedImageError is documented, specifically mentioning this PNG behaviour - https://pillow.readthedocs.io/en/stable/PIL.html#PIL.UnidentifiedImageError

As some background, this error behaviour has been here since the fork from PIL. It was only #1991 that allowed LOAD_TRUNCATED_IMAGES to work around it.

You might be interested to know that

from PIL import PngImagePlugin
PngImagePlugin.PngImageFile("bug.png")

will show you the SyntaxError directly.

Traceback (most recent call last):
  File "demo.py", line 6, in <module>
    PngImagePlugin.PngImageFile("bug.png")
  File "PIL/ImageFile.py", line 137, in __init__
    self._open()
  File "PIL/PngImagePlugin.py", line 733, in _open
    self.png.crc(cid, s)
  File "PIL/PngImagePlugin.py", line 209, in crc
    raise SyntaxError(msg)
SyntaxError: broken PNG file (bad header checksum in b'pHYs')

aclark4life commented 3 months ago

Agree this is an error and we're not going to change to warning. Also super-interesting that the PngImagePlugin raises SyntaxError and reveals the bad checksum. The only change I'd consider making here is to add an option similar to LOAD_TRUNCATED_IMAGES to enable more verbose output from Pillow when the image plugin fails to return an open image to ImagePlugin._open. Not sure what that would look like or if there are any existing verbose options in Pillow, but something like --show-me-what-really-happened.

OmlineEditor commented 3 months ago

Okay, let it show an error, but not just “cannot identify image file”. Let there be a more detailed and understandable error: just change the text of the error message to “cannot identify image file, the file is damaged, the file has an incorrect CRC signature”.

radarhere commented 3 months ago

That's not as easy as it sounds.

By default, Pillow checks your image against multiple formats. Some formats can be easily rejected because your image data does not start with the required identifier, but not all formats have such an identifier.

So if I adjust Pillow to print out the errors raised by each format tried against your image:

diff --git a/src/PIL/Image.py b/src/PIL/Image.py
index c65cf3850..ab41f525f 100644
--- a/src/PIL/Image.py
+++ b/src/PIL/Image.py
@@ -3333,10 +3333,11 @@ def open(
                     im = factory(fp, filename)
                     _decompression_bomb_check(im.size)
                     return im
-            except (SyntaxError, IndexError, TypeError, struct.error):
+            except (SyntaxError, IndexError, TypeError, struct.error) as e:
                 # Leave disabled by default, spams the logs with image
                 # opening failures that are entirely expected.
                 # logger.debug("", exc_info=True)
+                print(i+": "+str(e))
                 continue
             except BaseException:
                 if exclusive_fp:

I get

PNG: broken PNG file (bad header checksum in b'pHYs')
IM: Syntax error in IM header: �PNG
IMT: not identified by this driver
IPTC: invalid IPTC/NAA file
MPEG: not an MPEG file
PCD: not a PCD file
SPIDER: not a valid Spider file
TGA: not a TGA file

I imagine you don't want to see all of that.
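The output above includes formats such as IM, TGA and SPIDER that were tried against a PNG file. A simplified model of why: each format may register a quick "accept" check on the first bytes, and a format with no check (or a lenient one) has nothing to reject the data on, so it is always attempted. The registry below is hypothetical and far smaller than Pillow's; it only illustrates that filtering step.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical registry: format name -> optional quick check on the file's first bytes.
AcceptFn = Optional[Callable[[bytes], bool]]

REGISTRY: dict[str, AcceptFn] = {
    "PNG": lambda prefix: prefix.startswith(b"\x89PNG\r\n\x1a\n"),
    "GIF": lambda prefix: prefix[:6] in (b"GIF87a", b"GIF89a"),
    "JPEG": lambda prefix: prefix.startswith(b"\xff\xd8\xff"),
    "TGA": None,  # TGA has no signature at the start of the file
    "IM": None,  # assumed here to have no quick check
}


def candidate_formats(prefix: bytes) -> list[str]:
    """Return the formats that would go on to attempt a full parse of this data."""
    return [
        name
        for name, accept in REGISTRY.items()
        # No check means the format cannot rule the data out, so it must be tried.
        if accept is None or accept(prefix)
    ]
```

With a PNG prefix this returns `PNG` plus every format lacking a check, which is why several unrelated formats report errors for a single damaged PNG.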

OmlineEditor commented 2 months ago

I imagine you don't want to see all of that.

This makes it clearer. Let there be more messages, so that users can understand where the error is and how to fix it.

Yay295 commented 2 months ago

Those messages would show even if the image opened successfully, because all of the other attempted formats would print their failures.

aclark4life commented 2 months ago

I imagine you don't want to see all of that.

I think I'd like to be able to say Image.verbose = True and see all that, but I expect that also may not be as easy as it sounds to implement.

Yay295 commented 2 months ago

It looks like warnings are added to a list that gets shown at the end if the image can't be opened. Exception messages could probably be treated similarly.
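The approach Yay295 describes can be sketched independently of Pillow: record each format's failure while trying, and only surface the collected messages (here through the `warnings` module) when no format succeeds, so successful opens stay quiet. `open_any`, `UnidentifiedDataError` and the opener callables are hypothetical; the caught exception types mirror those in the diff above.

```python
from __future__ import annotations

import struct
import warnings
from typing import Any, Callable

# Exception types treated as "this format does not match", as in the diff above.
EXPECTED_ERRORS = (SyntaxError, IndexError, TypeError, struct.error)


class UnidentifiedDataError(OSError):
    """Hypothetical stand-in for PIL.UnidentifiedImageError."""


def open_any(data: bytes, openers: dict[str, Callable[[bytes], Any]]) -> Any:
    """Try each opener in order; report per-format failures only if all of them fail."""
    failures: list[str] = []
    for name, opener in openers.items():
        try:
            # On success, any failures collected so far are discarded silently.
            return opener(data)
        except EXPECTED_ERRORS as exc:
            failures.append(f"{name}: {exc}")
    # No format accepted the data: show every collected reason, then raise.
    for message in failures:
        warnings.warn(message, stacklevel=2)
    raise UnidentifiedDataError("cannot identify data")
```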

radarhere commented 2 months ago

I've created https://github.com/python-pillow/Pillow/pull/8033 to allow Image.open("bug.png", warn_possible_formats=True) to show the various exceptions as warnings, but only if the image is not able to be opened successfully. See what you think.

hugovk commented 2 months ago

I'm not sure about the scalability of adding Boolean flags here and there.

How about adding it to a logger?

radarhere commented 2 months ago

I share the concern about scalability. As for a logger, as @nulano pointed out, this previously existed but was removed in #1423.

https://github.com/python-pillow/Pillow/blob/ddbf08fa78a1aeac83c8295b071f15c722615326/src/PIL/Image.py#L3343-L3347

I am cautious about making decisions and then undoing them. @wiredfool, as the author of #1423, do you have any thoughts on this?

aclark4life commented 2 months ago

This is a "nice to have" so I wouldn't add anything for logging or to increase verbose output unless "no other way forward". In this case, it's unfortunate to not get the appropriate information right away, but certainly not critical for us to fix it.

nulano commented 2 months ago

While I'm not sure we should do either of these, I have thought of two options:

  • Add a global setting (similar to MAX_IMAGE_PIXELS) - I agree that a new function parameter for debugging is not very scalable, but a global setting (perhaps even reused from other functions) would not complicate the interface too much.
  • Append all detected issues to the raised UnidentifiedImageError

aclark4life commented 2 months ago

Right, global setting is what I suggested here too.

Append all detected issues to the raised UnidentifiedImageError

If you append based on the global setting, probably OK. If not, probably not.

radarhere commented 2 months ago

I've created #8063 with Image.WARN_POSSIBLE_FORMATS