pharo-project / pharo

Pharo is a dynamic reflective pure object-oriented language supporting live programming inspired by Smalltalk.
http://pharo.org
Other
1.19k stars 353 forks source link

Enhance `ZnEasy getPng:` to handle some problematic PNGs #12261

Closed chisandrei closed 1 year ago

chisandrei commented 1 year ago

When using ZnEasy class>>getPng: for some PNGs there is an error. For example for the Pharo logo from Twitter (https://pbs.twimg.com/profile_images/541743734/icone-pharo-1.png):

ZnEasy getPng: 'https://pbs.twimg.com/profile_images/541743734/icone-pharo-1.png'.
Screenshot 2023-01-06 at 14 12 41

The underlying issue is that PNGReadWriter cannot parse the picture. We can also save the image to disk and attempt to parse it using PNGReadWriter to get the same error.

'./icone-pharo-1.png' asFileReference binaryReadStreamDo: [ :stream | 
    ImageReadWriter formFromStream: stream ]

Using pngcheck shows that there is some additional data after IEND chunk.

pngcheck -v icone-pharo-1.png            
File: icone-pharo-1.png (559959 bytes)
  chunk IHDR at offset 0x0000c, length 13
    500 x 500 image, 32-bit RGB+alpha, non-interlaced
  chunk bKGD at offset 0x00025, length 6
    red = 0x00ff, green = 0x00ff, blue = 0x00ff
  chunk pHYs at offset 0x00037, length 9: 72x72 pixels/unit (1:1)
  chunk vpAg at offset 0x0004c, length 9
    unknown private, ancillary, safe-to-copy chunk
  chunk IDAT at offset 0x00061, length 32768
    zlib: deflated, 32K window, maximum compression
  chunk IDAT at offset 0x0806d, length 32768
  chunk IDAT at offset 0x10079, length 32768
  chunk IDAT at offset 0x18085, length 14370
  chunk zTXt at offset 0x1b8b3, length 34, keyword: Software
  chunk IEND at offset 0x1b8e1, length 0
  additional data after IEND chunk
ERRORS DETECTED in icone-pharo-1.png

In the PNG specs The IEND chunk must appear LAST. It marks the end of the PNG datastream. The chunk's data field is empty. However it seems that some PNGs can have data afterwards. Not 100% sure this is the case here, but seems to be. Would be more resilient if PNGReadWriter would have a mode to handle these cases.

Tested in Pharo-11.0.0+build.418.sha.d5b3b17a7491bb98ea70539a9198163b938a7d0b (64 Bit)

Ducasse commented 1 year ago

Thanks andrei.

svenvc commented 1 year ago

Nice error report (as usual ;-)

Let's try hacking along your suggestion (that everything after IEND can be ignored, I did not read any specs).

Change:

PNGReadWriter>>#processNextChunk
   ....
   chunkType = 'IEND' ifTrue: [ ^ #IEND "*should* be the last chunk" ].
   ....

and

PNGReadWriter>>#nextImage
    .....
    [ stream atEnd or: [ self processNextChunk = #IEND ] ] whileFalse.    
    .....

and then it works for me.

It is not super clean as before self was returned in all cases, but #processNextChunk is internal, so we can get away with that I think.

All 42 PNGReadWriterTest tests are green (not sure that means much, but still).

What do you think ?

chisandrei commented 1 year ago

For me that looks better than the current version. Not sure how other decoders handle this case but I imagine they ignore what comes after the end. Might also make sense to have a "strict" mode or a way to parse that fails with an error in this case.

From what is mentioned in https://www.w3.org/TR/png IEND marks the end of the PNG.

If I look at the structure of the file, all looks ok, with AE 42 60 82 marking the end of IEND as expected.

Screenshot 2023-01-06 at 17 36 18

But then we seem to get a lot of empty chunks for some reason. No idea if that has any meaning. Seems to happen for many PNGs on Twitter.

Screenshot 2023-01-06 at 17 36 36
Portable Network Graphics (PNG) Specification (Third Edition)
This document describes PNG (Portable Network Graphics), an extensible file format for the lossless, portable, well-compressed storage of static and animated raster images. PNG provides a patent-free replacement for GIF and can also replace many common uses of TIFF. Indexed-colour, greyscale, and truecolour images are supported, plus an optional alpha channel. Sample depths range from 1 to 16 bits.
chisandrei commented 1 year ago

Unrelated, but for more testing of PNG parsing, trying the image at http://www.schaik.com/pngsuite/ might be interesting.

PngSuite - the official set of PNG test images
The PngSuite consists of a two hundred PNG images in all possible formats to support the development of PNG viewing and conversion software
svenvc commented 1 year ago

See https://github.com/DraagrenKirneh/PngSuiteExplorer

we used to score 100% percent IIRC

GitHub
GitHub - DraagrenKirneh/PngSuiteExplorer: Simple Pharo Widget for viewing the result of running the PNG test suite
Simple Pharo Widget for viewing the result of running the PNG test suite - GitHub - DraagrenKirneh/PngSuiteExplorer: Simple Pharo Widget for viewing the result of running the PNG test suite
svenvc commented 1 year ago

In a recent Pharo, you can simply inspect the directory:

(FileLocator documents / 'PngSuite') children.

(FileLocator documents / 'PngSuite') childrenMatching: '*.png'.

(FileLocator documents / 'PngSuite') childrenMatching: 'x*.png'.

Those starting with an x are supposed to give an error. But it is hard to verify that the interpreted images are correct.

svenvc commented 1 year ago

I finally did a PR: https://github.com/pharo-project/pharo/pull/13049

MarcusDenker commented 1 year ago

merged