w3c / png

Maintenance of the PNG specification
https://w3c.github.io/png/
Other
46 stars 11 forks source link

Consider making the Adler32 check optional during decompression #40

Open richgel999 opened 3 years ago

richgel999 commented 3 years ago

The adler-32 check is quite slow, and can add a significant amount of CPU time during decompression. This check is not necessary, and some PNG readers (like the one in stb_image.h, which is popular) don't do it at all.

If the compressed data is invalid, the decompressor will fail and stop early or it will return too many bytes. These are easy cases to detect. In our .PNG reader (a variant of lodepng) we have made this check optional.

randy408 commented 3 years ago

According to documentation the reference implementation already has the option to ignore it, setting the chunk checksum to be ignored also skips adler32 checks.

https://github.com/glennrp/libpng/blob/v1.6.37/libpng-manual.txt#L490-L491

This is also optional for spng.

svgeesus commented 3 years ago

I was unaware - so this is more an alignment of the specification with current practice, rather than an enhancement

svgeesus commented 1 year ago

By the way the PNG specification doesn't even call this checksum by name, Adler-32. It just says, in Compression method 0:

Check value | 4 bytes

and

The check value stored at the end of the zlib datastream is calculated on the uncompressed data represented by the datastream. The algorithm used to calculate this is not the same as the CRC calculation used for PNG chunk CRC field values. The zlib check value is useful mainly as a cross-check that the deflate algorithms are implemented correctly. Verifying the individual PNG chunk CRCs provides confidence that the PNG datastream has been transmitted undamaged.

which doesn't seem to require checking it.

I feel we should at least say

The Adler-32 algorithm used to calculate this is not the same as the CRC

and agree we should also allow but not require checking it. Which means an otherwise valid PNG with incorrect Adler-32 is still valid? Or is invalid but easy to error-correct?

fintelia commented 1 year ago

Some existing decoders do reject PNGs with incorrect adler32's, which means that such PNGs cannot be considered valid.

It is also worth pointing out that computing and checking the adler32 checksum doesn't have to be slow. There are SIMD accelerated implementations that reach upwards of 40 GB/s. (I actually measured ~63 GB/s on my machine).

richgel999 commented 1 year ago

Some existing decoders do reject PNGs with incorrect adler32's, which means that such PNGs cannot be considered valid.

It is also worth pointing out that computing and checking the adler32 checksum doesn't have to be slow. There are SIMD accelerated implementations that reach upwards of 40 GB/s. (I actually measured ~63 GB/s on my machine).

Even 40-63 GB/sec. is "slow" relative to other formats which PNG is competing against, which don't have or require this overhead. On very large images even a multi-GB/sec. Adler-32 checksum is time consuming. Having 2 ways of verifying the data (a weak checksum and a CRC) is overkill for most use cases. Having to support 2 checksums also bloats the decoders and makes them even more unnecessarily complex.

The adler-32 checksum itself is also known to be surprisingly weak: https://www.zlib.net/maxino06_fletcher-adler.pdf

"However, based on our studies of undetected error probabilities, the Fletcher checksum is often a better choice. It has a lower computational cost than the Adler checksum and, contrary to popular belief, is also more effective in most situations."

Adler-32 was a bad choice here.

In practice it doesn't matter what the spec says, because implementers will do what they have to do (and what they've been doing for years). They either disable all checksumming entirely, or only use 1 checksum. The spec should respond to the real world, because if it doesn't the real world ignores it. It would be better if the spec gave proper, practical guidance.

ProgramMax commented 1 year ago

Just because some decoders decide to reject the PNG doesn't mean the spec must require them to reject the PNG. It's the other way around.

That said, I do strongly believe in a "street spec". If all encoders/decoders do something that the spec doesn't say, that effectively becomes a spec of its own. So it is good to be mindful of what encoders/decoders are doing.

My personal preference would be for the image format itself not to contain the checksums/CRC. IMHO, the media knows what recovery is best. For example, a HDD might lose a whole sector. Neither Adler-32 or chunk CRCs can recover that. Further, what if there is a bit flip in the chunk name? The CRC protects the chunk contents but not the chunk name. But these are solidly in PNG already. I don't think they can be removed. A potential future compression stream inside PNG might lift the Adler-32 requirement on encoders.

But with all that said, @randy408 and @svgeesus's comments seem to indicate that the decompression step does indeed make the Adler-32 check optional, which is the intent of this issue. Should we close this issue and then open issues against it in PNG libraries?

fintelia commented 1 year ago

To be clear: it sounds like the current spec text as well as the "street spec" is that encoders must write valid checksums, while decoders may ignore the checksums.

ProgramMax commented 1 year ago

Oh, right. I thought you were saying decoders couldn't ignore them. You are right that encoders must write them. I don't see a way we could remove that requirement without breaking a lot of software.

svgeesus commented 1 year ago

To be clear: it sounds like the current spec text as well as the "street spec" is that encoders must write valid checksums, while decoders may ignore the checksums.

Yes. But we should say so explicitly instead of requiring the correct reading between the lines.