w3c / PNG-spec

Maintenance of the PNG specification
https://w3c.github.io/PNG-spec/

New filter method, optimised for 10,12,16 bits/component image data? #426

Open svgeesus opened 4 months ago

svgeesus commented 4 months ago

This issue is to potentially resurrect an idea I had about filtering 16 bit/component images back at the dawn of PNG, but didn't investigate (and dropped, because an MVP of PNG that actually got implemented was more important). At the time, working in a computer graphics lab, we were making a bunch of 16 bit/component images, which were stored as TIFF, and we wanted to use PNG to migrate away from no-longer-free LZW-compressed TIFF. We got that, but the sub-optimal filtering bugged me.

The existing filter types in PNG are all oriented towards 8 bit/component greyscale and truecolor images. They are defined on bytes, and the spec recommends filter type 0 (None) for images with fewer than 8 bits/component:

> For images of color type 3 (indexed-color), filter type 0 (None) is usually the most effective. Color images with 256 or fewer colors should almost always be stored in indexed-color format; truecolor format is likely to be much larger.
>
> Filter type 0 is also recommended for images of bit depths less than 8. For low-bit-depth greyscale images, in rare cases, better compression may be obtained by first expanding the image to 8-bit representation and then applying filtering.
>
> For truecolor and greyscale images, any of the five filters may prove the most effective. If an encoder uses a fixed filter, the Paeth filter type is most likely to be the best.
>
> (12.7 Filter selection)

So they work, but not very well, on 16 bit/component data; typically filter type 0 (None) is the most effective even for truecolor images. This is because 16 bit/component data is stored as interleaved MSB,LSB byte pairs, which kills compression efficiency.
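The interleaving problem can be sketched in a few lines (a hypothetical illustration, not from the thread; the fixed "noise" values stand in for low-order sensor noise):

```python
# A smooth 16-bit ramp with a little low-order noise, like real sensor data.
noise = [3, 14, 1, 9, 12, 0, 7, 5, 11, 2, 15, 6, 8, 13, 4, 10]
samples = [8000 + 40 * i + noise[i] for i in range(16)]

# Serialize big-endian as PNG requires: MSB, LSB interleaved per sample.
raw = []
for s in samples:
    raw += [s >> 8, s & 0xFF]

# Filter type 1 (Sub), exactly as the spec defines it: byte minus the byte
# bpp positions earlier, mod 256.  bpp = 2 for 16-bit greyscale.
bpp = 2
sub = [(raw[i] - (raw[i - bpp] if i >= bpp else 0)) & 0xFF for i in range(len(raw))]

msb_res = sub[0::2]   # residuals of the MSB positions
lsb_res = sub[1::2]   # residuals of the LSB positions
print(msb_res)        # mostly 0 or 1 after the first pixel: very compressible
print(lsb_res)        # many distinct values: the low-order noise survives
```

Even though the Sub filter predicts the MSB stream almost perfectly, the output byte stream alternates those near-zero bytes with the noisy LSB residuals, which is what hurts the deflate stage.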

I wondered at the time about another filter method, with one or more filter types, which first de-interleaves the image data: for each scanline to be filtered, you get an 8-bit MSB scanline and an 8-bit LSB scanline. The existing filters, or new ones, are then applied to the de-interleaved data before it is compressed.

Decompression works the same way, and undoing the filtering re-interleaves the image data.
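A minimal sketch of the de-interleaving step described above (function names are hypothetical; the existing filter types would then run over `msb` and `lsb` separately, and the decoder would re-interleave after unfiltering):

```python
def deinterleave(scanline):
    """Split a 16-bit big-endian scanline into separate MSB and LSB byte planes."""
    return bytes(scanline[0::2]), bytes(scanline[1::2])

def reinterleave(msb, lsb):
    """Inverse operation: weave the two byte planes back into MSB,LSB order."""
    out = bytearray()
    for hi, lo in zip(msb, lsb):
        out += bytes((hi, lo))
    return bytes(out)

line = bytes([0x12, 0x34, 0x12, 0x35, 0x13, 0x00])   # three 16-bit samples
msb, lsb = deinterleave(line)
assert reinterleave(msb, lsb) == line                 # lossless round trip
```

The round trip is trivially lossless, so the only open question is whether filtering the two planes independently buys enough compression to justify a new filter method.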

Adding a new filter method is not backwards compatible with existing decoders. Adding a new filter type for the existing filter method 0 is also not allowed.

So the filesize benefits would need to be significant, compared to the cost of that incompatibility.

So this might not work; or might work but not well enough to justify doing. At least worth exploring, though.

jbowler commented 2 months ago

I investigated the optimum filtering for a range of PNG images I had spidered off the web many years ago. The conclusion was that the spec recommendations and the actual libpng implementation (pre libpng 1.7) mostly sucked; a whole load of extra time and runtime memory was required for very little gain. Simplistic direct selections based on the data properties worked just as well; the spec recommendations reflect this, but the choices were not, I feel, well researched.

For 16-bit data the core of the problem is that the LSB, handled independently of the MSB, can be nothing but noise. That noise is interleaved with the potentially non-noisy MSB stream, and both bytes must use the same filter. Hence the "just don't try" recommendation, just like colortype 3 data of any pixel depth.

For <16-bit, >8-bit data, using sBIT and scaling to 16 bits by a left shift reduces the LSB noise massively, at the potential cost of incorrect display by PNG implementations which don't honour sBIT (most of them, other than recent versions of libpng?). Somewhat ironically, the v3 support for 10-bit "narrow-range" HLG encoding mandates a left shift when scaling to 16 bits, which is a significant issue for libpng support.
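The difference between the two scalings can be illustrated like this (a sketch; `scale_replicate` stands in for the spec's recommended bit-replication sample scaling, `scale_shift` for the sBIT-friendly pure left shift):

```python
SBIT = 10  # significant bits per sample, signalled via the sBIT chunk

def scale_shift(s):
    # Pure left shift: the low 6 bits are always zero, so after byte
    # interleaving the LSB stream is highly repetitive and compresses well.
    return s << (16 - SBIT)

def scale_replicate(s):
    # Bit replication (approximates scaling by 65535/1023): fills the low
    # bits with a copy of the high bits, so the LSB byte varies with the data.
    return (s << (16 - SBIT)) | (s >> (2 * SBIT - 16))

s = 0x2A7                                  # an arbitrary 10-bit sample
print(hex(scale_shift(s)))                 # low 6 bits zero
print(hex(scale_replicate(s)))             # low bits carry replicated data
```

The shifted form is what keeps the LSB plane quiet; the replicated form is what a full-range display mapping wants, which is exactly the tension described above.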

All of these things are attempts to swim while hogtied.

"Fixing" the filtering is an obvious solution and the simplest fix is to filter 16-bit channels as 16-bit values. Then all the filters are, in fact, dealing with numbers which just might be correlated across space. Might. The change is almost trivial; it's just the same filters with 16-bit math rather than 8-bit math. However the result will be byte-compressed by zlib (deflate) and even though there will be stronger correlations between adjacent bytes (better 2-byte string matches than before) it's still going to be worse than the corresponding 8-bit filtering and that isn't very good. E.g. how much does 8-bit RGB filtering improve the zlib byte compression compared with the native zlib ability to recognise an RGB8 as a three byte string? My tests suggest, "Not much if at all."

Such changes will prevent decoding by all current decoders without exception. So why not change the color type? Why do we use 16-bit anyway? What on earth is the point of storing 48 bits of perceptual colour information, let alone 16 of alpha? There is absolutely none because, as I have pointed out before on png-mng-implement, the full dynamic range required (not just "high": all the dynamic range) will fit, along with space to avoid error propagation, in 28 bits, and the alpha channel (as established almost 40 years ago by possibly unpublished experiments on text anti-aliasing) requires no more than "six" "grey" levels (3 bits of alpha).

The color type change I originally suggested was based on L(Y)L(R/Y)L(G/Y), which does work but is non-optimal. L means logarithm to some base less than 1.01, chosen to allow extra room to avoid error propagation on decode/encode cycles; this is not a "final image format" (as in all current uses of PNG). I assumed 4000 distinguishable colors and allowed 7 bits for error and non-linearity. I assumed the full range of human vision, not just photopic. Later I allowed for storage of illuminant levels based on the maximum brightness of the sun. The upper levels aren't perceivable by a human (something to do with retina overload) but are essential for 3D rendering, where a lightdome can be used to illuminate a scene (illumination levels) as well as providing the background (human levels).

This is a true absolute representation; more absolute than the current proposal for PQ (sp?) support which is, insofar as I understand it, absolute in the display space not the capture/render/real-world space.

So, yes, it's a new colortype and it is incompatible but it builds on that to actually make a format which is compatible with PNG parsers but is an "original scene" representation as opposed to a "final format" or "display environment" representation. The data requires extensive post-processing; indeed if it is a lightdome the only reasonable use is for a lightdome in a 3D render!

After that I lost interest in minor incompatible tweaks; one of the reasons I don't like the support for narrow-range images in the current proposal. Yes, sure, that is "backward compatible" because cICP can be ignored, but if cICP isn't ignored how can negative values for R,G,B be regarded as backward compatible?

On the format I'm currently moving toward a lot of my initial choices were suboptimal given that PNG is about lossless compression. Why use L(R/Y), L(B/Y) when the CIE has already spent many years of research to come up with perceptually linear color "difference" encodings in CIELUV and CIEYUV? Perceptually linear encodings are the optimal lossless encodings of perceptual data. I avoided CIEXYZ because using it to encode images was patented, but maybe that's more than 17 years ago now :-) CIE chromaticities, as used by the cHRM chunk therefore well established in PNG, aren't perceptually linear but an encoding which uses them spans the gamut (so no out-of-gamut colors; Chris, this is hard for me to understand so shoot me down if I have this wrong). That encoding is trivial. An optimal encoding of CIEYUV is, on the other hand, pretty damn difficult because a gamut-spanning triangle is not axis aligned; it's certainly possible but it's complex.

So I'm thinking if I wait long enough (for the patent to expire) I can do xyY encoding in no more than 28 bits and full xyYA in 32 bits.

Ok... now the compression? Why stop when you're winning? LZ77 is great, no patents. LZW is no longer patented; I did an implementation from scratch in the previous millennium, so any patent on that should be long gone. LZ77 is 8-bit; the classic implementation of LZW allows any bit length. I have four channels: 14-bit Y, 7-bit x and y, and 4-bit alpha. Give me four separate LZW streams, three of which require just a 4096-entry LUT with 32-bit entries (IIRC, it's been a while). The 14-bit channel, hum, now how about filtering that; I would have a critical chunk which, if present, gave extra instructions for decoding the 14-bit channel. The alpha channel would have run-length filtering (it's for edges; the dudes who want to use it for inappropriate transparency can use 64-bit PNG RGBA).
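The four proposed channels (14 + 7 + 7 + 4 bits) do fit exactly in a 32-bit word; a toy packing to make the arithmetic concrete (bit layout is arbitrary and the names are hypothetical):

```python
def pack_xyYA(y14, x7, v7, a4):
    """Pack 14-bit Y, 7-bit x, 7-bit y and 4-bit alpha into one 32-bit word.
    The field order here is an arbitrary choice for illustration."""
    assert y14 < 1 << 14 and x7 < 1 << 7 and v7 < 1 << 7 and a4 < 1 << 4
    return (y14 << 18) | (x7 << 11) | (v7 << 4) | a4

def unpack_xyYA(word):
    """Inverse of pack_xyYA: recover the four channel values."""
    return (word >> 18) & 0x3FFF, (word >> 11) & 0x7F, (word >> 4) & 0x7F, word & 0xF

pixel = (12345, 61, 88, 7)
word = pack_xyYA(*pixel)
assert word < 1 << 32                      # the whole pixel fits in 32 bits
assert unpack_xyYA(word) == pixel          # lossless round trip
```

In the scheme described above each unpacked channel would then feed its own LZW stream rather than being byte-interleaved.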

The message is always that a single incompatible change creates incompatibility, so never do it for a minor reason, and when you do it, do it properly. Three incompatible changes, to color type and filtering and compression, are three times as complex to implement but offer eight times the opportunity for better compression.

ProgramMax commented 2 months ago

I agree that there is room to improve and time to do it now. I regret dropping the ball on making a suite of public domain test images. Those would be useful here. I guess spinning that back up could be part of this.

We probably also need to have a discussion about PNG updates vs. PNG2 (better name suggestions welcome). Anything that would significantly break existing PNGs should perhaps instead go into PNG2. The trade-off is "old stuff broke" vs. "new adoption is extremely hard, so it was in vain." If the goal is to improve the world('s data situation), both make a strong case.

fintelia commented 2 months ago

The challenge of a PNG2 is that it wouldn't just be competing with the original PNG format, but also all the subsequent lossless formats. Lossless WebP has 14 different filters and the ability to de-correlate channels, a number of formats use CABAC to get better entropy coding, etc.

ProgramMax commented 2 months ago

I think that's okay. There probably should be multiple solutions. Even if the initial release of the hypothetical PNG2 is just a big catch-up and not yet fully competitive, I think it would be a good thing to do.

jbowler commented 2 months ago

The history of ISO-JPEG is salutary. The JPEG committee did exactly the same thing; they chose a filter method (the Discrete Cosine Transform) which was determined by a lot of factors but was known, at the point where the initial JPEG standard was fixed, to be imperfect. If nothing else, read Pennebaker & Mitchell's comments in the cheap version of the JPEG standard (their book), chapter 21; plus somewhere in there is a discussion of better (though non-standard) ways of handling the DC component.

As the book says, still image compression was, at the time, a moving target (not so much these days). After 10 years or so the target had moved enough for a new filter method based on wavelets to be considered and standardised. They called the first version JPEG and the next version JPEG 2000 (JPEG2, well, x1000). So now, if I have a JPEG 2000 image it's just JPEG, right? Everything understands it because it is JPEG... not!

PNG really is just a container format. It was set up so this just works; in this case, even if the 8-byte "signature" at the start is the same, the five control bytes that are part of the first 33 bytes say up front what has changed. I'd still change the signature, but the code I write these days doesn't give a damn; it checks the control bytes too.

PNG also allows for "critical" chunks; these can introduce new features which make the entire file unreadable unless the chunk is understood. I regard this as a supplement to making changes in the IHDR. The safest approach is to change the signature, the next best is to change the IHDR length, but adding filter methods is certainly permitted; there is provision for "private" extensions using a top-bit-set byte, and it was always intended that this allow for experimentation without having to rewrite stuff like parsing code etc.
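The "five control bytes in the first 33 bytes" are the bit depth, colour type, compression method, filter method and interlace method fields of IHDR (8 signature bytes + 4 length + 4 type + 13 data + 4 CRC = 33). A sketch of a reader that checks the signature and pulls those fields out (assumes a well-formed stream and skips CRC validation):

```python
import struct
import zlib

PNG_SIG = b'\x89PNG\r\n\x1a\n'

def ihdr_control_bytes(data):
    """Verify the 8-byte signature, then return the five IHDR control bytes
    (bit depth, colour type, compression, filter, interlace) from the first
    33 bytes of a PNG stream.  Sketch only: no CRC check."""
    if data[:8] != PNG_SIG:
        raise ValueError("not a PNG signature")
    length, ctype = struct.unpack(">I4s", data[8:16])
    if ctype != b"IHDR" or length != 13:
        raise ValueError("IHDR must be the first chunk, 13 bytes long")
    w, h, depth, color, comp, filt, interlace = struct.unpack(">IIBBBBB", data[16:29])
    return depth, color, comp, filt, interlace

# Build a minimal valid header for a 1x1, 16-bit greyscale image and read it back.
ihdr_data = struct.pack(">IIBBBBB", 1, 1, 16, 0, 0, 0, 0)
crc = zlib.crc32(b"IHDR" + ihdr_data)
hdr = PNG_SIG + struct.pack(">I", 13) + b"IHDR" + ihdr_data + struct.pack(">I", crc)
print(ihdr_control_bytes(hdr))
```

A decoder written this way rejects any stream whose compression, filter or interlace bytes it does not understand, which is the compatibility mechanism being described.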

ProgramMax commented 2 months ago

EDIT: I thought the reply was to a different thread. I lost context in that. As a result, my comment was very off-topic. Given the correct context, my comment seems ramble-y and wild.

So I removed that nonsense and replaced it with this text explaining the removal.

jbowler commented 2 months ago

As a result, my comment was very off-topic.

Well, it did cause me to discover a bug in the current v3 spec :-)