PNG currently supports bit depths of 1, 2, 4, 8, and 16 per channel. This is specified in the IHDR chunk: https://www.w3.org/TR/PNG/#11IHDR
HDR commonly makes use of 10 bits per channel. Should we consider specifying a 10-bit/channel addition?
Thought dump of my initial thoughts:
Maybe we could treat it like a standard 8-bit/channel PNG for backwards compatibility.
For non-palette images, a new chunk could say "Actually, we're 10-bit/channel. Here are the extra 2 low-order bits in a new IDAT-like chunk."
Because they would be low-order, they only add detail. So an older decoder would get as close as it can.
One drawback I foresee: being the low-order bits, where the fine details exist, I imagine this data won't compress well. The filter (which normally benefits from local similarity) may not be very helpful either.
For palette images, there could be a new PLTE-like chunk that uses the same indices and adds the extra bits.
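A minimal sketch of the reconstruction this idea implies; the chunk layout and the helper are hypothetical, nothing here is in the spec:

```c
#include <stdint.h>

/* Hypothetical sketch: the ordinary IDAT stream carries the high 8 bits
 * of each sample, and a new IDAT-like chunk carries the extra low 2 bits.
 * A legacy decoder uses high8 alone and gets as close as it can; a new
 * decoder appends the extra bits to recover the full 10-bit value. */
static uint16_t reconstruct_10bit(uint8_t high8, uint8_t low2)
{
    return (uint16_t)(((uint16_t)high8 << 2) | (low2 & 0x3u));
}
```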
Rather than trying to change PNG in a way that is clearly not backwards-compatible, why not use this as an opportunity to move the web community to modern raster image formats that do support 10 bits per channel, such as AVIF and JPEG-XL?
Stating how many bits are significant (when samples are padded up to the next-largest power of two) is the job of the sBIT chunk.
The question is not how big a raw image is at 10-bit rather than 16, but how big the compressed image is once a) filtered and b) zlib-compressed.
I imagine (but have not verified experimentally, which would be an interesting result to see) that applying the existing filters and compression to packed 10-bit data would not give better results than applying them to 16-bit data with 6 zeroed low-order bits.
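For concreteness, a minimal sketch (my own, not from the spec text) of that 10-in-16 packing; with sBIT = 10, the low 6 bits are just padding:

```c
#include <stdint.h>

/* 10 significant bits stored in a 16-bit PNG sample, low 6 bits zeroed.
 * sBIT = 10 tells an aware decoder which bits actually carry data. */
static uint16_t pack10(uint16_t v10)   { return (uint16_t)(v10 << 6); }
static uint16_t unpack10(uint16_t v16) { return (uint16_t)(v16 >> 6); }
```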
Leonard, I definitely don't want to break backwards compatibility. If that had to happen, I would consider nudging toward other formats. But I think we can continue without breaking backwards compatibility.
Chris, I think you're right that sBIT is the way to handle this. And that 16bit's extra zeroes likely compress quite well.
Would it be worthwhile to have a special callout to implementers of encoders/decoders? I had actually read the sBIT chunk doc you linked while considering this, but I still skipped over it, having not fully understood its purpose. For example, as a decoder that wants to use a 10-bit texture format, I need to check both the IHDR and the sBIT. I would be willing to bet money that most implementers look only at the IHDR and stick with a format that matches it.
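For illustration, a decoder-side sketch with libpng (assuming a read context whose header has already been parsed):

```c
#include <png.h>

/* Returns the bit depth a texture should actually use: the IHDR depth
 * unless an sBIT chunk narrows it (e.g. IHDR says 16 but sBIT says 10).
 * Assumes equal significant depths across the color channels. */
static int effective_bit_depth(png_structp png, png_infop info)
{
    png_color_8p sig_bit;
    if (png_get_sBIT(png, info, &sig_bit))
        return sig_bit->red;
    return png_get_bit_depth(png, info);
}
```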
Thinking out pros and cons:
Using the sBIT chunk, we scale colors to be lighter: whites remain white, but blacks become gray. Decoders that understand the sBIT chunk will correct for this; decoders that don't will display the awkwardly brighter image.
Using a new, extra chunk that says "You already have 8 bits; here are 2 more" will be handled better by decoders that don't understand the chunk: they will display black blacks and white whites. The difference would literally be the extra bits of precision, i.e. 0/255 red vs. 1/1023 red. It'll be as close as it can get to the actual image.
So is it worth it? I'm not sure. Adding new chunks (especially these) would be awkward. I'm not sure how many decoders already understand the sBIT chunk, and I don't know how many apps map a 10-bit sBIT to a 10-bit texture.
Right now, I think the sBIT option is better. What would change my mind is learning that many decoders don't understand the sBIT chunk while a lot of 10-bit images end up in use. If we anticipate that to be the case, I think the awkward new chunks are the better option.
2021-08-02: Add a note/warning that the sBIT chunk is expected to be used, because 10-bit color channels are regularly used with HDR imagery. The note should also cover 12-bit color channels. Add a recommendation on how the encoder should fill the unused bits.
Both 10- and 16-bit PNGs compress very inefficiently. Every second byte holds the least significant bits and every other the most significant bits. These have very different probability distributions, and thus different entropy codings; unfortunately, PNG is not able to context-model this.
A decent solution would be to use WebP lossless twice, once for the high 8 bits and once more for the low bits, or to just use JPEG XL's lossless coding. Either of these will be 2-3x fewer bytes than using PNG for HDR.
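A sketch of the two-plane idea being described (illustrative only; neither PNG nor WebP has such a mode today):

```c
#include <stddef.h>
#include <stdint.h>

/* Split interleaved big-endian 16-bit samples into a most-significant-
 * byte plane and a least-significant-byte plane, so the two very
 * different probability distributions can be entropy coded separately. */
static void split_planes(const uint8_t *bytes, size_t n_samples,
                         uint8_t *msb, uint8_t *lsb)
{
    for (size_t i = 0; i < n_samples; i++) {
        msb[i] = bytes[2 * i];
        lsb[i] = bytes[2 * i + 1];
    }
}
```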
I didn't fully understand what you are saying.
Does WebP have one compression stream of the high bytes and another compression stream for the low bytes?
Have you considered storing 10 bits ab cdef ghij in a 16-bit container as abcd efgh ij00 0000? Another option is abcd efgh ij01 1111, which may be more compatible with 16-bit workflows that aren't enabled for, or aware that, the data is actually 10-bit. I've seen a similar process work well for digital cinema DCDM 12-bit image data stored in 16 bits of TIFF as abcd efgh ijkl 0111, as described in SMPTE RP 428-5 Section 4.3 [1].
[1] https://ieeexplore.ieee.org/document/7291227
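In code, the fill patterns above look something like this (my sketch; the 12-bit variant follows the RP 428-5 pad-with-0111 convention cited above):

```c
#include <stdint.h>

/* v10 holds abcdefghij in its low 10 bits; v12 holds abcdefghijkl. */
static uint16_t pad10_zeros(uint16_t v10) { return (uint16_t)(v10 << 6); }          /* abcd efgh ij00 0000 */
static uint16_t pad10_ones (uint16_t v10) { return (uint16_t)((v10 << 6) | 0x1f); } /* abcd efgh ij01 1111 */
static uint16_t pad12_smpte(uint16_t v12) { return (uint16_t)((v12 << 4) | 0x07); } /* abcd efgh ijkl 0111 */
```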
@svgeesus's first comment about using the sBIT chunk is exactly "Storing the 10 bits in a 16 bit container". It lets you use a 16 bit PNG but specify that you're only really using 10 of those 16 bits.
Filling the spare bits with the 0111... pattern is clever :D
Often we use abcd efgh ijab cdef -- that way we can reach both 0000 0000 0000 0000 and 1111 1111 1111 1111. Being able to express black and white the same regardless of the bit depth interpretation helps in interoperability and simplifies getting things right.
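A one-liner sketch of that replication, assuming the 10-bit value sits in the low bits:

```c
#include <stdint.h>

/* abcdefghij -> abcdefghij abcdef: 0 stays 0x0000 and 1023 becomes
 * 0xFFFF, so black and white are exact under either interpretation. */
static uint16_t replicate10(uint16_t v10)
{
    return (uint16_t)((v10 << 6) | (v10 >> 4));
}
```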
I didn't fully understand what you are saying.
Does WebP have one compression stream of the high bytes and another compression stream for the low bytes?
Not yet. We would need to add that to the spec and to the decoder. It would be technically very simple to add to both. The only minor difficulty is integration with HDR, but that problem exists with any solution.
JPEG XL already has this support in the spec, as well as a working demo using the ColorWeb-CG ideas: https://eustas.github.io/jxl-demo/index.html?colorSpace=rec2100-hlg&img=2 is a WASM demo, and using it requires Chrome's experimental HDR canvas flag to be enabled.
As detailed above, PNG today supports packing 10-bit words into 16-bit words (using the sBIT chunk).
More efficient coding of 10-bit (and 16-bit) words requires a different coding technique, but that is a totally different (and more complex) issue IMHO.
"More efficient coding of 10-bit (and 16-bit) words requires a different coding technique, but that is a totally different (and more complex) issue IMHO."
When I stumbled onto this thread, my first thought was the existing DPX format of three 10-bit channels packed into four bytes, but I don't think that fits well into PNG.
But if there is a desire to conserve bandwidth, it occurred to me that the color type 6, 8-bit RGBA PNG might easily be modified so that the two LSBs of each 10-bit channel are mapped onto the 6 LSBs of the alpha channel, and the 2 MSBs of the alpha could still be used for a one- or two-bit alpha.
A two-bit alpha could be combined with the tRNS chunk to have 4 indexed transparency values (though that may cause compatibility issues); otherwise: 0b00 = 0% opaque, 0b01 = 33%, 0b10 = 66%, 0b11 = 100% opaque.
The next question is: is there a configuration where a decoder/viewer that is not capable of handling this segmented10bit format would just discard the bits in the alpha as if fully opaque? In this case the LSBs would be truncated, and while truncation is a poor way to handle downsampling and does have artifacts, the image would still be reasonably viewable.
The sBIT chunk provides the way to make this happen: to mask the LSBs in the alpha and also show the 2 MSBs of transparency, so a 10-bit image could display with a reasonable fallback in a naïve/legacy viewer. Setting sBIT to 8 8 8 2, current decoders/viewers should display the truncated-to-8 image okay, and with the alpha sBIT at 2 bits, only the two MSBs would be used, with the 6 LSB RGB bits being hidden.
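An encoder-side sketch of that sBIT setting with libpng (illustrative only; the proposal itself is not part of any spec):

```c
#include <png.h>

/* Mark only 8 bits of each color channel and 2 bits of alpha as
 * significant, so legacy viewers fall back to a truncated 8-bit image. */
static void set_segmented10_sbit(png_structp png, png_infop info)
{
    png_color_8 sig_bit;
    sig_bit.red = sig_bit.green = sig_bit.blue = 8;
    sig_bit.gray = 0;
    sig_bit.alpha = 2;
    png_set_sBIT(png, info, &sig_bit);
}
```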
As per the graphic below, the IHDR chunk would indicate 8-bit and color type 6, for fallback compatibility. So how do we tell the decoder we're a segmented 10-bit image? We use a tEXt chunk with one string that says segmented10bit to signal the format, and this maintains backwards compatibility. A sketch of the packing follows.
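A per-pixel sketch of the proposed packing (the ordering of the two-bit fields within the alpha byte is my assumption):

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba8;

/* Pack 10-bit R/G/B plus a 2-bit alpha into an ordinary 8-bit RGBA
 * pixel: the high 8 bits of each channel go where a legacy viewer
 * expects them; the alpha byte carries the 2-bit alpha in its MSBs
 * and the three channels' 2 LSBs below it. */
static rgba8 pack_segmented10(uint16_t r10, uint16_t g10,
                              uint16_t b10, uint8_t a2)
{
    rgba8 p;
    p.r = (uint8_t)(r10 >> 2);
    p.g = (uint8_t)(g10 >> 2);
    p.b = (uint8_t)(b10 >> 2);
    p.a = (uint8_t)(((a2  & 0x3) << 6) |
                    ((r10 & 0x3) << 4) |
                    ((g10 & 0x3) << 2) |
                     (b10 & 0x3));
    return p;
}
```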
To sum up:
1) A 10-bit PNG format with all the advantages of PNG, and a bit depth to match many current video and HDR formats.
2) It should compress similarly to an 8-bit RGBA PNG. The first three bytes are not unlike an 8-bit image; the LSB/alpha byte might not compress as well as a typical alpha, but overall this scheme should be substantially more efficient than 10-in-16.
3) So long as the decoder supports sBIT, this should be backwards compatible, with the caveat that images with truncated LSBs may have artifacts.
4) Finally, though tests need to be run, this seems like an efficient way to handle 10-bit images as far as compression and total data size are concerned.
Curious to hear your thoughts.
Thank you for reading.
@ProgramMax What about moving this thread to https://github.com/w3c/PNG-spec, now that there is a dedicated WG for PNG maintenance?