nigeltao / qoi2-bikeshed

"Quite OK Image" version 2 discussions

Pixel format wishlist #30

Open chocolate42 opened 2 years ago

chocolate42 commented 2 years ago

It would be useful if the accepted pixel formats could expand beyond 8-bit RGB/RGBA 4:4:4, but people have different opinions about where to expand and how to do it.

This guy mentions 10/12/14/16-bit 4:4:4 RGB as useful; perhaps a single 16-bit implementation could usefully cover all of those without undue complication? https://github.com/phoboslab/qoi/issues/95#issuecomment-998173495

I like the idea of supporting 8/10-bit YCbCr 4:4:4/4:2:2/4:2:0; that way the bitstream could be used as the basis for a video codec without heavy modifications. As a streaming pixel engine we don't necessarily have the luxury of jumping back to handle subsampling properly. Input could be processed two rows at a time, allowing 4x2 blocks to be used, or the IO could be required to arrive already rearranged into a 4x2-block streaming form (i.e. YYYYYYYYCbCbCrCr or similar for 4:2:0). YUV allows for many memory layouts, which sounds like a complication, but it also takes the ordering optimisation out of our hands: we can get away with enforcing 4x2-block streaming IO, and any other YUV layout is handled by an intermediary piping between file and encoder/decoder.
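A minimal sketch of that repacking, assuming planar I420 input with block-aligned dimensions (the function name and buffer layout here are just for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Repack planar I420 (full-res Y plane, quarter-res Cb/Cr planes) into
 * a stream of 4x2 macroblocks, each laid out as YYYYYYYY CbCb CrCr
 * (12 bytes). In 4:2:0 one chroma sample covers 2x2 luma, so a 4x2
 * block spans 2x1 chroma samples. Assumes width % 4 == 0 and
 * height % 2 == 0; `out` must hold width * height * 3 / 2 bytes. */
static void i420_to_4x2_blocks(const uint8_t *y, const uint8_t *cb,
                               const uint8_t *cr, int width, int height,
                               uint8_t *out) {
    int cw = width / 2; /* chroma plane width */
    for (int by = 0; by < height; by += 2) {
        for (int bx = 0; bx < width; bx += 4) {
            for (int row = 0; row < 2; row++) {       /* 8 luma samples */
                const uint8_t *src = y + (size_t)(by + row) * width + bx;
                for (int col = 0; col < 4; col++) {
                    *out++ = src[col];
                }
            }
            size_t c = (size_t)(by / 2) * cw + bx / 2; /* 2 Cb + 2 Cr */
            *out++ = cb[c];
            *out++ = cb[c + 1];
            *out++ = cr[c];
            *out++ = cr[c + 1];
        }
    }
}
```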

How to actually compress YCbCr/subsampled data is an open question. Single-pixel diff encoding doesn't seem to do much, and if we do read in blocks it seems wasteful not to utilise that.

Chainfire commented 2 years ago

I've been working specifically on an 8-bit YCbCr 4:2:0 variant for the last couple of days. I'll post about it as soon as I finish the documentation; the code itself is done and tested.

Pretty content with the results: on average it is as fast as or faster than final QOI (including the RGB<->YCbCr420 conversion) but produces files half the size, which (at least to me) are visually indistinguishable from the originals.

A large part of QOI's algorithm was changed, but you can still see the heritage.

chocolate42 commented 2 years ago

If you start with a YCbCr 4:2:0 source, is your variant lossless?

nigeltao commented 2 years ago

YCoCg (#5) is an alternative to YCbCr. If you want lossless, though, you'll need more than 8 bits somehow.
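For what it's worth, the reversible YCoCg-R lifting variant makes the "more than 8 bits" concrete: for 8-bit RGB, Co and Cg need 9 bits while Y stays 8-bit. A sketch:

```c
#include <stdint.h>

/* Reversible YCoCg-R lifting transform for 8-bit RGB. Co and Cg need
 * 9 bits (hence "more than 8 bits somehow"); Y stays within 0..255.
 * Assumes arithmetic right shift of negative values, which holds on
 * mainstream compilers. */
static void rgb_to_ycocg_r(uint8_t r, uint8_t g, uint8_t b,
                           int16_t *y, int16_t *co, int16_t *cg) {
    *co = (int16_t)(r - b);
    int16_t t = (int16_t)(b + (*co >> 1));
    *cg = (int16_t)(g - t);
    *y  = (int16_t)(t + (*cg >> 1));
}

/* Exact inverse: recovers the original RGB bit-for-bit. */
static void ycocg_r_to_rgb(int16_t y, int16_t co, int16_t cg,
                           uint8_t *r, uint8_t *g, uint8_t *b) {
    int16_t t = (int16_t)(y - (cg >> 1));
    *g = (uint8_t)(cg + t);
    *b = (uint8_t)(t - (co >> 1));
    *r = (uint8_t)(*b + co);
}
```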

Chainfire commented 2 years ago

> If you start with a YCbCr 4:2:0 source, is your variant lossless?

Yes, the only lossy part is the color-space conversion.

> YCoCg (#5) is an alternative to YCbCr. If you want lossless, though, you'll need more than 8 bits somehow.

Not sure I'm going to do anything with YCoCg but thanks for the hint!

nigeltao commented 2 years ago

If you abstract away the RGBA semantics and just think of QOI as a way to compress a sequence of 4-byte 'pixels'... you could process an (opaque) input image in 2x1 blocks (macropixels), each 4-byte block being YUYV (where UV means CbCr or CoCg). The 4:2:2 chroma sub-sampling is lossy if you start with a regular RGBA image, obviously, but lossless, in a sense, if you're already starting with 4:2:2 or 4:2:0.
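As a sketch of that reinterpretation (using the reference qoi.h API from phoboslab/qoi and assuming packed YUY2 input): halve the width, claim 4 channels, and vanilla QOI will diff/run the macropixels as-is.

```c
#include "qoi.h" /* reference encoder from phoboslab/qoi */

/* Treat packed YUY2 (YUYV 4:2:2) data as 4-byte "RGBA" units: each
 * 2x1 macropixel [Y0 U Y1 V] occupies the same 32 bits an RGBA pixel
 * would. Lossless for data that is already 4:2:2. The colorspace
 * field is informational only; QOI's coding ignores it. */
void *encode_yuyv_as_qoi(const void *yuyv, unsigned int width,
                         unsigned int height, int *out_len) {
    qoi_desc desc = {
        .width      = width / 2, /* one unit per 2x1 macropixel */
        .height     = height,
        .channels   = 4,
        .colorspace = QOI_LINEAR,
    };
    return qoi_encode(yuyv, &desc, out_len);
}
```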

You'd surely want to tweak the QOI opcodes after that. For example:
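One purely hypothetical re-budgeting, just to make the idea concrete (this layout is illustrative, not a proposal from the thread):

```c
/* Hypothetical macropixel LUMA op, illustrative layout only:
 *
 *   QOI_OP_LUMA2 - byte 0: 10 | dy0 (6 bits, -32..31)
 *                  byte 1: (dy1 - dy0) (4) | du (2) | dv (2)
 *
 * After subsampling, chroma changes slowly, so it gets the narrow
 * 2-bit fields; the two luma samples share the wide 6-bit budget,
 * with the second coded relative to the first. */
```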

nigeltao commented 2 years ago

As for 10/12/14/16 4:4:4 RGB, it's easy to say that those pixel formats are useful so it'd be nice for QOI to support them. But it's not obvious how QOI would do so with any reasonable compression. INDEX and RUN ops only kick in if the same RGBA values occur multiple times, but the whole point of HDR is to capture subtle differences between otherwise pretty similar RGBA values. DIFF and LUMA ops only kick in if there are small (2-bit, 4-bit) differences in RGB values, but it's not obvious that that works well if you'll need to capture 10-bit or 12-bit differences.

There is an idea in the #4 thread for interleaving even and odd planes and throwing vanilla QOI at each plane. But AFAIK that's just an idea, and nobody's done the experiment to see if it actually does anything useful (e.g. whether it's better than just gzipping the raw 16-bit RGBA stream) or if it ends up trying to QOI-compress what's effectively noise.

chocolate42 commented 2 years ago

> But it's not obvious how QOI would do so with any reasonable compression.

One way that might work pretty well is to treat the upper 8 bits as you would an 8-bit source, which should compress as well as we do now, with the lower bits stored verbatim. Note that the verbatim lower bits are only byte-aligned for 16-bit RGB or 12/16-bit RGBA, unless the verbatim storage is done pairwise/quadwise.
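A minimal sketch of that split (helper name hypothetical): the high bytes form an ordinary 8-bit image for the existing coder, and the low bytes are appended raw.

```c
#include <stddef.h>
#include <stdint.h>

/* Split 16-bit samples into a high-byte plane (fed to the existing
 * 8-bit coder unchanged) and a low-byte plane (stored verbatim after
 * the compressed stream). `n` is the total sample count, e.g.
 * width * height * 3 for 16-bit RGB. */
static void split_16bit(const uint16_t *samples, size_t n,
                        uint8_t *hi, uint8_t *lo) {
    for (size_t i = 0; i < n; i++) {
        hi[i] = (uint8_t)(samples[i] >> 8);
        lo[i] = (uint8_t)(samples[i] & 0xff);
    }
}
```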

Alternatively, and possibly ideally, we could extend LUMA/DIFF/whatever ops to handle 16 bits properly, but that would involve more opcode space than we can feasibly spare in a fixed-opcode format. I'm working on a format that lets you define the opcodes in the header (allowing the bitstream to be tailored to the input); it only targets 8-bit for now, but one day it could quite easily be adapted to higher bit depths.

oscardssmith commented 2 years ago

For 10-bit and deeper images, I have a few ideas for potentially good opcodes. We can save a bunch of opcode space by removing QOI's index and run-length ops. The main opcodes to add are diffs based on linear interpolation of the previous 2 pixels.
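A sketch of such a predictor for a single 16-bit channel (names and clamping policy are illustrative, not a spec):

```c
#include <stdint.h>

/* Predict the next 16-bit sample by extending the line through the
 * previous two, then code only the residual. On smooth gradients
 * (the common case in deep-colour content) residuals cluster near zero. */
static inline uint16_t predict(uint16_t prev2, uint16_t prev1) {
    int32_t p = 2 * (int32_t)prev1 - (int32_t)prev2;
    if (p < 0)     p = 0;       /* clamp to the valid sample range */
    if (p > 65535) p = 65535;
    return (uint16_t)p;
}

/* Residual wraps modulo 2^16, like QOI's 8-bit wrapping diffs; the
 * decoder recomputes the same prediction and adds the residual back. */
static inline uint16_t residual(uint16_t actual, uint16_t predicted) {
    return (uint16_t)(actual - predicted);
}
```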

Chainfire commented 2 years ago

I put my YCbCr420 variant here - https://github.com/Chainfire/qoy

nigeltao commented 2 years ago

Huh, no index ops, only diffs and runs. Interesting.

Did you try some sort of indexing at all (and it didn't seem to move the benchmark numbers, speed or size; the cache hit rate is just too small)? Or was it just too complicated to hash and index a 6 (or 10) byte macroblock?
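For concreteness, a QOI-style multiplicative hash extended to a 6-byte 4:2:0 macropixel might look like this (the extra constants and the index size are arbitrary choices, not from any implementation in this thread):

```c
#include <stdint.h>

/* QOI hashes a pixel as r*3 + g*5 + b*7 + a*11 into a 64-entry cache.
 * Extended here to a 6-byte 4:2:0 macropixel (4 Y + Cb + Cr) with two
 * more small primes in the same spirit. */
static unsigned hash_macropixel(const uint8_t mp[6]) {
    unsigned h = mp[0] * 3 + mp[1] * 5 + mp[2] * 7
               + mp[3] * 11 + mp[4] * 13 + mp[5] * 17;
    return h % 64;
}
```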

Chainfire commented 2 years ago

Every OP in there had a significant effect on the test set - I tried several variations of all of them.

I did not get noteworthy results trying to index. I tried several different hash functions and different index sizes and didn't find one that worked well (or at all). Though of course it's always possible I did something dumb/wrong that messed it up; it's the holidays, with distractions aplenty, after all. Maybe the odds of a hit on a six-way hash are just too small, requiring an index of enough bits that other ops are just as efficient, or maybe the colorspace conversion combined with the subsampling makes a hit less likely.

Or maybe I just wasn't focusing on the right test subset when doing it; I've done a lot of benching, but I'm sure I didn't bench the entire test set for everything I tried.

oscardssmith commented 2 years ago

The one index type worth trying would be a 2-byte index + diff.
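As a bit-layout sketch, entirely hypothetical:

```c
/* Hypothetical 2-byte INDEX+DIFF op, illustrative layout only:
 *
 *   byte 0: tag (4) | index high bits (4)
 *   byte 1: index low bits (2) | dy (2) | du (2) | dv (2)
 *
 * A 6-bit cache index plus a small per-channel diff applied on top of
 * the cached value, so near-misses in the cache still pay off. */
```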

dumblob commented 2 years ago

@Chainfire I looked at your results and I'm impressed - what are your plans for this attempt? Do you think combining it with user-adjustable "smart" quantization would make it as fast as QOI but with this significantly reduced size (even slightly smaller than JPEG) while retaining very good visual quality (compared to a similarly sized JPEG file)?

wbd73 commented 2 years ago

@Chainfire Did you consider using the YIQ color space instead of YCbCr? https://en.wikipedia.org/wiki/YIQ

Chainfire commented 2 years ago

> @Chainfire I looked at your results and I'm impressed - what are your plans for this attempt? Do you think combining it with user-adjustable "smart" quantization would make it as fast as QOI but with this significantly reduced size (even slightly smaller than JPEG) while retaining very good visual quality (compared to a similarly sized JPEG file)?

I don't have any specific plans; I just wanted to try it out. I played with using it as a base for fast streaming (rather than comparing with previous pixels, using the previous frame's pixel at the same position), but the bitrate wasn't impressive, at least not on my first (and only) attempt.

Smart quantization may be helpful, but it's difficult (and time-consuming) to do right without losing exactly the details you wanted to keep. Not something I will personally pursue at this point. Using pngquant's lib directly isn't useful either, as it essentially compresses each pixel to a single byte, which doesn't fit the algorithm QOY uses. You'd have to adjust the encoding, and it would become as different from QOY as QOY is from QOI - which isn't a bad idea per se, but has nothing to do with QOY.

I mean, you could always just use JPEG if that sort of lossiness is what you're after... Keep in mind that a pretty-good-quality JPEG, on images where QOY works particularly well, can be as much as 90% smaller. That's not that easy to beat.

Chainfire commented 2 years ago

> @Chainfire Did you consider using the YIQ color space instead of YCbCr? https://en.wikipedia.org/wiki/YIQ

I did not. And looking at it, it's not immediately obvious to me why you would want to use that colorspace in particular?

wbd73 commented 2 years ago

> The YIQ system is intended to take advantage of human color-response characteristics. The eye is more sensitive to changes in the orange-blue (I) range than in the purple-green range (Q)—therefore less bandwidth is required for Q than for I.

I thought maybe the fact that less bandwidth is required for Q than for I might be useful.
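For reference, the NTSC RGB-to-YIQ weights (rounded from the linked article) do give Q a somewhat narrower range than I:

```c
/* NTSC RGB -> YIQ, coefficients rounded from the linked article.
 * For unit RGB, I spans roughly +-0.60 while Q spans only +-0.52,
 * so Q is indeed the narrower (lower-bandwidth) channel. */
static void rgb_to_yiq(float r, float g, float b,
                       float *y, float *i, float *q) {
    *y = 0.299f * r + 0.587f * g + 0.114f * b;
    *i = 0.596f * r - 0.274f * g - 0.322f * b;
    *q = 0.211f * r - 0.523f * g + 0.312f * b;
}
```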

dumblob commented 2 years ago

> I mean, you could always just use JPEG if that sort of lossiness is what you're after... Keep in mind that a pretty-good-quality JPEG, on images where QOY works particularly well, can be as much as 90% smaller. That's not that easy to beat.

My train of thought was that the historical separation of image formats into lossy and lossless ones has made the industry needlessly fragmented and complicates everything (the formats are mutually incompatible, supported differently across platforms, the related image libraries have different release cycles and backgrounds, etc.). Having a format which is simple, effective (both lossless and lossy), and fast (to decode and even to encode) would be very valuable in many places.

oscardssmith commented 2 years ago

The problem is that the things that make lossless compression effective are very different from those that make lossy compression effective. Good lossy compression requires really good models of human perception so you can throw away the bits that matter least. That's an inherently complicated process that naturally leads to things like Fourier transforms and perceptual models that don't make any sense in a lossless format.

dumblob commented 2 years ago

I'd argue that's not necessarily the case. I'd actually call this route through perception models a naive, straightforward approach.

You can think of what I have in mind as an "incidental" approximation of the perception model (i.e. neither naive nor straightforward in the sense above). E.g. the mere conversion to a more lossless-compression-friendly color space allowed for a much higher compression ratio (the conversion itself is lossy, of course, but in the case of YCbCr apparently not perceptually so - and the artifacts are much less disturbing than quantization, quantization+dithering, JPEG/Fourier compression, etc.).

And this "incidental" approximation of perception is what I'm after.