nigeltao / qoi2-bikeshed

"Quite OK Image" version 2 discussions

Lossy QOI Variant #21

Open nigeltao opened 2 years ago

nigeltao commented 2 years ago

We could use a bit in the header to indicate a trivial lossy QOI variant (call it QOILY and the original flavor QOILL), reducing from 8 to 4 bits per RGBA channel on encode and reconstituting on decode.

diff --git a/qoi.h b/qoi.h
index 0aec728..6d95a50 100644
--- a/qoi.h
+++ b/qoi.h
@@ -348,6 +348,8 @@ unsigned int qoi_read_32(const unsigned char *bytes, int *p) {
    return (a << 24) | (b << 16) | (c << 8) | d;
 }

+const int lossy = 4; // TODO: make lossy part of qoi_desc.channels.
+
 void *qoi_encode(const void *data, const qoi_desc *desc, int *out_len) {
    if (
        data == NULL || out_len == NULL || desc == NULL ||
@@ -380,7 +382,7 @@ void *qoi_encode(const void *data, const qoi_desc *desc, int *out_len) {
    qoi_rgba_t index[64] = {0};

    int run = 0;
-   qoi_rgba_t px_prev = {.rgba = {.r = 0, .g = 0, .b = 0, .a = 255}};
+   qoi_rgba_t px_prev = {.rgba = {.r = 0, .g = 0, .b = 0, .a = 255>>lossy}};
    qoi_rgba_t px = px_prev;

    int px_len = desc->width * desc->height * desc->channels;
@@ -389,12 +391,15 @@ void *qoi_encode(const void *data, const qoi_desc *desc, int *out_len) {

    for (int px_pos = 0; px_pos < px_len; px_pos += channels) {
        if (channels == 4) {
-           px = *(qoi_rgba_t *)(pixels + px_pos);
+           px.rgba.r = pixels[px_pos+0] >> lossy;
+           px.rgba.g = pixels[px_pos+1] >> lossy;
+           px.rgba.b = pixels[px_pos+2] >> lossy;
+           px.rgba.a = pixels[px_pos+3] >> lossy;
        }
        else {
-           px.rgba.r = pixels[px_pos];
-           px.rgba.g = pixels[px_pos+1];
-           px.rgba.b = pixels[px_pos+2];
+           px.rgba.r = pixels[px_pos+0] >> lossy;
+           px.rgba.g = pixels[px_pos+1] >> lossy;
+           px.rgba.b = pixels[px_pos+2] >> lossy;
        }

        if (px.v == px_prev.v) {
@@ -577,6 +582,13 @@ void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels) {
        }
    }

+   if (lossy) {
+       for (int px_pos = 0; px_pos < px_len; px_pos++) {
+           unsigned char c = pixels[px_pos];
+           pixels[px_pos] = c | (c<<4);
+       }
+   }
+
    return pixels;
 }

File sizes are now ballpark comparable to (lossy) JPEG, with much of the benefits of lossless QOI (very simple implementation, very fast encode and decode).
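
To make the round trip in the diff above concrete: the encoder keeps the top 4 bits of each channel, and the decoder copies that nibble into both halves of the byte so that the full 0..255 range is still reachable. The following is a minimal sketch only. The helper names `quantize4`, `expand4` and `max_roundtrip_error` are made up for illustration and are not part of qoi.h.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration, not part of qoi.h. */

/* Encoder side: keep only the top 4 bits of an 8-bit channel. */
static uint8_t quantize4(uint8_t c) {
    return (uint8_t)(c >> 4);
}

/* Decoder side: replicate the nibble into both halves of the byte,
   so 0x0 maps to 0x00 and 0xF maps to 0xFF. */
static uint8_t expand4(uint8_t q) {
    return (uint8_t)(q | (q << 4));
}

/* Largest absolute round-trip error over all 256 input values. */
static int max_roundtrip_error(void) {
    int worst = 0;
    for (int c = 0; c < 256; c++) {
        int back = expand4(quantize4((uint8_t)c));
        int err = back > c ? back - c : c - back;
        if (err > worst) {
            worst = err;
        }
    }
    return worst;
}
```

With this scheme the worst-case error per channel is 15 out of 255 (for example, an input of 15 decodes to 0 and an input of 240 decodes to 255).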

The file size ratios vs JPEG look better for images/screenshots (roughly 1x) than for images/kodak (roughly 2x, but these are photos, possibly even JPEGs to start with).

QOI-Lossy vs JPEG File Sizes

 191138 kodak/kodim01.jpeg
 348490 kodak/kodim01.qoily

 140532 kodak/kodim02.jpeg
 242638 kodak/kodim02.qoily

 105215 kodak/kodim03.jpeg
 213045 kodak/kodim03.qoily

 137832 kodak/kodim04.jpeg
 285990 kodak/kodim04.qoily

 208435 kodak/kodim05.jpeg
 365994 kodak/kodim05.qoily

---

1735098 screenshots/amazon.com.jpeg
2031597 screenshots/amazon.com.qoily

 829789 screenshots/apple.com.jpeg
 805029 screenshots/apple.com.qoily

1215546 screenshots/cnn.com.jpeg
1025061 screenshots/cnn.com.qoily

 348846 screenshots/duckduckgo.com.jpeg
 180998 screenshots/duckduckgo.com.qoily

1254782 screenshots/en.wikipedia.org.jpeg
 851898 screenshots/en.wikipedia.org.qoily

The JPEGs were generated by ImageMagick's convert foo.png foo.jpeg.

QOI-Lossless (the Original QOI) vs PNG File Sizes

 736501 kodak/kodim01.png
 931318 kodak/kodim01.qoill

 617995 kodak/kodim02.png
 723898 kodak/kodim02.qoill

 502888 kodak/kodim03.png
 608468 kodak/kodim03.qoill

 637432 kodak/kodim04.png
 782172 kodak/kodim04.qoill

 785610 kodak/kodim05.png
 975022 kodak/kodim05.qoill

---

6381202 screenshots/amazon.com.png
5511312 screenshots/amazon.com.qoill

2360762 screenshots/apple.com.png
2208829 screenshots/apple.com.qoill

2748636 screenshots/cnn.com.png
2508600 screenshots/cnn.com.qoill

 289177 screenshots/duckduckgo.com.png
 338638 screenshots/duckduckgo.com.qoill

1316655 screenshots/en.wikipedia.org.png
1584675 screenshots/en.wikipedia.org.qoill

QOI-Lossless vs QOI-Lossy File Sizes

 931318 kodak/kodim01.qoill
 348490 kodak/kodim01.qoily

 723898 kodak/kodim02.qoill
 242638 kodak/kodim02.qoily

 608468 kodak/kodim03.qoill
 213045 kodak/kodim03.qoily

 782172 kodak/kodim04.qoill
 285990 kodak/kodim04.qoily

 975022 kodak/kodim05.qoill
 365994 kodak/kodim05.qoily

---

5511312 screenshots/amazon.com.qoill
2031597 screenshots/amazon.com.qoily

2208829 screenshots/apple.com.qoill
 805029 screenshots/apple.com.qoily

2508600 screenshots/cnn.com.qoill
1025061 screenshots/cnn.com.qoily

 338638 screenshots/duckduckgo.com.qoill
 180998 screenshots/duckduckgo.com.qoily

1584675 screenshots/en.wikipedia.org.qoill
 851898 screenshots/en.wikipedia.org.qoily

nigeltao commented 2 years ago

In any case, both lossy and lossless QOI might make a nice addition to VNC's compression options.

oscardssmith commented 2 years ago

This is a really bad lossy format as specified. 4 bit RGBA is going to look awful. I think that any good form of lossless compression will need fairly complicated perceptual modeling and won't be a good fit for QOI.

nigeltao commented 2 years ago

It might look awful (or it might not - I'd have to look at some 4-bit outputs), but it still might be good enough for low-bandwidth low-latency VNC. Faster than JPEG or ZLIB compression. Smaller network traffic than lossless.

> I think that any good form of lossless compression will need fairly complicated perceptual modeling and won't be a good fit for QOI.

I assume you meant lossy instead of lossless.

nigeltao commented 2 years ago

kodak/kodim01.png downsampled to 4 bits per channel.

[Images: kodim01 4444, kodim01]

nigeltao commented 2 years ago

screenshots/amazon.com.png downsampled to 4 bits per channel. Top 800 rows only.

[Images: amazon com h800 4444, amazon com h800]

oscardssmith commented 2 years ago

For screenshots, it's surprisingly OK. The main problem will be things like gradients and faces.

nigeltao commented 2 years ago

Does it look awful? Well, there's obvious banding if you know what to look for, but it depends on the context.

I'm not saying we can use QOI-Lossy everywhere we'd use JPEG. I'm saying there might be situations (e.g. VNC) where QOI-Lossy might be useful.

By the way, quick-and-dirty Go program to downsample to 4444: https://go.dev/play/p/wesTsnYyUah

nigeltao commented 2 years ago

kodak/kodim04.png downsampled to 4 bits per channel.

[Images: kodim04 4444, kodim04]

oscardssmith commented 2 years ago

Yeah, that's a pretty clear example. The equivalent JPEG will be around the same size with no noticeable quality loss.

nigeltao commented 2 years ago

I'll also note that these images were produced by an encoder that treats each pixel independently. If you want nicer quality (while still being smaller than the lossless file size), a different encoder could try standard dithering algorithms to lessen the obvious banding.
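
As one example of a standard dithering algorithm, here is a sketch of ordered (Bayer) dithering down to 4 bits per channel. This is not code from the thread: the function name `dither4` and the choice of a 4x4 Bayer matrix are assumptions for illustration. The output is still a plain 4-bit value, so a decoder that expands with `c | (c<<4)` would not need to change.

```c
#include <stdint.h>

/* Standard 4x4 Bayer threshold matrix, values 0..15. */
static const uint8_t bayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

/* Hypothetical helper: quantize one 8-bit channel value at pixel (x, y)
   to 4 bits using ordered dithering. The decoder is assumed to expand a
   4-bit value q to q * 17 (the same as q | (q << 4)), so the representable
   levels are 0, 17, 34, ..., 255. */
static uint8_t dither4(uint8_t c, int x, int y) {
    int q = c / 17;    /* level at or below c, 0..15 */
    int rem = c % 17;  /* distance above that level, 0..16 */
    int threshold = bayer4[y & 3][x & 3];
    /* Round up more often the closer c is to the next level. A remainder
       of 0 never rounds up; a remainder of 16 always does. */
    if (rem > threshold) {
        q++;
    }
    return (uint8_t)q;
}
```

For a flat mid-gray area (value 128), 9 of every 16 pixels round up to level 8 and the other 7 stay at level 7, so the average of the decoded values stays close to 128 instead of snapping to a single band.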

nigeltao commented 2 years ago

> The equivalent JPEG will be around the same size with no noticeable quality loss.

Well, yes, but with slower encodes and decodes (though possibly not with hardware acceleration). Anyway, I'll repeat my previous point (that might have got lost in the rapid replies):

> I'm not saying we can use QOI-Lossy everywhere we'd use JPEG. I'm saying there might be situations (e.g. VNC) where QOI-Lossy might be useful.

And, like QOI-Lossless, it's only 300-ish lines of code. Or tens of lines if you already have QOI-Lossless.

oscardssmith commented 2 years ago

One approach that might be a lot better would be to store exact pixel diffs if the pixels are close, and only shift out data if the 8-bit opcodes aren't successful.

Wulf0x67E7 commented 2 years ago

A different, really simple approach would be to multiply/shift the diffs to increase their range at the cost of precision, with the range of QOI_DIFF_8, for example, moving from [-2, -1, 0, 1] to something like [-8, -4, 0, 4]. This should allow for both substantial space savings (QOI_DIFF_16's 5/4-bit ranges would go from [-16..15]/[-8..7] to [-64..60]/[-32..28]) and running-delta dithering by the encoder, for still quite respectable quality. Another improvement would be to make all diffs odd (e.g. [-7, -3, 1, 5]) to further increase the effective dithering of the lowest bit.
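
A rough sketch of the scaled-diff idea for a single channel, using the [-8, -4, 0, 4] range from above. Each diff is measured against the value the decoder will actually reconstruct, so the quantization error of one pixel is corrected by the next instead of piling up. That is only one possible reading of "running-delta dithering". All names here (`quantize_diff`, `decode_diff`, `encode_channel`) are hypothetical, and a real encoder would handle three or four channels and fall back to a larger opcode when the diff is out of range, where this sketch simply clamps.

```c
#include <stdint.h>

/* Map a 2-bit code (0..3) to its scaled diff: -8, -4, 0, 4. */
static int decode_diff(int code) {
    return code * 4 - 8;
}

/* Pick the 2-bit code whose scaled diff is closest to the wanted diff.
   Out-of-range diffs are clamped to the nearest representable one. */
static int quantize_diff(int want) {
    if (want < -8) want = -8;
    if (want > 4) want = 4;
    return (want + 8 + 2) / 4;  /* round to the nearest multiple of 4 */
}

/* Encode n samples of one channel as 2-bit codes. `recon` receives the
   values a decoder would reconstruct from those codes. */
static void encode_channel(const uint8_t *src, int n,
                           uint8_t *codes, uint8_t *recon) {
    int prev = 0;  /* decoder's view of the previous sample */
    for (int i = 0; i < n; i++) {
        int code = quantize_diff((int)src[i] - prev);
        int next = prev + decode_diff(code);
        /* Stay inside 0..255 (this sketch does not wrap around). */
        while (next > 255) { code--; next -= 4; }
        while (next < 0)   { code++; next += 4; }
        codes[i] = (uint8_t)code;
        recon[i] = (uint8_t)next;
        prev = next;
    }
}
```

For the input 0, 3, 6, 9, 12 the reconstructed values are 0, 4, 8, 8, 12: each sample is off by at most 2, and the error does not accumulate along the row.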

You could also change the index to only hash and test the higher 6/5/4-bits when looking for a match. Something like:

index:   0xd4
testing: 0xd8 => matching higher nibble 0xdX, encode as cache hit

index:   0xd8 => replaced after hit for more dithering EDIT: wait, wouldn't work on decode, oh well, still 0xd4 then.
testing: 0x6a => miss, fallback to diffs

And ignore small diffs (<8?) while encoding runs, so [12,13,12,15] would turn into a run of 4 (with a value of 12 or 13, depending on the intelligence of the encoder).
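
The tolerant-run idea can be sketched like this for a single channel. The function name and the choice to compare every sample against the first sample of the run (rather than against its neighbor, which would let the value drift) are assumptions for illustration, not something specified in the thread. A real encoder would also cap the run at the format's maximum run length.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: length of the run starting at src[start], where a
   sample joins the run if it differs from the run's first sample by less
   than `tol`. With tol == 1 this degenerates to an exact-match run. */
static int tolerant_run_length(const uint8_t *src, int n, int start, int tol) {
    int len = 1;
    while (start + len < n &&
           abs((int)src[start + len] - (int)src[start]) < tol) {
        len++;
    }
    return len;
}
```

With a tolerance of 8, the samples [12, 13, 12, 15] form a single run of 4 with the value 12, matching the example above. The decoder needs no changes, since it just sees an ordinary run.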

nigeltao commented 2 years ago

Downsampling from 8 to 6 bits per channel (instead of to 4) isn't too bad in terms of visual quality. Again, there's banding if you know where to look, but I think that it's much less obvious...

[Image: amazon com h800 6666 nodither]

[Image: kodim04 6666 nodither]

Here are the file sizes in bytes and as a fraction of lossless (8 bits per channel), starting with the demo10 code. Going down to 6 bits per channel still gives you a noticeable file size reduction. There might be further file size gains from tweaking the ((C.rgba.r ^ C.rgba.g ^ C.rgba.b ^ C.rgba.a) & 127) hash function, which is pretty poor when each channel only has a few bits.

Lossy qoi-demo10    amazon.com.h800.png        kodim04.png
1 bit  per channel:       97030  0.1075      69542  0.0881
2 bits per channel:      171272  0.1897     125218  0.1586
3 bits per channel:      242119  0.2681     215190  0.2726
4 bits per channel:      330376  0.3659     286035  0.3624
5 bits per channel:      452027  0.5006     377756  0.4786
6 bits per channel:      585818  0.6488     485884  0.6155
7 bits per channel:      735779  0.8148     627817  0.7954
8 bits per channel:      902986  1.0000     789351  1.0000
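
To illustrate the hash remark above: when every channel holds only a few bits, the XOR of the channels fits in the same few bits, so most of the index can never be reached. The sketch below counts the reachable slots. The function names are made up, the 128-entry table size is inferred from the `& 127` mask, and alpha is assumed to be fully opaque.

```c
#include <stdint.h>

/* The hash quoted in the thread, applied to already-quantized channels. */
static int xor_hash(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    return (r ^ g ^ b ^ a) & 127;
}

/* Hypothetical helper: count how many of the 128 index slots can be hit
   when each channel is limited to `bits` bits (values 0 .. 2^bits - 1).
   Alpha is fixed at its maximum (opaque) value. */
static int reachable_slots(int bits) {
    uint8_t seen[128] = {0};
    int max = 1 << bits;
    int count = 0;
    for (int r = 0; r < max; r++) {
        for (int g = 0; g < max; g++) {
            for (int b = 0; b < max; b++) {
                int h = xor_hash((uint8_t)r, (uint8_t)g, (uint8_t)b,
                                 (uint8_t)(max - 1));
                if (!seen[h]) {
                    seen[h] = 1;
                    count++;
                }
            }
        }
    }
    return count;
}
```

At 4 bits per channel only 16 of the 128 slots are ever used, and at 6 bits only 64, so a hash that spreads the quantized channels across more index bits might recover some of that capacity.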

Perhaps we could use 3 bits of the header for a lossiness knob going from 0 (8 bits per channel) to 7 (1 bit per channel). Again, it's not something you'd want to use all of the time, or even most of the time, but it might be a useful thing to have in the toolbox, especially if it's only 10 or 20 extra lines of code on top of lossless QOI.
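
A sketch of how such a knob might work, generalizing the `c | (c<<4)` expansion from the 4-bit patch to any bit depth by repeating the bit pattern until 8 bits are filled. The names `bits_from_knob`, `quantize` and `expand` are hypothetical, and where the 3 bits would live in the header is left open.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapping: knob 0..7 -> 8..1 bits per channel. */
static int bits_from_knob(int knob) {
    return 8 - (knob & 7);
}

/* Encoder side: keep the top `bits` bits of an 8-bit channel. */
static uint8_t quantize(uint8_t c, int bits) {
    assert(bits >= 1 && bits <= 8);
    return (uint8_t)(c >> (8 - bits));
}

/* Decoder side: repeat the `bits`-wide pattern until at least 8 bits are
   filled, then drop any excess low bits. All-zeros maps to 0x00 and
   all-ones maps to 0xFF for every bit depth. */
static uint8_t expand(uint8_t q, int bits) {
    assert(bits >= 1 && bits <= 8);
    unsigned v = 0;
    int filled = 0;
    while (filled < 8) {
        v = (v << bits) | q;
        filled += bits;
    }
    return (uint8_t)(v >> (filled - 8));
}
```

With `bits == 4` this reduces to the nibble replication in the original patch, and with `bits == 8` both functions are the identity, so the lossless case needs no special handling.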

wilberton commented 2 years ago

I think for game textures that could be really useful (especially if the encoder was clever enough to add some dithering). One addition I might add would be to specify the precision for the alpha channel separately from rgb. Often for game textures 4 bits is plenty for alpha (it's all you get in dxt5 for example), but you may want more for rgb. I can see 2 bits for alpha being usable for a lot of textures too, but you'd nearly always want >4 for rgb.

jmaselbas commented 2 years ago

What about having a lossy encoder and keeping a "lossless" decoder, e.g. by ignoring small pixel differences to increase the run length?

rayrobdod commented 2 years ago

Creating a lossy encoder for a lossless format is definitely possible. pngquant exists for png.

dumblob commented 2 years ago

I definitely raise my hand for making encoders support "smart quantization with dithering and high-fidelity transparency" (and possibly other techniques) as pngquant does.

It allows for a user-settable level of quality and generally solves the most visually distracting artifacts while drastically reducing the size (not as much as JPEG, but much closer than any other attempt I saw above). This could be QOI 1.1 (just adding to the current spec the mandate to support this in encoders).

And the best of all - decoders don't need to change a thing!

kodonnell commented 2 years ago

FYI it was suggested that https://github.com/phoboslab/qoi/issues/145 might be more appropriate here. Wouldn't mind someone checking the results, as they were pretty compelling, especially as it's "free".

dumblob commented 2 years ago

@kodonnell thanks! I'm really interested in lossy compression, so your findings ignited my further interest in the downscaling idea.

I'm a non-believer in SSIM, though, and would thus want to see maybe 5 complex, different images of at least 3 million pixels each, compressed with JPEG@93 (a very frequently used quality setting for camera pictures of the physical world) and with the different sampling methods in your package, with all three versions (original, jpeg@93, qoi-lossy) side by side.

Or do you already have such images?

kodonnell commented 2 years ago

I'm afraid I don't have any such images - can you send some through? I was going to use the python image similarity package which supports other metrics, but didn't want the bloat it requires. What metric would you prefer?

dumblob commented 2 years ago

@kodonnell sorry for the late reply. I would imagine something like ISSIM-S (further details about SSIM's huge limitations: https://www.researchgate.net/publication/283153178_Limitations_of_the_SSIM_quality_metric_in_the_context_of_diagnostic_imaging and https://ieeexplore.ieee.org/document/9337182 ).

As for pictures, anything from your smartphone would do :-) But let us try https://pngquant.org/vsphotoshop.html .