nigeltao / qoi2-bikeshed

"Quite OK Image" version 2 discussions
33 stars 0 forks source link

Hash function #19

Open nigeltao opened 2 years ago

nigeltao commented 2 years ago

Currently, it's:

#define QOI_COLOR_HASH(C) (C.rgba.r ^ C.rgba.g ^ C.rgba.b ^ C.rgba.a)
// followed by an "&63" which drops the high 2 bits of each channel.

This is about as simple as it gets, which has its own virtues. There may be better ones, although be aware that better compression (size) might mean worse performance (throughput).

Some discussion (amongst other things) is at https://github.com/phoboslab/qoi/issues/48

nigeltao commented 2 years ago

For what it's worth, this trivial change to the hash function:

diff --git a/qoi.h b/qoi.h
index 0aec728..da278a8 100644
--- a/qoi.h
+++ b/qoi.h
@@ -321,7 +321,7 @@ void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels);
 #define QOI_MASK_3  0xe0 // 11100000
 #define QOI_MASK_4  0xf0 // 11110000

-#define QOI_COLOR_HASH(C) (C.rgba.r ^ C.rgba.g ^ C.rgba.b ^ C.rgba.a)
+#define QOI_COLOR_HASH(C) (C.rgba.r ^ C.rgba.g ^ C.rgba.b)
 #define QOI_MAGIC \
        (((unsigned int)'q') << 24 | ((unsigned int)'o') << 16 | \
         ((unsigned int)'i') <<  8 | ((unsigned int)'f'))

Has a noticable improvement in encode throughput on my laptop, for images/misc (1.18x) and images/screenshots (1.21x). The images/misc set (the only one with not-fully-opaque images) also shows slightly worse compression size (1.01x).

        decode ms   encode ms   decode mpps   encode mpps   size kb
images/kodak
qoi-before:   6.4         8.3         61.02         47.11       771
qoi-after:    6.4         8.7         61.12         45.10       771
images/misc
qoi-before:   5.8         6.1        154.48        146.05       400
qoi-after:    5.7         5.2        155.45        172.22       404
images/screenshots
qoi-before:  53.9        48.6        152.63        169.25      2582
qoi-after:   54.4        40.3        151.31        204.22      2582
images/textures
qoi-before:   1.6         2.0         80.17         65.13       184
qoi-after:    1.6         2.0         79.42         65.67       184
images/wallpaper
qoi-before: 137.6       153.6         68.10         61.02     10640
qoi-after:  139.1       151.8         67.38         61.75     10640
nigeltao commented 2 years ago

Here's another experiment, called "qoi-fifo2e".

Instead of tweaking the hash function, we remove the hash function and just use a FIFO cache: keep the 64 most recently written (not necessarily most recently used/read) colors. Update the cache on DIFF, LUMA, RGB and RGBA ops. Leave the cache unchanged on INDEX and RUN ops.

Here's the diff (the code is a little awkward because I was trying to minimize the diff):

--- qoi-master-2ee2169.h    2021-12-12 08:44:59.203920322 +1100
+++ qoi-fifo2e.h    2021-12-15 13:29:18.194335467 +1100
@@ -387,6 +387,7 @@
    const unsigned char *pixels = (const unsigned char *)data;

    qoi_rgba_t index[64] = {0};
+   unsigned int index_wpos = 0;

    int run = 0;
    qoi_rgba_t px_prev = {.rgba = {.r = 0, .g = 0, .b = 0, .a = 0}};
@@ -419,13 +420,15 @@
                run = 0;
            }

-           int index_pos = QOI_COLOR_HASH(px) % 64;
-
+           for (int index_pos = 0;; index_pos++) {
+           if (index_pos < 64) {
            if (index[index_pos].v == px.v) {
                bytes[p++] = QOI_OP_INDEX | index_pos;
+               break;
+           }
            }
            else {
-               index[index_pos] = px;
+               index[index_wpos++ & 63] = px;

                if (px.rgba.a == px_prev.rgba.a) {
                    char vr = px.rgba.r - px_prev.rgba.r;
@@ -464,7 +467,10 @@
                    bytes[p++] = px.rgba.b;
                    bytes[p++] = px.rgba.a;
                }
+               break;
            }
+           }  // end-for (int index_pos = 0;; index_pos++)
+
        }
        px_prev = px;
    }
@@ -516,6 +522,7 @@

    qoi_rgba_t px = {.rgba = {.r = 0, .g = 0, .b = 0, .a = 0}};
    qoi_rgba_t index[64] = {0};
+   unsigned int index_wpos = 0;

    int run = 0;
    int chunks_len = size - QOI_PADDING;
@@ -530,12 +537,14 @@
                px.rgba.r = bytes[p++];
                px.rgba.g = bytes[p++];
                px.rgba.b = bytes[p++];
+               index[index_wpos++ & 63] = px;
            }
            else if (b1 == QOI_OP_RGBA) {
                px.rgba.r = bytes[p++];
                px.rgba.g = bytes[p++];
                px.rgba.b = bytes[p++];
                px.rgba.a = bytes[p++];
+               index[index_wpos++ & 63] = px;
            }
            else if ((b1 & QOI_MASK_2) == QOI_OP_INDEX) {
                px = index[b1];
@@ -544,6 +553,7 @@
                px.rgba.r += ((b1 >> 4) & 0x03) - 2;
                px.rgba.g += ((b1 >> 2) & 0x03) - 2;
                px.rgba.b += ( b1       & 0x03) - 2;
+               index[index_wpos++ & 63] = px;
            }
            else if ((b1 & QOI_MASK_2) == QOI_OP_LUMA) {
                int b2 = bytes[p++];
@@ -551,12 +561,11 @@
                px.rgba.r += vg - 8 + ((b2 >> 4) & 0x0f);
                px.rgba.g += vg;
                px.rgba.b += vg - 8 +  (b2       & 0x0f);
+               index[index_wpos++ & 63] = px;
            }
            else if ((b1 & QOI_MASK_2) == QOI_OP_RUN) {
                run = (b1 & 0x3f);
            }
-
-           index[QOI_COLOR_HASH(px) % 64] = px;
        }

        if (channels == 4) { 

Here's the numbers. Encode speed gets 2.4x worse because we now have to compare against all 64 cache entries (instead of just 1) on each non-run pixel. Decode speed gets 1.2x better because index_wpos++ is faster than calculating the hash function, because we no longer update the index for INDEX and RUN ops (which was a no-op conceptually but still took CPU time, before the change) and maybe because the QOI files get a tiny bit smaller.

master is commit 2ee2169 (2021-12-11).

        decode ms   encode ms   decode mpps   encode mpps   size kb    rate

## Total for images/icon_512
libpng:       5.0        33.9         52.74          7.74        51    5.0%
qoi-master:   1.5         1.3        171.27        198.88        85    8.4%
qoi-fifo2e:   1.3         2.7        199.95         98.18        87    8.6%

## Total for images/icon_64
libpng:       0.2         0.9         23.26          4.35         3   23.6%
qoi-master:   0.0         0.0        104.20        100.77         4   28.7%
qoi-fifo2e:   0.0         0.1        115.03         38.60         4   28.2%

## Total for images/photo_kodak
libpng:      15.2       295.7         25.88          1.33       717   46.7%
qoi-master:   6.1         6.7         64.95         58.81       671   43.7%
qoi-fifo2e:   4.9        17.5         79.58         22.45       668   43.5%

## Total for images/photo_tecnick
libpng:      44.9      1143.8         32.08          1.26      2414   42.9%
qoi-master:  22.6        26.3         63.73         54.84      2527   44.9%
qoi-fifo2e:  17.9        66.6         80.23         21.61      2523   44.9%

## Total for images/photo_wikipedia
libpng:      42.9       865.8         25.28          1.25      2046   48.3%
qoi-master:  17.0        20.2         63.66         53.81      2102   49.6%
qoi-fifo2e:  14.5        54.3         74.86         19.99      2099   49.6%

## Total for images/pngimg
libpng:      56.4       533.3         32.08          3.39      1201   17.0%
qoi-master:  16.4        16.7        110.41        108.00      1436   20.3%
qoi-fifo2e:  14.3        38.8        126.59         46.68      1431   20.3%

## Total for images/screenshot_game
libpng:      18.6       216.7         34.08          2.92       448   18.1%
qoi-master:   6.7         6.5         93.95         97.27       519   21.0%
qoi-fifo2e:   5.7        15.6        110.87         40.59       511   20.7%

## Total for images/screenshot_web
libpng:      92.0      1060.7         88.33          7.66      2402    7.6%
qoi-master:  55.7        44.8        145.98        181.17      2649    8.3%
qoi-fifo2e:  48.4        88.0        167.98         92.34      2622    8.3%

## Total for images/textures_photo
libpng:      39.6       725.0         26.45          1.45      1977   48.3%
qoi-master:  15.0        16.0         69.87         65.44      1981   48.4%
qoi-fifo2e:  12.1        47.5         86.65         22.07      1984   48.4%

## Total for images/textures_pk
libpng:       0.9        24.6         47.24          1.81        89   51.5%
qoi-master:   0.8         0.8         59.21         57.16        75   43.5%
qoi-fifo2e:   0.6         1.8         78.65         24.52        72   41.6%

## Total for images/textures_pk01
libpng:       5.3        66.5         24.65          1.95       163   32.3%
qoi-master:   1.6         1.7         80.68         75.78       178   35.2%
qoi-fifo2e:   1.4         5.0         90.33         26.17       178   35.1%

## Total for images/textures_pk02
libpng:      11.7       205.0         25.97          1.48       427   36.1%
qoi-master:   4.5         4.5         67.71         67.38       479   40.4%
qoi-fifo2e:   3.7        12.8         82.66         23.80       482   40.7%

## Total for images/textures_plants
libpng:      31.9       340.6         33.35          3.12       857   20.6%
qoi-master:   8.6         9.9        123.62        107.79       922   22.2%
qoi-fifo2e:   7.3        22.9        144.92         46.53       920   22.1%

# Grand total for images
libpng:      13.5       187.9         34.47          2.47       423   23.3%
qoi-master:   5.1         5.2         91.89         89.38       463   25.6%
qoi-fifo2e:   4.2        12.7        109.58         36.57       460   25.4%

        decode ms   encode ms   decode mpps   encode mpps   size kb    rate

qoi-fifo2e.h.txt

chocolate42 commented 2 years ago

Refactored a little and tried making an AVX2 path with intrinsics. @notnullnotvoid this may not be optimal as I have zero experience, don't let this existing deter you from rolling your own and/or benchmarking. This exists to learn and to apply it to a variant, I'm not benchmarking further.

Compile with -mavx2 for avx2. Uses a gcc builtin to count trailing zeroes, msvc and maybe others will need their own definitions.

4700u gcc 11.2.0 -O3
# Grand total for qoi_benchmark_suite/images
                  decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:               6.596      77.717         70.38          5.97       398   24.2%
stbi:                 6.542      59.617         70.95          7.79       561   34.2%
qoi-masterae07:       2.123       2.574        218.61        180.31       463   25.6%
qoi-fifo2e-refac:     1.811       6.110        256.27         75.97       460   25.4%
qoi-fifo2e-refac-avx2:1.844       2.975        251.77        156.02       460   25.4%

qoi-fifo2e-avx2.h.txt

oscardssmith commented 2 years ago

Can you add a row for qoi-master to this benchmark?

chocolate42 commented 2 years ago

Done. Didn't redo the other results like you're supposed to but the environmental conditions are the same as half an hour ago ;)