nigeltao / qoi2-bikeshed

"Quite OK Image" version 2 discussions

Simple OP_RGBA change #43

Open chocolate42 opened 2 months ago

chocolate42 commented 2 months ago

Instead of the current fixed 5-byte encoding every time alpha changes, emit a 2-byte alpha change followed by OP_DIFF/OP_LUMA/OP_RGB to encode the RGB channels. This changes nothing for RGB images and can slightly regress some RGBA images if too many 6-byte encodings (2 bytes of alpha plus a 4-byte OP_RGB) are used, but on average it greatly improves space efficiency for RGBA. It works best when images have a lot of nuanced alpha gradients.
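
A minimal sketch of the idea, in Python for brevity rather than the C of the reference encoder. The tag values and the DIFF/LUMA ranges are those of the QOI spec; reusing the 0xFF tag for the 2-byte alpha change, and the helper names `wrap8`, `encode_rgb` and `encode_alpha_change`, are assumptions for illustration and are not taken from qoia-v2.zip.

```python
QOI_OP_DIFF = 0x40  # 2-bit tag, three 2-bit channel diffs
QOI_OP_LUMA = 0x80  # 2-bit tag, 6-bit green diff, then two 4-bit diffs
QOI_OP_RGB = 0xFE   # 8-bit tag followed by r, g, b
QOI_OP_RGBA = 0xFF  # assumed here to be reused as "alpha change" tag


def wrap8(d):
    """Wrap a channel difference into the signed 8-bit range -128..127."""
    return (d + 128) % 256 - 128


def encode_rgb(prev, cur):
    """Encode the RGB part of cur relative to prev using DIFF, LUMA or RGB."""
    dr = wrap8(cur[0] - prev[0])
    dg = wrap8(cur[1] - prev[1])
    db = wrap8(cur[2] - prev[2])
    if -2 <= dr <= 1 and -2 <= dg <= 1 and -2 <= db <= 1:
        return bytes([QOI_OP_DIFF | ((dr + 2) << 4) | ((dg + 2) << 2) | (db + 2)])
    dr_dg = wrap8(dr - dg)
    db_dg = wrap8(db - dg)
    if -32 <= dg <= 31 and -8 <= dr_dg <= 7 and -8 <= db_dg <= 7:
        return bytes([QOI_OP_LUMA | (dg + 32), ((dr_dg + 8) << 4) | (db_dg + 8)])
    return bytes([QOI_OP_RGB, cur[0], cur[1], cur[2]])


def encode_alpha_change(prev, cur):
    """2 bytes for the new alpha, then 1, 2 or 4 bytes for the RGB part."""
    return bytes([QOI_OP_RGBA, cur[3]]) + encode_rgb(prev, cur)
```

Under these assumptions an alpha change costs 3, 4 or 6 bytes instead of always 5, which is why images with smooth alpha gradients (small RGB differences) gain the most and images that keep falling through to OP_RGB can lose a byte per pixel.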

qoia-v2.zip

# Grand total for ../images-lance
          decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:        24.4       213.4        166.13         19.03      1374    8.8%
stbi:          30.0       254.8        135.41         15.94      2119   13.6%
qoi:           12.4         9.2        328.20        442.29      2109   13.6%
qoia:          12.3         9.8        328.98        413.06      1830   11.8%
# Grand total for ../images
          decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:         3.4        32.8        135.13         14.17       395   24.1%
stbi:           4.1        36.1        112.66         12.86       561   34.2%
qoi:            1.5         1.6        315.59        295.95       463   28.2%
qoia:           1.5         1.6        311.46        284.29       462   28.1%

oscardssmith commented 2 months ago

Do the size gains persist with compression applied on top (e.g. lz4/zstd)?

chocolate42 commented 2 months ago

Presumably it's beneficial overall. QOI works well as a preprocessor (qoi.lz4 tends to be smaller than raw.lz4), and this change should extend that benefit to alpha changes. It's worth testing if you want; I will eventually, but my goals lie elsewhere (I want to make a QOI-like format that is SIMD-friendly and see how fast I can make it, and this change is part of that).

BTW if your goal is smaller filesize after compression, you might want to disable the index op. It often gets in the way of the compressor resulting in a larger filesize.
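
For reference, a rough sketch of what "disabling the index op" could look like in an encoder. The hash and the `00xxxxxx` tag are those of the QOI spec; the `try_index` helper and its `use_index` flag are hypothetical names for illustration, not part of any existing encoder.

```python
def qoi_hash(r, g, b, a):
    """Index position of a pixel in the 64-entry table, as in the QOI spec."""
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64


def try_index(index, px, use_index=True):
    """Return the 1-byte index op for px, or None if it cannot be used.

    With use_index=False (hypothetical switch) the encoder never emits the
    index op and always falls through to the other ops.
    """
    if not use_index:
        return None
    pos = qoi_hash(*px)
    if index[pos] == px:
        return bytes([pos])  # tag bits 00 followed by the 6-bit position
    index[pos] = px          # remember the pixel for later lookups
    return None
```

With the op disabled, repeated colours are encoded as ordinary DIFF/LUMA/RGB bytes, leaving the repetition for the general-purpose compressor to find.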

chocolate42 commented 2 months ago

I got less lazy and did some tests. OP_RGBA remains beneficial with compression, and disabling the index op improves compression when the compression is not very light. Both together is even better.

images-lance, MB

568.16 qoi.qoi
487.41 qoi.1.lz4
404.86 qoi.6.lz4
403.40 qoi.12.lz4

652.34 qoi.noindex.qoi
506.92 qoi.noindex.1.lz4
389.93 qoi.noindex.6.lz4
386.07 qoi.noindex.12.lz4

493.11 qoia.qoi
449.20 qoia.1.lz4
383.04 qoia.6.lz4
381.89 qoia.12.lz4

539.64 qoia.noindex.qoi
452.05 qoia.noindex.1.lz4
362.89 qoia.noindex.6.lz4
359.62 qoia.noindex.12.lz4
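
The relative differences are easier to read as percentages against plain qoi at the same lz4 level. A small sketch that only re-uses the figures listed above; the dictionary layout and the `saving_pct` helper are illustrative.

```python
# Sizes in MB for images-lance, copied from the listing above.
sizes_mb = {
    "qoi":          {"raw": 568.16, "lz4-1": 487.41, "lz4-6": 404.86, "lz4-12": 403.40},
    "qoi.noindex":  {"raw": 652.34, "lz4-1": 506.92, "lz4-6": 389.93, "lz4-12": 386.07},
    "qoia":         {"raw": 493.11, "lz4-1": 449.20, "lz4-6": 383.04, "lz4-12": 381.89},
    "qoia.noindex": {"raw": 539.64, "lz4-1": 452.05, "lz4-6": 362.89, "lz4-12": 359.62},
}


def saving_pct(variant, level):
    """Percent saved by `variant` relative to plain qoi at the same level."""
    base = sizes_mb["qoi"][level]
    return 100.0 * (base - sizes_mb[variant][level]) / base


for variant in ("qoi.noindex", "qoia", "qoia.noindex"):
    row = ", ".join(f"{lvl}: {saving_pct(variant, lvl):+.1f}%" for lvl in sizes_mb["qoi"])
    print(f"{variant:13s} {row}")
```

Negative values mean the variant is larger than plain qoi, which is the case for qoi.noindex without compression and at lz4 level 1.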

chocolate42 commented 2 months ago

Fixed a bug which very rarely caused the encoder to miss the end of the file. It took testing with thousands of images to catch. I'm not going any further with qoia; I just couldn't leave the bug present here.

qoia-v2.zip

chocolate42 commented 2 months ago

Taking this RGBA op and using it as part of a qoi-like (roi) does this:

# Grand total for ../qoipond/images
              decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:             3.4        31.9        137.01         14.55       395   24.1%
stbi:               4.1        35.2        114.37         13.17       561   34.2%
qoi:                1.5         1.6        313.21        292.18       463   28.2%
roi:                1.2         1.4        379.94        326.05       455   27.7%
roi.lz4:            1.2         1.6        373.39        284.14       415   25.3%
roi.zstd1:          1.4         2.0        321.42        237.49       358   21.8%
roi.zstd3:          1.5         2.8        314.58        167.68       350   21.4%
roi.zstd9:          1.5         5.3        310.43         87.80       344   21.0%
roi.zstd19:         1.6        56.3        283.37          8.24       335   20.4%
# Grand total for ../qoipond/images-lance
              decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:            24.0       208.6        168.94         19.47      1374    8.8%
stbi:              29.5       251.5        137.84         16.15      2119   13.6%
qoi:               12.3         9.4        331.03        430.51      2109   13.6%
roi:               10.4         8.6        392.23        470.30      1993   12.8%
roi.lz4:           10.7        10.8        377.89        376.54      1629   10.5%
roi.zstd1:         11.5        12.2        352.46        333.12      1217    7.8%
roi.zstd3:         11.9        16.6        342.63        245.28      1161    7.5%
roi.zstd9:         12.0        37.3        339.47        108.79      1095    7.0%
roi.zstd19:        12.3       520.6        329.45          7.80      1032    6.6%

With zstd compression level 1, roi.zstd beats png and qoi on filesize for both RGB and RGBA whilst still being quicker to decode than qoi. This is before trying to speed up roi using SIMD. roi itself is fairly simple, containing the following ops:

LUMA232
LUMA464 (the qoi LUMA op, R and B 4 bits stored as diff from G, G stored in 6 bits)
LUMA777
OP_RGB
OP_RGBA as described in this thread, opcode+alpha+[LUMA232/LUMA464/LUMA777/OP_RGB]
RUN op with values 1..30
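
A sketch of how an encoder might choose between the luma ops listed above. It assumes the naming follows the LUMA464 description (bits for R minus G, bits for G, bits for B minus G) and that each field holds a signed difference, as in the qoi LUMA op. The actual roi bit layout and tags are not given in this thread, so only the selection logic is shown, and `pick_op` is a hypothetical helper.

```python
# (name, bits for R-G diff, bits for G diff, bits for B-G diff), smallest first
LUMA_OPS = [("LUMA232", 2, 3, 2), ("LUMA464", 4, 6, 4), ("LUMA777", 7, 7, 7)]


def wrap8(d):
    """Wrap a channel difference into the signed 8-bit range -128..127."""
    return (d + 128) % 256 - 128


def fits(value, bits):
    """True if value fits in a signed field of the given width."""
    half = 1 << (bits - 1)
    return -half <= value < half


def pick_op(prev, cur):
    """Name of the smallest op able to encode cur's RGB relative to prev."""
    dg = wrap8(cur[1] - prev[1])
    dr_dg = wrap8(wrap8(cur[0] - prev[0]) - dg)
    db_dg = wrap8(wrap8(cur[2] - prev[2]) - dg)
    for name, r_bits, g_bits, b_bits in LUMA_OPS:
        if fits(dr_dg, r_bits) and fits(dg, g_bits) and fits(db_dg, b_bits):
            return name
    return "OP_RGB"
```

For example, `pick_op((100, 100, 100), (101, 102, 101))` selects LUMA232, while a jump to `(0, 200, 50)` falls through to OP_RGB.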

roi is more simd-able because

I experimented with a QOI-like format (soi) in which every luma op uses the same number of bits for each plane (222, 555, 777), which allows for simpler SIMD. However, soi is less space-efficient than roi, so any further compression has more data to work through. With strong compression, the extra time spent would exceed anything soi could save, however much SIMD accelerated it, and the overall compression would still be worse. So I think roi is the better trade-off, and I'll attempt to SIMD it in the coming weeks.

tansy commented 1 month ago

You could fork the repo, rename it and modify it so that others have a chance of finding it. An issue like this is going to be missed. Unless you don't want GitHub to know your code, which is understandable.