Another possible opcode, QOI_OP_INDEX2. This is a two byte encoding with eight bit opcode, that acts as an index into a secondary index cache with 256 values.

Pro:

Bigger cache matches more input, allows some pixels to be stored as two bytes instead of 4 or 5 bytes
Only takes up a single opcode
Reuses the color hash calculation, getting more mileage from it

Con:

Requires an extra 1024 bytes of memory for the secondary cache. This seems doable even for microcontrollers/etc but expert eyes may have a different opinion

Comparison to current master build:

4700u, gcc 11.2.0, -O3

index2 takes master91cc, adds the QOI_OP_INDEX2 opcode by making QOI_OP_RUN store 62 values, and rearranges the decode chain slightly

# Grand total for qoi_benchmark_suite/images
       decode ms   encode ms   decode mpps   encode mpps   size kb    rate
libpng:        6.582      90.359         70.52          5.14       423   23.3%
stbi:          6.492      61.252         71.50          7.58       601   33.2%
qoi-master91cc:2.124       2.584        218.50        179.60       463   25.6%
qoi-index2:    2.410       2.510        192.60        184.90       456   25.2%

@phoboslab If this or any other opcode change is deemed suitable I'll do proper pull requests as and when

qoi-index2.h.txt

Testing some combinations of primary and secondary index cache sizes:

Baseline is roughly the current master branch (with a max run reduced from 62 to 60 to accomodate secondary index tags). Represented below with qoi-ind64
indX, where X is the size of the cache for the 1-byte encoding
.2, secondary index with 256 values (2-byte encoding, 8 bit tag)
.2a, secondary index with 512 values (2-byte encoding, 7 bit tag)

Grand total for ./qoi/qoi_benchmark_suite/images

         decode  encode  decode   encode  size    rate spare_tags
             ms      ms    mpps     mpps    kb

qoi-ind0: 1.719 2.237 269.99 207.50 509 28.1% 66 qoi-ind2: 2.063 2.423 225.02 191.60 504 27.8% 64 qoi-ind4: 2.103 2.484 220.75 186.89 498 27.5% 62 qoi-ind8: 2.107 2.580 220.25 179.89 492 27.2% 58 qoi-ind16: 2.044 2.509 227.14 185.02 484 26.7% 50 qoi-ind32: 2.165 2.494 214.35 186.12 474 26.2% 34 qoi-ind64: 2.181 2.488 212.83 186.58 463 25.6% 2

qoi-ind0.2: 2.113 2.314 219.71 200.61 486 26.8% 65 qoi-ind2.2: 2.328 2.620 199.40 177.16 484 26.7% 63 qoi-ind4.2: 2.342 2.643 198.21 175.60 480 26.5% 61 qoi-ind8.2: 2.434 2.542 190.70 182.59 475 26.2% 57 qoi-ind16.2: 2.367 2.624 196.12 176.88 469 25.9% 49 qoi-ind32.2: 2.431 2.550 190.94 182.02 463 25.5% 33 qoi-ind64.2: 2.421 2.524 191.70 183.93 456 25.2% 1

qoi-ind0.2a: 2.144 2.287 216.49 203.00 483 26.7% 64 qoi-ind2.2a: 2.426 2.561 191.35 181.23 481 26.5% 62 qoi-ind4.2a: 2.440 2.573 190.20 180.43 477 26.3% 60 qoi-ind8.2a: 2.430 2.586 191.01 179.51 472 26.1% 56 qoi-ind16.2a: 2.406 2.546 192.95 182.35 466 25.7% 48 qoi-ind32.2a: 2.510 2.610 184.91 177.87 460 25.4% 32 qoi-ind64.2a: 2.505 2.591 185.26 179.15 453 25.0% 0



I like the look of ind32.2, compresses on par with the baseline ind64 but uses 31 less tags. There's also something to be said about ind16.2 if those extra 16 ops can be better used elsewhere, maybe in a variant that limits itself to MASK_2 and MASK_4.

nigeltao / qoi2-bikeshed

Secondary index cache with two-byte encoding #26

Grand total for ./qoi/qoi_benchmark_suite/images