Closed phoboslab closed 2 years ago
The only nitpick I can think of would be the GRB- instead of RGB-bit-order of QOI_OP_LUMA. Is there any technical reason to use the green channel as the guide instead of the red one?
The idea was that changes in lightness (luma) would be most represented in the green channel, since that is the color that the human eye is most sensitive to. I just tried it with a 6-bit red channel instead and got consistently worse results — but not by much. Overall compression rate would be at 25.9%.
Maybe it's worth changing it anyway, for clarity's sake.
Huh, I didn't think that it would have any impact at all on a lossless format and that those human-perception-specific tricks only really matter when compressing lossily.
I'd personally change it, but it really is incredibly nitpicky and doesn't matter too much either way. ¯\_(ツ)_/¯
I think the reason it matters is probably that it is fairly common for images to get subtly lighter or darker while maintaining the same hue, and in those cases, green on average is what changes the most.
Wouldn't removing DIFF_24 greatly degrade the compression ratio of semi-transparent textures? (like hair textures, or maybe water textures) I think these changes make qoi useless for semi-transparent images. Maybe in that case its better to stop encoding alpha levels at all and only write rgb + 0/1 alpha?
Yes, it performs worse for these, but I am not convinced that this is an issue.
From my game dev experience even textures for water, clouds, dust etc. use fairly distinct alpha values. Many of these don't even have an alpha channel at all and are just blended with GL_ONE GL_ONE
(fire, flares, etc.) or GL_ZERO GL_SRC_COLOR
(water, smoke, etc.).
The two worst performing examples I found are actually the ones most prominently featured by other image formats: dice.png and fish.png. As I said before, I believe these are fairly "artificial" examples.
## images/misc/fish.png size: 1969x1307
decode ms encode ms decode mpps encode mpps size kb rate
libpng: 28.1 207.8 91.44 12.39 520 5.2%
stbi: 18.5 197.8 139.20 13.01 804 8.0%
qoi_master: 5.9 9.2 438.07 280.36 862 8.6%
qoi_experi: 5.6 6.9 457.53 372.52 1148 11.4%
## images/misc/dice.png size: 800x600
decode ms encode ms decode mpps encode mpps size kb rate
libpng: 6.7 51.6 71.92 9.30 237 12.7%
stbi: 5.2 45.8 92.34 10.48 341 18.2%
qoi_master: 1.3 1.8 362.65 263.83 355 19.0%
qoi_experi: 1.2 1.3 414.23 356.69 508 27.1%
Compression rate wasn't really great before (compared to libpng/stbi), but it's still very good compared to uncompressed RGBA.
One last change that might be worth trying is changing QOI_INDEX to actually use a LRU cache. This will probably be a little slower, but it would remove the effect of hash collisions on the file size.
Sorry @phoboslab I think there is a mistake in computiing the color differences:
char vr = px.rgba.r - px_prev.rgba.r;
char vg = px.rgba.g - px_prev.rgba.g;
char vb = px.rgba.b - px_prev.rgba.b;
char vg_r = vr - vg;
char vg_b = vb - vg;
You need int
to hold [-255; 255] differences in vxxx
, else someting funky will happen when differences are beyond +-127. I suppose the computation itself is fine by integer promotion ? Note that the master
version indeed holds the differences in int
s.
Edit : Got it the wrap around arithmetic, although the signed char troubled me a loong time: I finally thought of it in terms of
type Wrap_Around_Difference_Type is mod 256;
in Ada, then everything was clear.
Here's a tweak to the index, tested against the old data set because ISP's suck. qoi-exluma: Experimental branch baseline qoi-luma61: The index table is reduced to 61 values (prime) and the hash function is simply px.v%61. No other changes. Possible union endian issues that can be resolved with a compiler check (?) qoi-luma61.run64: The RGB and RGBA tags have been moved to the end of the index chunk so rle has a full 64 values, which seems to help decode speed a surprising amount (something something power of two?). There are still two unused 8 bit codes, but with a 6 bit RUN_8 there's no need for RUN_16 and I didn't want to theorycraft further here.
## Benchmarking ../qoitests/images/tango512/*.png -- 1 runs
## Totals (AVG) size: 0x0
decode ms encode ms decode mpps encode mpps size kb
libpng: 3.1 20.5 84.87 12.76 51
stbi: 2.1 20.6 124.21 12.74 69
qoi-exluma: 0.7 0.8 353.49 336.18 102
qoi-luma61: 0.7 0.8 371.70 319.96 86
qoi-luma61.run64: 0.6 0.8 457.27 331.28 86
## Benchmarking ../qoitests/images/kodak/*.png -- 1 runs
## Totals (AVG) size: 0x0
decode ms encode ms decode mpps encode mpps size kb
libpng: 9.6 179.6 41.10 2.19 717
stbi: 10.1 98.9 39.05 3.98 979
qoi-exluma: 3.1 4.2 127.20 94.29 671
qoi-luma61: 2.9 4.5 135.15 88.22 675
qoi-luma61.run64: 2.6 4.4 148.52 89.88 675
## Benchmarking ../qoitests/images/misc/*.png -- 1 runs
## Totals (AVG) size: 0x0
decode ms encode ms decode mpps encode mpps size kb
libpng: 12.0 102.1 74.30 8.71 283
stbi: 10.4 97.2 85.39 9.14 415
qoi-exluma: 2.8 3.3 316.36 270.04 513
qoi-luma61: 2.8 3.3 322.46 272.58 504
qoi-luma61.run64: 2.4 3.2 364.61 280.07 503
## Benchmarking ../qoitests/images/textures/*.png -- 1 runs
## Totals (AVG) size: 0x0
decode ms encode ms decode mpps encode mpps size kb
libpng: 3.0 41.7 42.96 3.12 163
stbi: 2.7 23.3 48.05 5.57 232
qoi-exluma: 0.8 1.1 163.08 120.30 178
qoi-luma61: 0.7 1.1 181.62 115.97 179
qoi-luma61.run64: 0.6 1.1 201.48 119.41 179
## Benchmarking ../qoitests/images/screenshots/*.png -- 1 runs
## Totals (AVG) size: 0x0
decode ms encode ms decode mpps encode mpps size kb
libpng: 59.4 620.1 138.56 13.27 2219
stbi: 52.0 755.2 158.20 10.90 2821
qoi-exluma: 28.5 26.1 288.85 315.97 2476
qoi-luma61: 27.2 27.3 302.62 301.02 2478
qoi-luma61.run64: 23.9 27.1 344.47 303.73 2475
## Benchmarking ../qoitests/images/wallpaper/*.png -- 1 runs
## Totals (AVG) size: 0x0
decode ms encode ms decode mpps encode mpps size kb
libpng: 191.4 2922.5 48.97 3.21 9224
stbi: 220.4 1814.6 42.53 5.16 13299
qoi-exluma: 76.9 85.8 121.91 109.21 9923
qoi-luma61: 75.4 94.0 124.24 99.72 9965
qoi-luma61.run64: 64.2 85.0 146.09 110.21 9964
Very cool. Reducing the index to 61 is rather clever! And I'd really like to know why a 64 run-length would help that much with performance.
@vsonnier this should be fine, as long as the decoder wraps these values around, too.
Edit: I can't really reproduce these throughput numbers. qoi-luma61.run64 performs slightly worse on my machine.
# Grand total for images
decode ms encode ms decode mpps encode mpps size kb rate
qoi-exluma: 1.9 2.7 244.79 172.72 465 25.7%
qoi-luma61.run64: 1.9 2.7 239.50 169.05 463 25.6%
Compression is better though. Images with alpha suffer quite a bit when the hash doesn't include alpha value. Might be worse to change it to:
#define QOI_COLOR_HASH(C) (C.rgba.r * 3 + C.rgba.g * 5 + C.rgba.b * 7 + C.rgba.a * 11)
(or just C.v % 61
as you proposed)
Some numbers vs qoi-demo10. qoi-experi is commit 28954f7. qoi-l61r64 is the luma61.run64 mentioned above.
For compressed size, qoi-demo10 wins on icon_512, icon_64 and pngimg, which all involve alpha. qoi-experi wins everywhere else, including on textures_plants (which also involves alpha). I know simplicity of format is a goal, but perhaps consider having two slightly different sets of opcodes (or opcode bit assignments) depending on whether channels == 3 or channels == 4. Edit: now that I've added the luma61.run64 numbers, the qoi-demo10 vs qoi-l61r64 gap is noticably smaller on icon_512, icon_64 and pngimg.
One more quick request is to raise QOI_PADDING from 4 to 7, so that the decoder can always do an 8 byte (64 bit) read.
decode ms encode ms decode mpps encode mpps size kb rate
## Total for images/icon_512
libpng: 5.0 33.9 52.74 7.74 51 5.0%
stbi: 3.9 32.8 66.47 8.00 69 6.8%
qoi-experi: 1.5 1.2 180.47 224.24 102 10.1%
qoi-demo10: 0.8 1.2 309.23 218.27 76 7.5%
qoi-l61r64: 1.0 1.4 254.71 190.54 86 8.4%
## Total for images/icon_64
libpng: 0.2 0.9 23.26 4.35 3 23.6%
stbi: 0.1 0.9 28.05 4.45 4 27.9%
qoi-experi: 0.0 0.0 103.12 106.03 5 31.3%
qoi-demo10: 0.0 0.0 148.43 110.04 4 27.6%
qoi-l61r64: 0.0 0.0 141.35 106.50 4 28.8%
## Total for images/photo_kodak
libpng: 15.2 295.7 25.88 1.33 717 46.7%
stbi: 16.9 157.8 23.33 2.49 979 63.8%
qoi-experi: 5.7 6.7 69.14 58.89 671 43.7%
qoi-demo10: 4.1 7.1 95.72 55.76 772 50.3%
qoi-l61r64: 4.5 6.6 87.26 59.26 675 44.0%
## Total for images/photo_tecnick
libpng: 44.9 1143.8 32.08 1.26 2414 42.9%
stbi: 57.9 608.3 24.88 2.37 3533 62.8%
qoi-experi: 20.9 26.3 68.99 54.67 2527 44.9%
qoi-demo10: 13.0 25.1 111.04 57.44 2737 48.7%
qoi-l61r64: 17.2 25.3 83.93 57.02 2529 45.0%
## Total for images/photo_wikipedia
libpng: 42.9 865.8 25.28 1.25 2046 48.3%
stbi: 54.6 456.6 19.87 2.38 2893 68.3%
qoi-experi: 15.7 19.9 69.17 54.39 2102 49.6%
qoi-demo10: 9.5 19.2 113.99 56.49 2289 54.0%
qoi-l61r64: 12.8 19.3 84.51 56.30 2103 49.7%
## Total for images/pngimg
libpng: 56.4 533.3 32.08 3.39 1201 17.0%
stbi: 59.6 399.9 30.35 4.52 1751 24.8%
qoi-experi: 15.5 15.7 116.99 114.90 1445 20.5%
qoi-demo10: 9.0 15.7 201.83 114.98 1429 20.2%
qoi-l61r64: 11.7 16.4 154.61 110.63 1437 20.3%
## Total for images/screenshot_game
libpng: 18.6 216.7 34.08 2.92 448 18.1%
stbi: 20.9 150.9 30.24 4.19 634 25.7%
qoi-experi: 6.3 6.3 101.13 100.93 519 21.0%
qoi-demo10: 4.1 6.1 156.06 104.53 535 21.7%
qoi-l61r64: 4.9 6.5 130.18 97.85 517 20.9%
## Total for images/screenshot_web
libpng: 92.0 1060.7 88.33 7.66 2402 7.6%
stbi: 78.8 1210.3 103.12 6.71 3076 9.7%
qoi-experi: 50.5 38.8 160.98 209.22 2649 8.3%
qoi-demo10: 28.3 40.4 287.55 201.06 2680 8.4%
qoi-l61r64: 37.7 44.7 215.30 181.93 2649 8.3%
## Total for images/textures_photo
libpng: 39.6 725.0 26.45 1.45 1977 48.3%
stbi: 46.6 370.7 22.49 2.83 2554 62.4%
qoi-experi: 13.9 15.8 75.30 66.35 1981 48.4%
qoi-demo10: 10.1 18.3 104.22 57.40 2506 61.2%
qoi-l61r64: 11.2 15.1 93.39 69.39 1990 48.6%
## Total for images/textures_pk
libpng: 0.9 24.6 47.24 1.81 89 51.5%
stbi: 0.8 17.0 55.03 2.62 121 70.0%
qoi-experi: 0.7 0.8 61.19 58.01 75 43.5%
qoi-demo10: 0.6 0.7 77.52 60.26 78 45.1%
qoi-l61r64: 0.6 0.8 79.75 58.52 75 43.3%
## Total for images/textures_pk01
libpng: 5.3 66.5 24.65 1.95 163 32.3%
stbi: 5.1 37.1 25.61 3.50 232 45.8%
qoi-experi: 1.6 1.7 81.33 77.75 178 35.2%
qoi-demo10: 1.0 1.6 136.54 81.54 180 35.6%
qoi-l61r64: 1.1 1.6 116.92 78.89 179 35.3%
## Total for images/textures_pk02
libpng: 11.7 205.0 25.97 1.48 427 36.1%
stbi: 12.1 100.1 25.07 3.03 623 52.5%
qoi-experi: 4.2 4.4 71.95 69.25 479 40.4%
qoi-demo10: 2.8 4.3 108.01 70.79 492 41.5%
qoi-l61r64: 3.1 4.3 96.66 70.61 481 40.5%
## Total for images/textures_plants
libpng: 31.9 340.6 33.35 3.12 857 20.6%
stbi: 30.7 241.5 34.66 4.41 1191 28.7%
qoi-experi: 8.0 9.2 132.21 116.02 922 22.2%
qoi-demo10: 4.3 9.5 247.12 111.57 957 23.0%
qoi-l61r64: 6.0 9.4 177.48 113.29 922 22.2%
# Grand total for images
libpng: 13.5 187.9 34.47 2.47 423 23.3%
stbi: 14.7 121.4 31.53 3.82 601 33.2%
qoi-experi: 4.7 5.0 98.37 92.79 465 25.7%
qoi-demo10: 3.0 4.9 156.82 94.45 484 26.7%
qoi-l61r64: 3.7 5.1 127.03 91.58 463 25.6%
decode ms encode ms decode mpps encode mpps size kb rate
This was on a mid-range x86_64 laptop (2016, Skylake):
$ cat /proc/cpuinfo | grep model.name | uniq
model name : Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz
Regarding a 7-Byte padding: wouldn't it make sense to put and "end-marker" there, instead of just 0x00
bytes?
If we then use 0xfd = QOI_RGB
and 0xfe = QOI_RGBA
we could pad the stream with 0xff
bytes. Since QOI dosn't allow for more than 4 consecutive 0xff
bytes anywhere in the stream, this could be a nice replacement for the now-missing size
header, when streaming images.
Edit: thinking about it some more, a 7-byte 0x00
shouldn't occur in a stream either, since repeated QOI_INDEX
chunks should be encoded as QOI_RUN
instaed.
Further changes in experimental:
QOI_COLOR_HASH
now includes the alpha channel: r * 3 + g * 5 + b * 7 + a * 11
- this helps quite a bit with the icons and pngimg from the test suite.colorspace
in the header is now an enum, where 0 = sRGB with linear alpha, 1 = all channels linear. It seems any other channel configuration does not make much sense (see https://github.com/Tom94/tev/pull/148). The 0 value for sRGB provides a sensible default.0x00
bytes. These also indicate "end of image" to alleviate the absent size
header.Just adding a size field to the header would be far mor efficient than making people who want to skip the image iterate every byte looking for a run of four 0xFF or eight 0x00.
We've had this discussion (see https://github.com/phoboslab/qoi/issues/28#issuecomment-980619128) and I believe the benefit of allowing a "streaming" encoder that does not need seeking outweigh the drawbacks of not having the size in the header.
What about allowing encoders to put 0
as the size if they don't want it? (I.E. if a decoder reads a size of zero, it has to scan, but if it sees a nonzero size it can skip?)
Yeah if the encoder absolutely needs to never go back it can just put a zero, but that's probably very rare. More likely these are files being put on disk, and you can just adjust an earlier part of the file once the compression is completed.
What about allowing encoders to put 0 as the size if they don't want it? (I.E. if a decoder reads a size of zero, it has to scan, but if it sees a nonzero size it can skip?)
That would completly defeat the idea of a streaming reader as you then don't know the file size beforehand, but if you want to skip images in a stream, you would need to have a defined header. So you can also just leave it out.
Yeah if the encoder absolutely needs to never go back it can just put a zero, but that's probably very rare. More likely these are files being put on disk, and you can just adjust an earlier part of the file once the compression is completed.
It would make it harder for, let's say, embedded devices that have low memory anyways to require the device to store the image both encoded and unencoded in RAM just for the sake of a stream field. Parsing/seeking the full image stream also doesn't take much time on most machines, especially if no writes are done.
A stronger use case for streaming encoders is stuff like additional compression/encryption where you convert some bytes, push them into the compressor/encrypter and pull some bytes from them. All compression/encryption APIs i know provide this pattern and it would be cool to just not have to allocate additional memory for the compressed image. Especially as it might require either overallocation or reallocation which are both costly. Imho, status quo is already good enough also for special use cases. The tagging of four consecutive 0xFF is imho a good middle ground, especially as a conforming non-streaming parser doesn't have to handle that case at all (it's always a invalid)
@phoboslab: The changes for color hash and colorspace are good :+1:
That would completly defeat the idea of a streaming reader as you then don't know the file size beforehand, but if you want to skip images in a stream, you would need to have a defined header.
If the size is zero, you have to walk the bytes to find the end. If the size is non-zero you can skip that many bytes immediately.
But if there's no size at all then you also have to walk the bytes to find the end, so things are just as bad as if someone had put a size of zero.
It would make it harder for, let's say, embedded devices that have low memory anyways to require the device to store the image both encoded and unencoded in RAM just for the sake of a stream field.
Why would you store the entire encoded image in ram just to have the total encoded size? A counter of how many encoded bytes have been written would be plenty. Then you write that value down in the file after the encoding is done (or, again, if you can't go back for some reason you'd know that ahead of time and mark a size of 0 in your output).
Some reasoning on why i don't think the size field is a good idea:
If the size is zero, you have to walk the bytes to find the end. If the size is non-zero you can skip that many bytes immediately.
That very much depends. I can already see people embedded different QOI images in QOI images by specifying invalid size
fields that won't skip the full image. If you want to write safe software, you have to run the decoder anyways to see if the size
field is actually valid and can be trusted as a measure to seek forward. Otherwise people might craft malicious streams that will crash your decoder. So removing the field prevents an attack vector for maliciously crafted data.
Why would you store the entire encoded image in ram just to have the total encoded size?
I can't because my (fictional, but realistic) device has only 32k RAM. This means i can neither store the full source image (but have to stream it from somewhere) nor can i store the full output image (but have to stream it somewhere). So in that use case, it would require me to always write zero to that field.
But to implement something like seeking back to disk, you have to create two interfaces to your application: One that allows streaming encoding, and one that allows non-streaming encoding. The latter one would require a bit different API for non-generic purpose implementations and would make the general library/application harder without much benefit in general (i won't argue that there isn't a benefit, i just don't value it that much)
The only use case where such a field is from use is: When you want to skip over the QOI data and the source data is seekable (aka. disk or memory buffer). For all other use cases we either have to decode the data anyways (so no benefit from the size field) or we don't have seeking capabilities (so the size field isn't worth much either, as we have to read the data anyways and skip-decoding QOI is way faster than memory or even I/O devices).
And in case of disk i/o, we already read way more data from the disk than we actually need (read_ahead_kb
is 128kB for my primary disk), this means we can already decode 128kB of data without I/O penalty. If we seek, we will pay the additional cost for a syscall which might also take longer than just decoding the data we already got anyways
But the last word is at @phoboslab. I have stated my case and don't have more arguments here.
The decoder should of course not blindly trust a size field and simply jump that far ahead in memory. It still has to check how big the current buffer is and not jump past the end of the buffer. This is 101 level stuff. However, if we're speaking about hostile input there's nothing ensuring that the input will ever signal an end of the stream, so it could trick a simplistic decoder into some sort of overflow that way as well.
And I would say that "the source data is seekable" is the overwhelmingly common case. Most things are on disk or are pulled completely into memory via network. Usually you've got seekable data.
I initially put the size field in the header in the first version, because I was immensely frustrated that you absolutely can not know if you found the end of a frame in MPEG-TS. A simple END_OF_DATA
would have solved this.
I do not like the optional size header. I goes against my desire for simplicity. All these "maybe it's like that, but maybe not... you have to check" things is what makes PNG, JPEG and many other formats way more complicated than they should be. So far, I tried hard to avoid all these optional features.
Anyway, what's the real use-cases for when you need to skip a QOI image? 1) you want to have some custom data after the image 2) you have a bunch of QOI files lumped together 3) you receive a stream of QOI images
Solutions: 1) put a fixed size trailer at the end of the file, that tells you where to find your custom data 2) use a container format that provides a size field or index of each image 3) prefix each image with a size field in the stream. If this is a video stream, you need a custom header anyway to denote framerate and whatnot
Then why not also get rid of width and/or height?
You could always get one by dividing the final number of decoded pixels by the other.
Right now you already have to check that there aren't too many/few pixels encoded inside a stream for a given size, how would an encoded byte size be different?
And wouldn't a format that doesn't require you to use some external (and likely needlessly complicated and non-standard) addon to actually use it for a big portion of its use-cases, just because its header is missing one simple value, be the simpler one?
Then why not also get rid of width and/or height? You could always get one by dividing the final number of decoded pixels by the other.
This is a good question! For one, we need a single control mechanism for stream integrity. Width and Height are usually information of interest, in a header, even if you don't care for the data around (think of the file
tool which can show you image sizes). As the QOI stream doesn't have a end of file signal, we need other means to compute that.
Right now you already have to check that there aren't too many/few pixels encoded inside a stream for a given size, how would an encoded byte size be different?
You have to check if your stream is still in the image anyways, so a secondary buffer that might even be wrong (or optional) doesn't help here. It also reduces decoding speed (my implementation was 10% faster for not checking that additional value :open_mouth:). Also you gave a cool idea:
Proposal:
If the stream ends with less than width×height
pixels decoded, the remaining pixels are filled with the last pixel value. This might bring a huge reduction for image files like logos, icons, … which have a "bar" at the bottom. This is cheap optimization for files or known-size memory buffers.
And wouldn't a format that doesn't require you to use some external (and likely needlessly complicated and non-standard) addon to actually use it for a big portion of its use-cases, just because its header is missing one simple value, be the simpler one?
If you really want that field to exist, it must be non-optional and thus remove the possibility of stream-encode QOI images. This is a huge decision to have either/or as a feature. If it's non-optional, people will just put a 0 there, because they are lazy.
Your proposal isn't a huge optimization. It saves a maximum of 2 bytes, since a single QOI_RUN_16 tag will handle this.
Then why not also get rid of width and/or height?
Non sequitur. The width/height is mandated to be always present; a decoder can rely on these.
actually use it for a big portion of its use-cases
Which are?
It saves a maximum of 2 bytes, since a single QOI_RUN_16 tag will handle this
There is no QOI_RUN_16 anymore (in the experimental branch). The longest run is now 62. But your point still stands and I'd rather be explicit: if there's pixels in the image, they need to be encoded.
I was relying on the assumption that by the time we finalize, we will either have special handling for multiple QOI_RUN_8 tags or a QOI_RUN_16/QOI_RUN_24 with an 8 bit tag that would deal with this.
@MasterQ32 The field being optional means any valid data stream with it has to also be valid without it! You can, in fact, just completely ignore it if your use case doesn't need/can't use it!
IT DOESN'T PREVENT STREAMING IN ANY WAY, SHAPE, OR FORM WHATSOEVER!
It not strictly being needed, I can understand. But saying that it would completely prevent streaming is just nonsense.
@phoboslab
Non sequitur. The width/height is mandated to be always present; a decoder can rely on these.
Exactly my point! Every single argument for size being unreliable also makes width and height be unreliable as well, but they are obviously needed and (in one way or another) already being checked, so size, which doesn't actually need to be checked or even considered when not needed, should be even less problematic.
actually use it for a big portion of its use-cases
Which are?
Any time you want to decode more than one image at once (parallel), without using custom/third-party add-ons like additional headers or indices.
F.e. You have a large number of small images (game textures?) and want to avoid the overheads of both working with lots of small files or having to fully decode a single enormous meta-file to find a specific image. You could use some kind of archive, but that would either be completely custom and non-standard or third-party and too general/overly complicated. With a size field, you could simply pack them one after the other and only need to iterate headers while staying fully within the vanilla spec.
I agree that it is more of a (very) nice to have than an absolute necessity.
The main reason I'm still arguing is because of the reason for its exclusion. Because the statement that it would prevent streaming is one which I, as stated above, strongly disagree with.
If you decide to not include a size field because of complexity, then that is ultimately fine.
Well yes, there is no mandatory size field because it would prevent streaming encode. There is no optional size field because I don't like the idea of it being optional.
you could simply pack them one after the other
That's a rather constructed scenario. Not having an index in this case seems like a bad idea. Even Doom WADs have one :)
I was relying on the assumption that by the time we finalize, we will either have special handling for multiple QOI_RUN_8 tags or a QOI_RUN_16/QOI_RUN_24 with an 8 bit tag that would deal with this.
It's not needed. A run length of 62 is sufficient. The benefits of not having variable-width chunks (or fewer tag types) far outweigh the gain in compression rate. I just tested it again: screenshot_web
has the biggest(!) win in compression rate, being 0.2% better than current experimental. It does even less for textures_*
, icon_*
and pngimg
and absolutely nothing for photos.
## Total for images/textures_photo
decode ms encode ms decode mpps encode mpps size kb rate
ex: 0 8.0 0 130.28 1981 48.4%
ex.r16: 0 8.6 0 122.39 1981 48.4%
## Total for images/textures_pk01
ex: 0 0.9 0 150.01 178 35.2%
ex.r16: 0 0.9 0 141.63 178 35.1%
## Total for images/screenshot_game
ex: 0 3.1 0 201.80 519 21.0%
ex.r16: 0 3.5 0 179.90 516 20.9%
## Total for images/textures_pk
ex: 0 0.4 0 115.40 75 43.5%
ex.r16: 0 0.4 0 106.81 75 43.5%
## Total for images/textures_pk02
ex: 0 2.2 0 135.83 479 40.4%
ex.r16: 0 2.4 0 126.77 478 40.4%
## Total for images/icon_64
ex: 0 0.0 0 252.73 4 28.7%
ex.r16: 0 0.0 0 221.30 4 28.7%
## Total for images/icon_512
ex: 0 0.6 0 468.22 85 8.4%
ex.r16: 0 0.7 0 386.56 83 8.2%
## Total for images/photo_kodak
ex: 0 3.4 0 116.76 671 43.7%
ex.r16: 0 3.7 0 107.62 671 43.7%
## Total for images/textures_plants
ex: 0 4.5 0 235.19 922 22.2%
ex.r16: 0 5.1 0 208.62 914 22.0%
## Total for images/screenshot_web
ex: 0 18.6 0 437.47 2649 8.3%
ex.r16: 0 23.4 0 347.76 2576 8.1%
## Total for images/pngimg
ex: 0 7.7 0 235.06 1436 20.3%
ex.r16: 0 8.7 0 207.12 1422 20.1%
## Total for images/photo_tecnick
ex: 0 13.3 0 108.04 2527 44.9%
ex.r16: 0 14.3 0 100.99 2527 44.9%
## Total for images/photo_wikipedia
ex: 0 10.2 0 106.29 2102 49.6%
ex.r16: 0 10.9 0 99.41 2102 49.6%
# Grand total for images
ex: 0 2.5 0 185.77 463 25.6%
ex.r16: 0 2.8 0 167.49 461 25.5%
F.e. You have a large number of small images (game textures?) and want to avoid the overheads of both working with lots of small files or having to fully decode a single enormous meta-file to find a specific image. You could use some kind of archive, but that would either be completely custom and non-standard or third-party and too general/overly complicated. With a size field, you could simply pack them one after the other and only need to iterate headers while staying fully within the vanilla spec.
Your use case for a size field is basically to have a slightly more compact tarball format that can only store qoi images and without any metadata like filename so you'll need custom nonsense to find a specific image anyway. Better to just use a tarball IMO. Tar is the standard that should have been instead of every game company delighting in creating their own format, but it's not universally suitable thanks to how it works. In an ideal world there'd be a second standard archive format for compact indexed archival (a binaryily-searchable index of UTF-8 filenames with uint64_t LE file sizes, plus a file count and that's it), but there isn't AFAIK.
I'm a bit worried about the demotion of alpha here. I would like to use the format to encode anti-aliased logos and icons. In this sort of file you often don't have very many different colors, but at the edge it fades from opaque to transparent, and back.
For an example, take a look at the blue triangle near the top of http://www.theinformedillustrator.com/2021/01/smooth-operations-anti-aliasing.html
In this case I was counting on using QOI_COLOR with only the a bit set, which would cost only 2 bytes for each change of alpha. In the new format that will be 5 bytes for each change of alpha, or 10 bytes for a typical edge where the alpha goes 0->intermediate->255 or 255->intermediate->0.
Do you have images like this in your test suite?
There's still QOI_INDEX
to handle few distinct colors. textures_plants/
and pngimg/
have anti-aliased alpha. Compression is still quite ok. But for palleted images with very few colors, QOI is not the best choice anyway.
I took the parrot from https://en.wikipedia.org/wiki/File:Creative-Tail-Animal-parrot.svg, rendered as a 192x192 PNG, and the Youtube logo from the Templarian collection, rendered at 96x96. Here are the results:
-rw-rw-r-- 1 erik erik 10861 Dec 8 18:57 Creative-Tail-Animal-parrot.svg.png -rw-rw-r-- 1 erik erik 9590 Dec 8 20:53 parrot-experimental.qoi -rw-rw-r-- 1 erik erik 9019 Dec 8 20:53 parrot-main.qoi
-rw-rw-r-- 1 erik erik 1565 Dec 8 20:54 youtube-experimental.qoi -rw-rw-r-- 1 erik erik 942 Dec 8 20:55 youtube-main.qoi -rw-rw-r-- 1 erik erik 1466 Dec 8 20:50 youtube.png
They both got worse - quite a bit worse in the case of the youtube logo.
A third 8 bit tag QOI_OP_COLOR_A might be useful. I tried it on the old data set but it was only used something like 0.05% of the time there.
If we use a "mod 61" hash, we could assign the other 3 opcodes in that 64-opcode block as:
Alternatively, r4g4b4a4 could be r3g3b2a8 or s2g4c2a8, where s=r-g and c=b-g.
Or, another crazy (and probably too-complicated) idea: separate into two 'planes'. Encode all the RGB data first, using pretty much the scheme so far minus any alpha-related opcodes. Encode the A channel (if present) after that, with a separate bytecode scheme (or even just a basic RLE compression).
I completely agree with erikcorry and nigeltao, removing alpha deltas makes it almost pointless to support the alpha channel in qoi, since it will compress really poorly. Maybe we can still somehow add another opcode similar to the old QOI_DIFF_24 like nigeltao suggested?
removing alpha deltas makes it almost pointless to support the alpha channel in qoi, since it will compress really poorly
This is absolutely not the case.
## Total for images/pngimg
decode ms encode ms decode mpps encode mpps size kb rate
libpng: 30.2 292.0 59.96 6.19 1201 17.0%
stbi: 27.2 234.7 66.50 7.71 1751 24.8%
qoi.master: 6.9 10.5 262.50 172.09 1415 20.0%
qoi.exp: 6.2 7.8 290.55 231.44 1436 20.3%
A quick experiment, starting with luma61-run64. Call the original l61r64 and this experiment l61v02.
What's new is using two of the previously-unused 8-bit opcodes (QOI_OP_Z comes from the QOI_OP_DIFF block, QOI_OP_A comes from the QOI_OP_INDEX block). For opaque source images (with a = 0xFF everywhere), these two opcodes won't be used, so there's no difference in the output .qoi file for l61r64 and l61v02.
#define QOI_OP_INDEX 0x00 // 00xxxxxx +0 bytes i6
#define QOI_OP_DIFF 0x40 // 01xxxxxx +0 bytes r2g2b2
#define QOI_OP_LUMA 0x80 // 10xxxxxx +1 bytes g6s4c4 (s=r-g, c=b-g)
#define QOI_OP_RUN 0xc0 // 11xxxxxx +0 bytes n6
#define QOI_OP_Z 0x6a // 01101010 +0 bytes transparent black (all 0)
#define QOI_OP_A 0x3d // 00111101 +1 bytes a8
#define QOI_OP_RGB 0x3e // 00111110 +3 bytes r8g8b8
#define QOI_OP_RGBA 0x3f // 00111111 +4 bytes r8g8b8a8
On @erikcorry's two images:
10861 parrot.png
9019 parrot-master.qoi
9590 parrot-experi.qoi
9553 parrot-l61r64.qoi
9202 parrot-l61v02.qoi
1466 youtube.png
942 youtube-master.qoi
1565 youtube-experi.qoi
1593 youtube-l61r64.qoi
930 youtube-l61v02.qoi
On the non-opaque sub-directories of @phoboslab's test suite (the "size kb" is the important column; the mpps numbers could be optimized):
decode ms encode ms decode mpps encode mpps size kb rate
## Total for images/icon_512
qoi-master: 1.5 1.5 170.74 173.94 80 7.8%
qoi-experi: 1.5 1.2 180.47 224.24 102 10.1%
qoi-l61r64: 1.0 1.4 254.71 190.54 86 8.4%
qoi-l61v02: 1.2 1.3 226.15 206.77 79 7.7%
## Total for images/icon_64
qoi-master: 0.0 0.0 117.66 99.81 4 28.3%
qoi-experi: 0.0 0.0 103.12 106.03 5 31.3%
qoi-l61r64: 0.0 0.0 141.35 106.50 4 28.8%
qoi-l61v02: 0.0 0.0 133.57 103.37 4 26.9%
## Total for images/pngimg
qoi-master: 16.8 19.4 107.53 93.19 1415 20.0%
qoi-experi: 15.5 15.7 116.99 114.90 1445 20.5%
qoi-l61r64: 11.7 16.4 154.61 110.63 1437 20.3%
qoi-l61v02: 13.6 16.8 133.24 107.41 1422 20.1%
## Total for images/textures_pk02
qoi-master: 4.4 5.5 68.97 55.69 504 42.5%
qoi-experi: 4.2 4.4 71.95 69.25 479 40.4%
qoi-l61r64: 3.1 4.3 96.66 70.61 481 40.5%
qoi-l61v02: 3.6 4.5 85.54 66.95 480 40.5%
## Total for images/textures_plants
qoi-master: 8.9 11.5 120.21 92.40 951 22.9%
qoi-experi: 8.0 9.2 132.21 116.02 922 22.2%
qoi-l61r64: 6.0 9.4 177.48 113.29 922 22.2%
qoi-l61v02: 7.2 9.8 148.25 108.32 915 22.0%
[other images/* results not shown]
# Grand total for images
qoi-master: 5.0 6.1 92.49 76.72 485 26.8%
qoi-experi: 4.7 5.0 98.37 92.79 465 25.7%
qoi-l61r64: 3.7 5.1 127.03 91.58 463 25.6%
qoi-l61v02: 4.1 5.2 112.78 89.77 462 25.5%
decode ms encode ms decode mpps encode mpps size kb rate
What's new is using two of the previously-unused 8-bit opcodes (QOI_OP_Z comes from the QOI_OP_DIFF block, QOI_OP_A comes from the QOI_OP_INDEX block).
I've been guilty of it in earlier iterations but have come to the conclusion that overlapping encodings should be avoided (unless it's nailed down in the spec that one should be used). By that I mean your encoder might emit QOI_OP_RUN instead of QOI_OP_DIFF, but a different encoder might not so a QOI_OP_DIFF representing no change should not have an overlapping encoding. The 3 tags at the end of a 61 value index are fair game because no conforming encoder should use them for anything else.
There is a replacement for QOI_OP_DIFF that compresses the corpus better and only uses 63 values, so in the above case there would be space for QOI_OP_Z. Lets call it QOI_OP_LUMA1. It's a 6 bit mashup of LUMA's vg_r/vg_b and ANS coding the values (like GDELTA) so we can use ranges other than powers of 2 (which we do to increase green's range and better fit around 0). vg_r=-1..1, vg=-3..3, vg_b=-1..1. Apologies for lack of example but the code below should be relatively plug and play. The last value is unused.
Encode
if (
vg_r > -2 && vg_r < 2 &&
vg > -4 && vg < 4 &&
vg_b > -2 && vg_b < 2
) {
bytes[p++] = QOI_OP_LUMA1 | (((vg_b + 1) * 21) + ((vg_r + 1) * 7) + (vg + 3));
}
Decode
else if ((b1 & QOI_MASK_2) == QOI_OP_LUMA1) {
b1 ^= QOI_OP_LUMA1;
int vg = (b1 % 7) - 3;
b1/=7;
px.rgba.r += vg - 1 + (b1 % 3);
px.rgba.g += vg;
b1/=3;
px.rgba.b += vg - 1 + (b1 % 3);
}
Just changing the latest experi commit from DIFF to LUMA1 does this
# Grand total for ../qoi_benchmark_suite/images
qoi-ex2103: 2.3 3.2 199.36 144.40 463 25.6%
qoi-gd8b: 2.8 3.4 168.58 138.35 459 25.4%
There are similar gains to be had giving LUMA the ANS treatment (but smaller, the 8 bit version is more critical) but that's bikeshed territory for now.
edit: It may look like processing speed is tanked but perhaps not much when optimised, delta7 in the bikeshed uses a similar but wider 7 bit LUMA/ANS encoding and still achieves decent speeds.
edit: Actually as a mix of luma and ans it should be called lama
It may look like processing speed is tanked but perhaps not much when optimised, delta7 in the bikeshed uses a similar but wider 7 bit LUMA/ANS encoding and still achieves decent speeds.
In case anyone else is wondering, here's a link to delta7 code and benchmark numbers.
Edit: thinking about it some more, a 7-byte
0x00
shouldn't occur in a stream either, since repeatedQOI_INDEX
chunks should be encoded asQOI_RUN
instaed.
Is this an explicit condition? I recall reading somewhere around here that an encoder should be free to make any kind of decision about how to encode the data as long as valid opcodes are used (an extreme case would be an encoder emitting QOI_OP_RGB
for each pixel. Stupid but technically correct). Looks like 8 zeros in a row would be a valid QOI opcode sequence, even if useless. What do you think?
Speaking of padding, wouldn't requiring the total size to be a multiple of 8 bytes zero padded be better than always outputting 7 zero bytes? Making the header 16 bytes should also eliminate unaligned loads and/or make handling slightly simpler (wouldn't it? I don't know much about the nitty gritty of optimising).
If the above is true why not go one step further and require total size to be a multiple of 32 so SIMD can make some safe optimising assumptions (in particular for AVX2, but many SIMD are in this sort of range or variable which should still benefit)?
There is nearly nothing that can be made SIMD about this format, other than maybe a memcpy.
Is this an explicit condition? I recall reading somewhere around here that an encoder should be free to make any kind of decision about how to encode the data as long as valid opcodes are used (an extreme case would be an encoder emitting
QOI_OP_RGB
for each pixel. Stupid but technically correct). Looks like 8 zeros in a row would be a valid QOI opcode sequence, even if useless. What do you think?
Technically correct. I'm hesitant to sacrifice one more bit of QOI_INDEX
for the end-code, though. So I guess we should just make that explicit: "Consecutive QOI_INDEX chunks with the same index value are illegal and must be encoded as a QOI_RUN instead."
Or we switch the QOI_RUN
and QOI_INDEX
tags and make the 61 values wide as proposed by @chocolate42 - which after some testing had no compression benefit and is somewhat aesthetically less pleasing to me :/
Edit: Narf. Specifying that an encoder must not produce consecutive QOI_INDEX
chunks would actually not be enough to have a distinct "ending tag". Even one QOI_INDEX: 0
may be misleading: If a stream ends with a QOI_INDEX: 0
and is then followed by the 8-bytes padding, a decoder that attempts to split QOI files will split one byte early.
Prime numbers are always aesthetic ;)
Powers of two are useful for performance and for some ops it makes sense, IMO index and run are two ops where powers of two do not make sense. I'm experimenting with grouping run/index/small-ops into a single mask to optimise the opcode distribution. How does an index size of 47 tickle your fancy?
I'm generally against introducing more opcodes if they do not increase compression by a lot. Without refactoring QOI to use some notion of blocks and/or more complicated encoding schemes, we will not beat a well optimized PNG anyway. QOI is all about simplicity with a reasonable compression rate.
Thinking about the end-code some more: even if the padding is specified to be an otherwise unused 8-bit tag a decoder may still split QOI files too early if they just happen to end with a QOI_RGBA
where the color value(s) match this tag. So the end-code must be a multi-byte sequence that can not occur within the stream 🤔
I think simplicity is maybe the wrong metric to aim for. You only implement the encoder/decoder some small number of times.
Fast operations with decent compression seems like a better goal point. So whatever makes things go fast without hurting compression should probably be favored. Eg: using powers of 2 over prime numbers.
I think simplicity is maybe the wrong metric to aim for. You only implement the encoder/decoder some small number of times.
If we go that route, we'll end up PNG in the end. As you can always argue that something might improve compression rate. Simplicity is nice as you can use QOI to explain people how some basic compression works in general and you can easily proof correctness of implementation.
I prefer simple and small libs with marginal drawbacks to complexer ones nowadays because i can manage and maintain them. I don't wanna maintain a libpng, but libqoi is just so tiny, i can do that
You say that, but PNG does at least two things that are not helpful for compression ratio or decompression speed. Possibly more. It's really not so great, it's just what we all use at the moment.
As you can always argue that something might improve compression rate.
I would expect some concrete evidence if someone was making a specific claim of increased compression or better decompression speed.
The example of powers of two compared to primes is just an avenue to investigate, I don't think anyone was trying to advocate a specific spec change without evidence.
With all that we learned through the analysis and ideas of a lot of people here, I refined QOI quite a bit. More than I thought I would.
The current state is in the experimental branch.
First of all, benchmark results for the new test suite using
qoibench 1 images/ --nopng --onlytotals
As you can see throughput improved a lot, as did the compression ratio for all files without an alpha channel (
icon_*/
andpngimg/
suffered a bit, but the overall compression ratio for these files is already quite high.textures_plants/
still saw improvements). For photos or photo-like images QOI now often beats libpng!What changed? After I switched the tags for
QOI_RUN
(previously 2-bit tag) andQOI_GDIFF_16
(previously 4-bit tag) I noticed thatQOI_GDIFF
covered almost all(!) cases that were previously encoded byQOI_DIFF_16/24
. So... why not remove them?(see the experimental file format documentation for the details)
That is, most tags are now 2-bit, while the run-length is limited to 62 and thus leaves some room for the two 8-bit
QOI_OP_RGB
andQOI_OP_RGBA
tags. So QOI would be even simpler than before and (probably?) gain a lot more possibilities for performance improvements:Yes, it means that a change in the alpha channel will always be encoded as a 5-byte
QOI_OP_RGBA
, but using the current test suit of images, this seems to be totally fine. The alpha channel is mostly either 255 or 0. The famousdice.png
and FLIF'sfish.png
seem to be awfully "artificial" uses of PNG. (For comparison, in the experimental branch with the original tag-layout andQOI_DIFF_16/24
still present, the overal compression ratio was at 24.6% - but the win in simplicity and performance is imho worth this 1%).The hash function changed to the following:
This is seriously the best performing hash function I could find and I tried quite a few. This also ignores the alpha channel, making it even more of a second-class citizen.
You may not like it (and I'm truly sorry for all the work that would need to be done in existent implementations), but I strongly believe that this is The Right Thing To Do™.
Thoughts?