nigeltao / qoi2-bikeshed

"Quite OK Image" version 2 discussions

Zstandard Compression #25

Open sudoBash418 opened 2 years ago

sudoBash418 commented 2 years ago

I imagine I'm not the first one to look into this, but I haven't seen another post about it yet, so I may as well throw the idea onto the playing field. Apologies if this is the wrong place for this :sweat_smile:

On screenshots, zstd levels 1-3 shrank the QOI output by anywhere from 15% to sometimes 50%, generally outperforming even fairly well-optimized PNGs in compression ratio. For other image types the reduction seems to be around 10-15%.

Here's a more thorough comparison of the en.wikipedia.org.png screenshot from the original "test bench":

| Compression | Size | Ratio |
|---|---:|---:|
| original file | 1,316,655 | 1.000 |
| oxipng -D -o3 | 1,046,134 | 0.794 |
| qoi | 1,498,761 | 1.138 |
| qoi + zstd -1 | 955,221 | 0.725 |
| qoi + zstd -3 | 809,694 | 0.615 |

Tested with the master branch of QOI (2ee2169), zstd v1.5.0, and oxipng 5.0.1.
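The Ratio column is each size divided by the original PNG's size. Here is a quick Python sketch that recomputes it and also measures each output against the raw QOI file, which shows how much zstd shrinks the QOI stream itself for this one image. The sizes are copied from the table, and the helper names are just for illustration:

```python
from typing import Dict

# Sizes copied from the en.wikipedia.org.png comparison table above.
SIZES: Dict[str, int] = {
    "original file": 1_316_655,
    "oxipng -D -o3": 1_046_134,
    "qoi": 1_498_761,
    "qoi + zstd -1": 955_221,
    "qoi + zstd -3": 809_694,
}


def ratios_against(baseline_name: str) -> Dict[str, float]:
    """Every size as a fraction of the chosen baseline, rounded to 3 decimals."""
    base = SIZES[baseline_name]
    return {name: round(size / base, 3) for name, size in SIZES.items()}


if __name__ == "__main__":
    vs_png = ratios_against("original file")
    vs_qoi = ratios_against("qoi")
    for name in SIZES:
        print(f"{name:15} vs PNG {vs_png[name]:.3f}  vs QOI {vs_qoi[name]:.3f}")
```

For this image, qoi + zstd -3 comes out at about 0.540 of the raw QOI size, a reduction of roughly 46%.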

Haven't done a complete comparison across the entire test bench yet; if there's interest I might get around to it though.

chocolate42 commented 2 years ago

It's interesting that qoi is a pure pixel engine especially as it only touches input once so can be easily streamed. Being modular allows for all the niceties of the existing ecosystem like being wrapped in a tarball for archival or compressed with whatever generic streaming compressor you like (compare that to PNG which chose a fixed compressor, a mediocre one even at the time).
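Because QOI makes a single pass over the pixels, its output can be piped through any streaming compressor without buffering the whole image. Below is a minimal sketch in Python that uses the standard library's `lzma` module (xz format) as the stand-in compressor. `fake_qoi_chunks` is a hypothetical placeholder for a streaming QOI encoder's output and is not part of any QOI implementation:

```python
import lzma
from typing import Iterable, Iterator


def compress_stream(chunks: Iterable[bytes], preset: int = 6) -> Iterator[bytes]:
    """Wrap an already-encoded byte stream (e.g. QOI output) in an .xz
    container, one chunk at a time."""
    comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=preset)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:  # the compressor may buffer internally and return b""
            yield out
    yield comp.flush()  # remaining data plus the xz stream footer


def decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Inverse of compress_stream: yields the original bytes incrementally."""
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out


def fake_qoi_chunks(n_chunks: int, chunk_size: int = 4096) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming QOI encoder."""
    for i in range(n_chunks):
        yield bytes((i + j) % 7 for j in range(chunk_size))
```

Swapping `lzma` for any other streaming codec changes only the two wrapper functions. The pixel encoder never needs to know which codec is in use.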

When all is said and done I think it would be ideal if .qoi rivalled libpng on filesize with much faster encode/decode, .qoi.zstd (with some fast setting) rivalled optimised PNGs for filesize, and .qoi.xz (with some slow setting) rivalled JpegXL for lossless filesize. Zstd fast is important to bench when optimising for access times (amusingly, loading a zstd-fast-compressed file may be quicker than loading a raw qoi file, depending on the storage medium); xz slow is important when optimising for archival.
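The following is a back-of-the-envelope model of the "zstd fast can beat raw qoi" effect. It assumes the stored file is read in full and then decompressed, with no overlap between I/O and decoding. All numbers in the example are hypothetical and are not measurements from this thread:

```python
from typing import Optional


def load_time_s(stored_mb: float, raw_mb: float, read_mb_s: float,
                decode_mb_s: Optional[float] = None) -> float:
    """Seconds until raw_mb of QOI data is in memory.

    Simplified model (an assumption): read the whole stored file, then
    decompress it; reading and decoding are not overlapped.
    decode_mb_s=None means the file is stored as raw QOI.
    """
    seconds = stored_mb / read_mb_s
    if decode_mb_s is not None:
        seconds += raw_mb / decode_mb_s
    return seconds


if __name__ == "__main__":
    # Hypothetical: 100 MB of QOI that zstd fast shrinks to 80 MB and
    # decompresses at 1000 MB/s.
    for label, bandwidth in [("HDD, 150 MB/s", 150.0), ("NVMe, 3000 MB/s", 3000.0)]:
        raw = load_time_s(100, 100, bandwidth)
        packed = load_time_s(80, 100, bandwidth, decode_mb_s=1000.0)
        print(f"{label}: raw {raw:.3f}s, zstd {packed:.3f}s")
```

With these made-up numbers, the compressed file loads faster from the slow disk (about 0.63 s vs 0.67 s) and slower from the fast one. This is the storage-medium dependence described above.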

Which is a rambling way of saying yes I am very interested in compressed benchmarks. You might want to hold off on doing so until the format has been finalised (or create the benchmark in such a way that it's easy to run again if the spec changes).

nigeltao commented 2 years ago

https://github.com/phoboslab/qoi/issues/71#issuecomment-991884414 has some numbers on wrapping with LZ4 instead of Zstd, on the entire test suite. LZ4 seems closer in spirit to QOI (byte-aligned ops, relatively simple format, emphasize speed over compression ratios, etc).
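To make "byte-aligned ops" concrete: an LZ4 block is a run of sequences. Each sequence starts with a token byte. The token's high nibble is the literal count and its low nibble is the match length minus 4. A nibble of 15 means the length continues in extra bytes, each 255 adding 255 and the first non-255 byte ending the run. A 2-byte little-endian back-reference offset follows the literals. Below is a simplified Python decoder for the raw *block* format. It does not handle the frame format the `lz4` command-line tool writes, and it lacks the hardening a production decoder needs:

```python
def lz4_block_decompress(src: bytes) -> bytes:
    """Decode one raw LZ4 block (not an LZ4 frame). Simplified sketch."""
    out = bytearray()
    i, n = 0, len(src)

    def read_extra(length: int) -> int:
        # Nibble value 15: keep adding bytes while they are 255.
        nonlocal i
        while True:
            b = src[i]
            i += 1
            length += b
            if b != 255:
                return length

    while i < n:
        token = src[i]
        i += 1
        lit_len = token >> 4
        if lit_len == 15:
            lit_len = read_extra(lit_len)
        out += src[i:i + lit_len]
        i += lit_len
        if i >= n:  # the last sequence carries literals only
            break
        offset = src[i] | (src[i + 1] << 8)  # 2-byte little-endian
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")
        match_len = token & 0x0F
        if match_len == 15:
            match_len = read_extra(match_len)
        match_len += 4  # minimum match length
        start = len(out) - offset
        for k in range(match_len):  # byte by byte so overlapping copies work
            out.append(out[start + k])
    return bytes(out)


if __name__ == "__main__":
    # 4 literals "abcd", copy 8 bytes from offset 4 (overlapping), then "!".
    block = bytes([0x44]) + b"abcd" + bytes([0x04, 0x00]) + bytes([0x10]) + b"!"
    print(lz4_block_decompress(block))  # b'abcdabcdabcd!'
```

As with QOI's ops, every field starts on a byte boundary, so no bit-level reader is needed.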

nigeltao commented 2 years ago

> Apologies if this is the wrong place for this

This is the right place!

sudoBash418 commented 2 years ago

> phoboslab/qoi#71 (comment) has some numbers on wrapping with LZ4 instead of Zstd, on the entire test suite. LZ4 seems closer in spirit to QOI (byte-aligned ops, relatively simple format, emphasize speed over compression ratios, etc).

I tested LZ4 once or twice on a couple of images but gave up because I thought zstd had it beat no matter what speed/size you aim for, but I'll add it to my comparison. Looking at the numbers from zstd's own author, it seems that lz4 wins in at least decompression time, so its benefits likely will not be revealed by my script.

> It's interesting that qoi is a pure pixel engine especially as it only touches input once so can be easily streamed. Being modular allows for all the niceties of the existing ecosystem like being wrapped in a tarball for archival or compressed with whatever generic streaming compressor you like (compare that to PNG which chose a fixed compressor, a mediocre one even at the time).

> When all is said and done I think it would be ideal if .qoi rivalled libpng on filesize with much faster encode/decode, .qoi.zstd (with some fast setting) rivalled optimised PNGs for filesize, and .qoi.xz (with some slow setting) rivalled JpegXL for lossless filesize. Zstd fast is important to bench when optimising for access times (amusingly, loading a zstd-fast-compressed file may be quicker than loading a raw qoi file, depending on the storage medium); xz slow is important when optimising for archival.

Yeah I agree, and that reminds me that I should add LZMA to the test as well, in one form or another.

> Which is a rambling way of saying yes I am very interested in compressed benchmarks. You might want to hold off on doing so until the format has been finalised (or create the benchmark in such a way that it's easy to run again if the spec changes).

I'm going to try writing a shell script for testing all this; it'll be filesizes-only however because getting accurate numbers for speed is a much more difficult task.
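A filesize-only harness like this takes only a few lines. The Python version below is just an illustration, since the script described here is a shell script. It uses the standard-library `zlib` and `lzma` modules as stand-ins, because zstd and lz4 need third-party bindings. `load_qoi_files` assumes a hypothetical directory of already-converted `.qoi` files:

```python
import lzma
import sys
import zlib
from pathlib import Path
from typing import Callable, Dict, List

Compressor = Callable[[bytes], bytes]

# Standard-library stand-ins only; zstd and lz4 would need third-party bindings.
# Python's lzma is single-threaded, so sizes can differ slightly from `xz -T0`.
COMPRESSORS: Dict[str, Compressor] = {
    "plain": lambda data: data,
    "zlib6": lambda data: zlib.compress(data, 6),
    "xz1": lambda data: lzma.compress(data, format=lzma.FORMAT_XZ, preset=1),
    "xz6": lambda data: lzma.compress(data, format=lzma.FORMAT_XZ, preset=6),
}


def load_qoi_files(directory: Path) -> Dict[str, bytes]:
    """Read every *.qoi file in a (hypothetical) directory of converted images."""
    return {p.name: p.read_bytes() for p in sorted(directory.glob("*.qoi"))}


def size_table(blobs: Dict[str, bytes]) -> Dict[str, Dict[str, int]]:
    """Compressed size in bytes, per compressor, per input file."""
    return {
        cname: {fname: len(fn(data)) for fname, data in blobs.items()}
        for cname, fn in COMPRESSORS.items()
    }


def ranked_by_total(table: Dict[str, Dict[str, int]]) -> List[str]:
    """Compressor names sorted by total size, largest first."""
    return sorted(table, key=lambda c: sum(table[c].values()), reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    table = size_table(load_qoi_files(Path(sys.argv[1])))
    for name in ranked_by_total(table):
        print(f"{name:8} {sum(table[name].values()):>12}")
```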

sudoBash418 commented 2 years ago

Here are my results, using qoiconv from phoboslab/qoi@2ee2169 (master). Compressors are sorted by total filesize, descending.

#### Compression Results (kB)

| | TOTAL | icon_512 | icon_64 | photo_kodak | photo_tecnick | photo_wikipedia | pngimg | screenshot_game | screenshot_web | textures_photo | textures_pk | textures_pk01 | textures_pk02 | textures_plants |
|-------------|---------|----------|---------|-------------|---------------|-----------------|--------|-----------------|----------------|----------------|-------------|---------------|---------------|-----------------|
| plain | 1351894 | 18658 | 1002 | 16502 | 258843 | 105507 | 274221 | 328421 | 37984 | 40590 | 77559 | 20646 | 115290 | 56671 |
| lz4fast12 | 1312141 | 14902 | 990 | 16495 | 258744 | 105508 | 262631 | 308866 | 34795 | 40590 | 77163 | 20066 | 114830 | 56561 |
| lz4fast5 | 1299434 | 13635 | 963 | 16487 | 258681 | 105508 | 260110 | 303348 | 33330 | 40572 | 76492 | 19828 | 114042 | 56438 |
| lz4fast3 | 1292847 | 13043 | 946 | 16483 | 258654 | 105508 | 258928 | 300492 | 32586 | 40549 | 76084 | 19696 | 113524 | 56354 |
| zstdfast5 | 1284166 | 12005 | 949 | 16500 | 258628 | 105496 | 257074 | 297238 | 31723 | 40591 | 74793 | 19458 | 113357 | 56354 |
| lz4fast1 | 1283038 | 12346 | 917 | 16477 | 258613 | 105500 | 257277 | 296477 | 31586 | 40495 | 74993 | 19520 | 112615 | 56222 |
| zstdfast3 | 1276492 | 11326 | 920 | 16498 | 258583 | 105489 | 255692 | 294090 | 30888 | 40591 | 73992 | 19303 | 112864 | 56256 |
| lz4_1 | 1275580 | 11887 | 894 | 16471 | 258572 | 105481 | 256090 | 293549 | 30855 | 40426 | 73915 | 19397 | 111914 | 56129 |
| zstdfast1 | 1266374 | 10649 | 880 | 16491 | 258521 | 105476 | 254095 | 290440 | 30077 | 40571 | 71979 | 19134 | 111975 | 56086 |
| lz4_3 | 1205685 | 10281 | 873 | 15611 | 254795 | 103816 | 241933 | 271855 | 27532 | 36699 | 66498 | 18461 | 103959 | 53372 |
| lz4_5 | 1202112 | 10178 | 872 | 15581 | 254577 | 103745 | 241258 | 270580 | 27399 | 36506 | 66157 | 18396 | 103599 | 53264 |
| lz4_9 | 1201317 | 10143 | 871 | 15580 | 254552 | 103743 | 241145 | 270312 | 27373 | 36479 | 65936 | 18375 | 103560 | 53248 |
| zstd1 | 1100532 | 10059 | 796 | 14216 | 225918 | 92452 | 220777 | 252941 | 26873 | 31394 | 65662 | 16869 | 94487 | 48088 |
| zstd3 | 1074274 | 9473 | 763 | 14052 | 225600 | 92230 | 215909 | 243789 | 24851 | 31249 | 60772 | 16476 | 91715 | 47395 |
| zstd9 | 1052353 | 8704 | 756 | 13848 | 224181 | 91546 | 210475 | 238618 | 23610 | 30531 | 58062 | 16134 | 89201 | 46687 |
| xz1 | 1019691 | 8011 | 727 | 13270 | 220192 | 89650 | 202900 | 228139 | 22883 | 30471 | 56269 | 15524 | 86384 | 45271 |
| zstd19 | 1017895 | 8225 | 740 | 13298 | 220685 | 89924 | 200601 | 229915 | 22427 | 29472 | 55717 | 15745 | 85869 | 45277 |
| zstdultra22 | 1017813 | 8222 | 740 | 13298 | 220685 | 89924 | 200531 | 229908 | 22426 | 29472 | 55717 | 15745 | 85869 | 45276 |
| xz3 | 1016294 | 7956 | 726 | 13265 | 219986 | 89605 | 201633 | 227615 | 22398 | 30325 | 55971 | 15496 | 86109 | 45209 |
| 7z9 | 979018 | 7523 | 739 | 12911 | 213357 | 86717 | 193025 | 219147 | 21348 | 28969 | 54258 | 14976 | 82347 | 43701 |
| xz6 | 978992 | 7502 | 718 | 12910 | 213359 | 86721 | 193193 | 219085 | 21347 | 28968 | 54190 | 14968 | 82333 | 43698 |
| xz9 | 978811 | 7502 | 718 | 12910 | 213359 | 86721 | 193012 | 219085 | 21347 | 28968 | 54190 | 14968 | 82333 | 43698 |
#### Compression Results (%)

| | TOTAL | icon_512 | icon_64 | photo_kodak | photo_tecnick | photo_wikipedia | pngimg | screenshot_game | screenshot_web | textures_photo | textures_pk | textures_pk01 | textures_pk02 | textures_plants |
|-------------:|-------|----------|---------|-------------|---------------|-----------------|--------|-----------------|----------------|----------------|-------------|---------------|---------------|-----------------|
| plain | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| lz4fast12 | 97.06 | 79.87 | 98.84 | 99.96 | 99.96 | 100.0 | 95.77 | 94.05 | 91.6 | 100.0 | 99.49 | 97.19 | 99.6 | 99.81 |
| lz4fast5 | 96.12 | 73.08 | 96.15 | 99.91 | 99.94 | 100.0 | 94.85 | 92.37 | 87.75 | 99.96 | 98.62 | 96.03 | 98.92 | 99.59 |
| lz4fast3 | 95.63 | 69.9 | 94.42 | 99.89 | 99.93 | 100.0 | 94.42 | 91.5 | 85.79 | 99.9 | 98.1 | 95.4 | 98.47 | 99.44 |
| zstdfast5 | 94.99 | 64.34 | 94.77 | 99.99 | 99.92 | 99.99 | 93.75 | 90.51 | 83.52 | 100.0 | 96.43 | 94.25 | 98.32 | 99.44 |
| lz4fast1 | 94.91 | 66.17 | 91.59 | 99.85 | 99.91 | 99.99 | 93.82 | 90.27 | 83.15 | 99.77 | 96.69 | 94.55 | 97.68 | 99.21 |
| zstdfast3 | 94.42 | 60.7 | 91.89 | 99.98 | 99.9 | 99.98 | 93.24 | 89.55 | 81.32 | 100.0 | 95.4 | 93.5 | 97.9 | 99.27 |
| lz4_1 | 94.36 | 63.71 | 89.24 | 99.82 | 99.9 | 99.98 | 93.39 | 89.38 | 81.23 | 99.6 | 95.3 | 93.95 | 97.07 | 99.04 |
| zstdfast1 | 93.67 | 57.08 | 87.88 | 99.94 | 99.88 | 99.97 | 92.66 | 88.44 | 79.18 | 99.95 | 92.81 | 92.67 | 97.12 | 98.97 |
| lz4_3 | 89.18 | 55.1 | 87.14 | 94.6 | 98.44 | 98.4 | 88.23 | 82.78 | 72.48 | 90.42 | 85.74 | 89.42 | 90.17 | 94.18 |
| lz4_5 | 88.92 | 54.55 | 87.05 | 94.42 | 98.35 | 98.33 | 87.98 | 82.39 | 72.13 | 89.94 | 85.3 | 89.1 | 89.86 | 93.99 |
| lz4_9 | 88.86 | 54.36 | 86.95 | 94.42 | 98.34 | 98.33 | 87.94 | 82.31 | 72.06 | 89.87 | 85.01 | 89.0 | 89.83 | 93.96 |
| zstd1 | 81.41 | 53.91 | 79.48 | 86.15 | 87.28 | 87.63 | 80.51 | 77.02 | 70.75 | 77.34 | 84.66 | 81.7 | 81.96 | 84.86 |
| zstd3 | 79.46 | 50.77 | 76.17 | 85.16 | 87.16 | 87.42 | 78.74 | 74.23 | 65.42 | 76.99 | 78.36 | 79.8 | 79.55 | 83.63 |
| zstd9 | 77.84 | 46.65 | 75.46 | 83.92 | 86.61 | 86.77 | 76.75 | 72.66 | 62.16 | 75.22 | 74.86 | 78.15 | 77.37 | 82.38 |
| xz1 | 75.43 | 42.94 | 72.6 | 80.42 | 85.07 | 84.97 | 73.99 | 69.47 | 60.24 | 75.07 | 72.55 | 75.19 | 74.93 | 79.88 |
| zstd19 | 75.29 | 44.08 | 73.91 | 80.59 | 85.26 | 85.23 | 73.15 | 70.01 | 59.04 | 72.61 | 71.84 | 76.26 | 74.48 | 79.89 |
| zstdultra22 | 75.29 | 44.07 | 73.91 | 80.59 | 85.26 | 85.23 | 73.13 | 70.0 | 59.04 | 72.61 | 71.84 | 76.26 | 74.48 | 79.89 |
| xz3 | 75.18 | 42.64 | 72.52 | 80.39 | 84.99 | 84.93 | 73.53 | 69.31 | 58.97 | 74.71 | 72.17 | 75.05 | 74.69 | 79.77 |
| 7z9 | 72.42 | 40.32 | 73.77 | 78.24 | 82.43 | 82.19 | 70.39 | 66.73 | 56.2 | 71.37 | 69.96 | 72.53 | 71.43 | 77.11 |
| xz6 | 72.42 | 40.21 | 71.7 | 78.24 | 82.43 | 82.19 | 70.45 | 66.71 | 56.2 | 71.37 | 69.87 | 72.5 | 71.41 | 77.11 |
| xz9 | 72.4 | 40.21 | 71.7 | 78.24 | 82.43 | 82.19 | 70.39 | 66.71 | 56.2 | 71.37 | 69.87 | 72.5 | 71.41 | 77.11 |
#### Compressor Details

| Name | Command Line |
|--|--|
| zstdfast# | `zstd --fast=#` |
| zstd# | `zstd -#` |
| zstd19 | `zstd -19 -T0` |
| zstdultra22 | `zstd -22 --ultra -T0` |
| xz# | `xz -# -c -T0` |
| 7z9 | `7z a -mx9` |

#### Versions

```
zstd: v1.5.0
xz: (XZ Utils) 5.2.5
7z: p7zip Version 17.04
```
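The percentage table is the kB table divided, column by column, by the `plain` (uncompressed QOI) row. This Python sketch rederives a few cells from the kB numbers. Only the TOTAL and icon_512 columns of four rows are copied in:

```python
from typing import Dict

# A few rows of the kB table above (TOTAL and icon_512 columns only).
KB: Dict[str, Dict[str, int]] = {
    "plain": {"TOTAL": 1351894, "icon_512": 18658},
    "lz4fast12": {"TOTAL": 1312141, "icon_512": 14902},
    "zstd1": {"TOTAL": 1100532, "icon_512": 10059},
    "xz9": {"TOTAL": 978811, "icon_512": 7502},
}


def percent_of_plain(kb: Dict[str, Dict[str, int]],
                     baseline: str = "plain") -> Dict[str, Dict[str, float]]:
    """Each size as a percentage of the uncompressed QOI size in the same column."""
    base = kb[baseline]
    return {
        comp: {col: round(100 * size / base[col], 2) for col, size in cols.items()}
        for comp, cols in kb.items()
    }


if __name__ == "__main__":
    for comp, cols in percent_of_plain(KB).items():
        print(comp, cols)
```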

Some quick notes:

chocolate42 commented 2 years ago

I ran the test suite through zopflipng, which appears to compress better than oxipng (even oxipng -o 6 -Z, where -Z uses zopfli). It took ~30 core hours with option -m; there are more exhaustive options, but they would have taken an age for minimal gains. Sorry if you're in the process of crunching PNGs; I set it running overnight on a whim. kB=1000

zopflipng -m
images/icon_512:           8598
images/icon_64:             723
images/photo_kodak:       14715
images/photo_tecnick:    209470
images/photo_wikipedia:   87123
images/pngimg:           208908
images/screenshot_game:  227834
images/screenshot_web:    26516
images/textures_photo:    31315
images/textures_pk:       41078
images/textures_pk01:     14504
images/textures_pk02:     80261
images/textures_plants:   47995
totals:                  999046

So qoi + the top compressors already beat the top optimised PNGs. Arguably the PNGs could be crunched slightly further, but then so can qoi if we go off the deep end and start using experimental state-of-the-art compressors like cmix.
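Here is a quick check of that claim against the totals posted earlier in the thread. It assumes both posts use the same kB unit, which is an assumption because zopflipng's post states kB=1000 and the earlier table doesn't say. Of the slower settings, only the 7z -mx9 and xz -6/-9 runs come in under the zopflipng total:

```python
from typing import Dict, List, Tuple

ZOPFLIPNG_TOTAL_KB = 999_046  # zopflipng -m total, from the post above

# qoi + compressor totals from the earlier results table (kB).
QOI_TOTALS_KB: Dict[str, int] = {
    "zstd19": 1_017_895,
    "zstdultra22": 1_017_813,
    "xz3": 1_016_294,
    "7z9": 979_018,
    "xz6": 978_992,
    "xz9": 978_811,
}


def smaller_than(totals: Dict[str, int], reference: int) -> List[Tuple[str, float]]:
    """(name, percent smaller than reference) for every total below reference."""
    return sorted(
        (name, round(100 * (1 - size / reference), 2))
        for name, size in totals.items()
        if size < reference
    )


if __name__ == "__main__":
    print(smaller_than(QOI_TOTALS_KB, ZOPFLIPNG_TOTAL_KB))
```

Those three beat zopflipng by about 2%. zstd -19 and even --ultra -22 land just under 2% above the zopflipng total.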

@sudoBash418 Let me know if you're going to generate lossless JpegXL data. If not I can crunch those numbers.

> LZ4 seems closer in spirit to QOI (byte-aligned ops, relatively simple format, emphasize speed over compression ratios, etc).

LZ4 is too quick IMO unless only dealing with in-memory operations (which is a category I hadn't considered). zstd -1 nearly saturates the bandwidth of a typical consumer SSD with a single (modern) CPU core, so it's a better fit for quickly loading assets for example. So rather than replacing zstd -1 with LZ4 it might be best to have three main data points: LZ4, zstd -1, xz -6.

sudoBash418 commented 2 years ago

> So qoi + the top compressors already beat the top optimised PNGs. Arguably the PNGs could be crunched slightly further, but then so can qoi if we go off the deep end and start using experimental state-of-the-art compressors like cmix.

> @sudoBash418 Let me know if you're going to generate lossless JpegXL data. If not I can crunch those numbers.

I wasn't planning on it; go for it. I might do a more quick-and-dirty test of "uncompressed PNG" compressed with zstd/xz just to see what happens.

> LZ4 seems closer in spirit to QOI (byte-aligned ops, relatively simple format, emphasize speed over compression ratios, etc).

> LZ4 is too quick IMO unless only dealing with in-memory operations (which is a category I hadn't considered). zstd -1 nearly saturates the bandwidth of a typical consumer SSD with a single (modern) CPU core, so it's a better fit for quickly loading assets for example. So rather than replacing zstd -1 with LZ4 it might be best to have three main data points: LZ4, zstd -1, xz -6.

I would be inclined to agree, but I haven't run the numbers yet so I'm not certain about what the performance would look like. One thing to note about zstd (and probably lz4, but I'm not certain) is that decompression speed and memory requirements generally stay the same even up to -19, which can be very useful in "compress once, decompress many" cases (such as game assets or static web files).

chocolate42 commented 2 years ago

Great point about using -19 instead of -1, that makes a lot more sense for that use case.

chocolate42 commented 2 years ago

JPEG XL encoder v0.7.0 335f8a8 [AVX2,SSE4,SSSE3,Scalar] cjxl -e 9 -q 100

kB=1000
images/icon_512:           4934
images/icon_64:             536
images/photo_kodak:       10169
images/photo_tecnick:    151300
images/photo_wikipedia:   65934
images/pngimg:           135264
images/screenshot_game:  163189
images/screenshot_web:    15254
images/textures_photo:    19797
images/textures_pk:       38466
images/textures_pk01:     12085
images/textures_pk02:     63698
images/textures_plants:   29439
totals:                  710071

Lossless JpegXL lives up to the hype.

magnus-ISU commented 2 years ago

While this is indeed very cool, would the specification actually require anything different?

EDIT: unless by writing a parser that does both at once, you can get more efficient code or something? I can definitely see that happening actually

nigeltao commented 2 years ago

Putting compression into the qoi2 format (and specifically leaving the magic identifier and the rest of the header uncompressed), instead of flinging foo.qoi2.xz files around, makes it easier to tell (e.g. with the /usr/bin/file command) that a foo.dat file is an image (and specifically a qoi2 image of a certain width and height), instead of only knowing that it's "xz compressed data".
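The QOI v1 header is 14 bytes: the magic `qoif`, a big-endian 32-bit width and height, then one byte each for channels and colorspace. If a qoi2 file kept its header uncompressed as suggested here, tools could read it without knowing anything about the payload codec. Below is a small Python sketch of such a reader. It parses the v1 layout only, since no qoi2 header layout has been settled in this thread:

```python
import struct
from typing import NamedTuple


class QoiHeader(NamedTuple):
    width: int
    height: int
    channels: int
    colorspace: int


# magic, width, height, channels, colorspace; big-endian, no padding (14 bytes)
QOI_HEADER = struct.Struct(">4sIIBB")


def read_qoi_header(data: bytes) -> QoiHeader:
    """Parse the 14-byte QOI v1 header. Bytes after the header are never
    touched, so this works whether or not the payload is compressed."""
    if len(data) < QOI_HEADER.size:
        raise ValueError("too short for a QOI header")
    magic, width, height, channels, colorspace = QOI_HEADER.unpack_from(data)
    if magic != b"qoif":
        raise ValueError("not a QOI file")
    return QoiHeader(width, height, channels, colorspace)
```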

chocolate42 commented 2 years ago

There is some benefit to integrating entropy coding, but it's a fair bit of complexity. Refactoring the encode/decode functions to allow streaming would let either method be just as efficient. As long as it's optional it's fine; the worst thing we could do is enforce a particular entropy coder that doesn't suit all use cases and quickly dates the format. If integrated entropy coding exists, I think it should accept these codecs, which should cover most use cases, with a "none" escape hatch: none, lz4, zstd, xz/lzma.
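Below is one way an optional, swappable codec field could look, with "none" as the escape hatch. The codec IDs and `decode_payload` are hypothetical and are not part of any QOI or qoi2 spec. Only the none and xz paths are implemented, using Python's standard `lzma` module:

```python
import lzma
from enum import IntEnum


class Codec(IntEnum):
    # Hypothetical codec IDs for an optional compression field; not from any spec.
    NONE = 0
    LZ4 = 1
    ZSTD = 2
    XZ = 3


def decode_payload(codec: int, payload: bytes) -> bytes:
    """Undo the optional entropy-coding layer, leaving plain QOI ops."""
    c = Codec(codec)  # raises ValueError for unknown IDs
    if c is Codec.NONE:
        return payload
    if c is Codec.XZ:
        return lzma.decompress(payload, format=lzma.FORMAT_XZ)
    # lz4 and zstd need third-party bindings; left out of this stdlib-only sketch.
    raise NotImplementedError(f"{c.name} decoding not wired up in this sketch")
```

Unknown IDs fail loudly rather than being misread as raw data, which keeps room to add codecs later without older decoders silently producing garbage.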

The zstd reference implementation appears to be able to output all three; I haven't had a chance to try it out yet.

edit: It's the benchmark tool in the zstd repo that can handle all three, not the implementation, which makes more sense. It looks about as easy as lz4 to integrate; zstd dev files are even in major Linux repos, so it should be as easy as lz4, at least on Linux.

nigeltao commented 2 years ago

I haven't fully digested it yet, but @richgel999 recently blogged about LZ_ADD / LZ_XOR compression which might inspire some ideas for interesting QOI2 experiments.

wbd73 commented 2 years ago

What I don't like about compression is that something like a lz4 encoder might not be available for all programming languages.

oscardssmith commented 2 years ago

It exists for C, Python, C#, Java, JavaScript, Rust, and Go (and a ton more). Is there a specific language you're worried about?

chocolate42 commented 2 years ago

> ... might not be available for all programming languages.

If something exists for C, it can exist for pretty much anything through C bindings. LZ4 is such a fundamental and long-lived algorithm that it's definitely everywhere and has been for years. The same goes for LZMA and even zstd (which is newer but already used in a lot of fundamental things like package managers and the Linux kernel).

wbd73 commented 2 years ago

I should have read up on it first. Now that I have, I see it's not really an issue.