Open nigeltao opened 2 years ago
Tangentially, if you're using Wuffs from C++ (instead of C), you can replace
wuffs_png__decoder__set_quirk_enabled(pDec, etc);
with
pDec->set_quirk_enabled(etc);
and similarly for other Wuffs C API calls.
Cool, I will hook this up to see how Wuffs compares.
I disabled Wuffs checksum checking. Also I updated the README. I didn't conduct exhaustive benchmarking yet, but I will. So far Wuffs's PNG decoder is extremely fast.
fpng seems a bit faster (within 10-20%, depending on whether you disable Wuffs checksums), but it's not a general-purpose decoder. FPNG is really just an existence proof and a push to others to make their PNG libraries faster. The real competition is QOI.
> The real competition is QOI.
Also tangential, but if you haven't already seen it, https://github.com/nigeltao/qoi2-bikeshed/issues/28 has some numbers for unofficial, experimental QOI variants. The highlights:
- libpng is... libpng 1.6.37.
- qoi-mas-c04 is "Official QOI": commit c04a975 from the phoboslab/master branch (2021-12-24).
- qoi-opt-c04 is the same implementation (and file format), but the decoder was optimized. The encoder is unchanged.
- qoi-lz4-d28 tweaks the QOI opcodes and also introduces LZ4 compression.

qoibench.c numbers on two test suites:
              decode ms  encode ms  decode mpps  encode mpps  size kb   rate
# Grand total for images
libpng:            13.5      153.7        34.44         3.02      398  24.2%
qoi-mas-c04:        4.7        5.6        99.25        82.46      463  28.2%
qoi-opt-c04:        4.2        5.7       111.40        81.60      463  28.2%
qoi-lz4-d28:        4.2        6.0       110.38        77.11      416  25.4%
# Grand total for images-lance
libpng:           103.5      925.0        39.25         4.39     1395   9.0%
qoi-mas-c04:       35.2       32.5       115.26       124.91     2109  13.6%
qoi-opt-c04:       29.0       32.5       139.87       124.88     2109  13.6%
qoi-lz4-d28:       28.7       36.2       141.62       112.13     1530   9.8%
It looks like QOI can get a 1.1x - 1.2x decode speed improvement above the current official implementation, without any file format changes. With file format changes, something QOI-like can keep that higher speed but also improve the compression ratio, closing the gap on PNG.
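For anyone who wants to check the 1.1x - 1.2x figure, here is a minimal sketch of the arithmetic. It divides the qoi-opt-c04 "decode mpps" value by the qoi-mas-c04 value for each suite. The function names `speedup` and `print_decode_speedups` are made up for illustration; only the four throughput numbers come from the table above.

```c
#include <assert.h>
#include <stdio.h>

/* Throughput ratio: how many times faster `after_mpps` is than `before_mpps`.
   Both arguments are megapixels per second, as in the "decode mpps" column. */
static double speedup(double before_mpps, double after_mpps) {
    assert(before_mpps > 0.0);
    return after_mpps / before_mpps;
}

/* Prints the qoi-mas-c04 -> qoi-opt-c04 decode speed-up for both test suites. */
static void print_decode_speedups(void) {
    printf("images:       %.2fx\n", speedup(99.25, 111.40));
    printf("images-lance: %.2fx\n", speedup(115.26, 139.87));
}
```

Calling `print_decode_speedups()` reports roughly 1.12x for images and 1.21x for images-lance, which is where the "1.1x - 1.2x" range comes from.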
https://github.com/nigeltao/qoi2-bikeshed/issues/31 and https://github.com/nigeltao/qoi2-bikeshed/issues/34 also suggest that there may be further compression-ratio gains from tweaking a QOI-like file format.
> The real competition is QOI.
You (@richgel999) are already mentioned in the tweet but for anyone else coming here: @veluca93 and @jonsneyers have a standalone JpegXL lossless encoder that's about 1000 lines of code (https://github.com/libjxl/libjxl/tree/f413c429cfd5f2e1f65b5ebfc0c969b67ddbcbaf/experimental/fast_lossless) and https://twitter.com/jonsneyers/status/1472959101416226823 suggests that it's 2x faster than QOI (for roughly equivalent compression ratio).
See also: https://github.com/veluca93/fpnge
Anyway, happy to see healthy competition making everyone faster. :-)
Adler-32 and CRC-32 computations are fast with SIMD, but if you just want the fastest possible PNG decoder (e.g. to compare to QOI's speed), ignoring the checksums can be even faster.
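To make concrete what work is being skipped, here is a rough scalar sketch of Adler-32 as defined in RFC 1950 (the zlib format). It is not taken from Wuffs, fpng or any other library mentioned here, and the name `adler32_scalar` is made up for illustration. A SIMD version computes the same two running sums, just many bytes at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference Adler-32 (RFC 1950): two running sums modulo 65521.
   Illustrative only; real decoders use SIMD or skip this entirely. */
static uint32_t adler32_scalar(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint32_t a = 1, b = 0;
    while (len > 0) {
        /* Process at most 5552 bytes between reductions: that is the
           largest block for which `b` cannot overflow 32 bits. */
        size_t n = len < 5552 ? len : 5552;
        len -= n;
        while (n--) {
            a += *data++;  /* sum of bytes (plus 1) */
            b += a;        /* sum of the intermediate values of a */
        }
        a %= 65521;
        b %= 65521;
    }
    return (b << 16) | a;
}
```

Even in this simple form the loop touches every decompressed byte once more, which is the cost a decoder avoids by ignoring the checksum.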
For decode (not encode, obviously), IIUC fpng already skips computing/verifying Adler-32 always and CRC-32 sometimes (depending on FPNG_DISABLE_DECODE_CRC32_CHECKS).
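As a sketch of the "verify or skip" choice on the CRC-32 side, assuming nothing about how fpng or Wuffs actually structure their code: the helper below checks a PNG chunk's CRC-32 (computed over the chunk type plus chunk data) unless the caller turns verification off. `crc32_bitwise` and `chunk_crc_ok` are hypothetical names, and the bit-at-a-time loop is chosen for brevity, not speed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320), the variant
   used by PNG chunks. Slow but short; real code uses tables or SIMD. */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++) {
            /* If the low bit is set, shift and XOR in the polynomial. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical helper: `type_and_data` is the 4-byte chunk type followed
   by the chunk data. With verify == false the checksum is never computed. */
static bool chunk_crc_ok(const uint8_t *type_and_data, size_t len,
                         uint32_t stored_crc, bool verify) {
    if (!verify) {
        return true;  /* skip the checksum work entirely */
    }
    return crc32_bitwise(type_and_data, len) == stored_crc;
}
```

With `verify` set to false a corrupted chunk is accepted silently, which is the trade-off being made when benchmarking for raw decode speed.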
Some of the other libraries (lodepng and stb_image) also do this automatically. Wuffs needs to opt in, with a one-liner patch: