sumatrapdfreader / sumatrapdf

SumatraPDF reader
http://www.sumatrapdfreader.org
GNU General Public License v3.0

Add support for JXL (JPEG XL) format #1943

Open eddiezato opened 3 years ago

eddiezato commented 3 years ago

A very promising format, capable of lossless jpeg transcoding.

Official website: https://jpeg.org/jpegxl/
Reference software: https://gitlab.com/wg1/jpeg-xl
Additional info: https://cloudinary.com/blog/time_for_next_gen_codecs_to_dethrone_jpeg

Already supported by Chrome Dev and Firefox Nightly behind a flag:
https://bugs.chromium.org/p/chromium/issues/detail?id=1178058
https://bugzilla.mozilla.org/show_bug.cgi?id=1539075

Others:
https://github.com/novomesk/qt-jpegxl-image-plugin
https://github.com/saschanaz/jxl-winthumb

GitHubRulesOK commented 3 years ago

It may be great for server photographic/movie applications; however, it's not really that useful as a format for documents. Related issue: https://github.com/sumatrapdfreader/sumatrapdf/issues/1520#

The sample jpegxl-logo.jxl, which I can't view online, is 20,955 bytes. As a transparent, high-quality lossless 32-bit jpegxl-logo.png it IS bigger at 41,434 bytes. As an 8-bit (web-optimised) document jpegxl-logo.png it is much leaner at 13,714 bytes, and even the higher-quality full 24-bit jpegxl-logo.webp is only 10,478 bytes.

Personally I discourage using compressed formats since they are slower to load. They are only suited to interlaced reading online (e.g. PNG is best) and during download. Once local it is better if they are stored unencrypted and decompressed as required for instant reading/viewing, but again PNG is native to SumatraPDF.

So although it IS bigger when converted by SumatraPDF to a PNG-based PDF (28,435 bytes), it loads more than quickly enough: jpegxl-logo.pdf

SumatraPDF can also convert the WebP-sourced image to PDF and thus read it, but the gain is smaller (24,559 bytes) and there is also a chance it might not work in all other viewers, so I would discourage such use: jpegxl-logo [SumatraPDF].webp.pdf

eddiezato commented 3 years ago

"Once local it is better if they are stored unencrypted and decompressed as required for instant reading/viewing"

Sorry, I didn't get that line. My comic books, which are stored locally, are compressed to WebP and to ZIP (CBZ). I don't think keeping these image files decompressed in BMP and just in folders would be reasonable. PNG is also a "compressed format", btw.

GitHubRulesOK commented 3 years ago

I accept that the reasonable way to combine images is to use modern alternatives to multi-page TIFF (still the long-time preference in the graphics world). So, accepting that external ZIP containerisation (15,158 bytes) is nearly as good as internal ZIP compression of TIFF (14,108 bytes), multiple non-TIFF images in a ZIP make sense.

There are also storage "dictionary" (common blocks) gains to be made by zipping multiple compressed image files such as WebP or PNG but each onion skin takes time to peel. The end game is to unpack every single red, green and blue dot.

Using WebP in ZIP is slightly larger at 10,634 bytes because of the wrapper, but it is still half the size of jpegxl-logo.jxl. I would thus stick with that for comics, which were at source built up from only 4 colors (CMYK).

Personally I would use PNG in CBZ (12,453 bytes) for 2 reasons:
a) It is universally more acceptable to other comic readers.
b) In SumatraPDF it can easily be converted to PDF for wider readership, but it is then bigger at 28,435 bytes.

eddiezato commented 3 years ago

For this logo here is a lossless encoding:

> magick -size 1000:-1 jpegxl-logo.svg -quality 95 jpegxl-logo.png
> cwebp -o jpegxl-logo.webp -lossless -z 9 -mt -m 6 jpegxl-logo.png
> cjxl jpegxl-logo.png jpegxl-logo.jxl -m -s 8

Name             Length
----             ------
jpegxl-logo.png   11140
jpegxl-logo.webp   5932
jpegxl-logo.jxl    5654

For this jpeg here is a lossless transcoding:

> cjxl sts-119_eva1_arnold01.jpeg sts-119_eva1_arnold01.jxl -s 8
> djxl sts-119_eva1_arnold01.jxl sts-119_eva1_arnold01_2.jpeg

> Get-FileHash sts-119_eva1_arnold01.jpeg,sts-119_eva1_arnold01_2.jpeg

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
SHA256          135FAF575E0E524C5BB366ABD449F0BDE700D65D2906A519E0D4241E7114BFC2       sts-119_eva1_arnold01.jpeg
SHA256          135FAF575E0E524C5BB366ABD449F0BDE700D65D2906A519E0D4241E7114BFC2       sts-119_eva1_arnold01_2.jpeg

Name                         Length
----                         ------
sts-119_eva1_arnold01.jpeg    99055
sts-119_eva1_arnold01_2.jpeg  99055
sts-119_eva1_arnold01.jxl     84326
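
The round-trip check above (same SHA256 before and after, smaller file in between) can also be sketched in Python. This is only a minimal illustration, not something anyone in this thread ran; the helper names `sha256_of`, `is_bit_exact` and `size_percent` are made up for the sketch, and the file names in the comments are simply the ones from the listing above.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_exact(original_path, roundtrip_path):
    """True when the decoded file is byte-for-byte identical to the original."""
    return sha256_of(original_path) == sha256_of(roundtrip_path)


def size_percent(encoded_bytes, original_bytes):
    """Size of the encoded file as a percentage of the original size."""
    return 100.0 * encoded_bytes / original_bytes


if __name__ == "__main__":
    # Byte counts taken from the listing above (JXL vs. source JPEG).
    print(f"{size_percent(84326, 99055):.1f}% of the original size")
    # With the real files at hand (hypothetical local paths), one would call:
    # is_bit_exact("sts-119_eva1_arnold01.jpeg", "sts-119_eva1_arnold01_2.jpeg")
```

Hashing in chunks keeps memory use flat, which matters more for whole comic archives than for a single 99 KB photo.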

In order to make the JXL a more acceptable format, we need implementation of its support in various programs. This is what this issue is about.

GitHubRulesOK commented 3 years ago

With conversion it is very much down to the routes you take for each class of source format. SVG is the highest lossless quality for illustrations / brochures, and thus well matched by PS (.PDF). For your nominated SVG vector logo (5,238 bytes) I can naturally get a fast vector PDF (2,137 bytes): jpegxl-logo (svg).pdf

Accepting it is required as a high-fidelity image CBZ for screen pixels, I can use: mutool convert -o svgout.cbz in.svg
That is the highest compatibility for use with SumatraPDF (based on MuPDF), but I have not specified any aggressive compression beyond the default, so it is a FAST default of 21,340 bytes: svgout.zip

Optimised for fastest web viewing / transmission, as against instant reading, it is 4,496 bytes: svgout (webp).zip (just rename the zips back to cbz).

Nice JPEG. Here it might be useful to have more aggressive compression, but at the cost of decompression issues, since other formats such as J2K raise many complaints that they are drastically slower to read, defeating the ethos of SumatraPDF (see https://github.com/sumatrapdfreader/sumatrapdf/issues/1922 ). Clearly, as a photo this is certainly NOT suited to WebP or PNG (it would bloat 2 or 3x), so plain JPG in PDF pages, eva_arnold.pdf (99,898 bytes), or vanilla CBZ is the best compromise.

An interesting side note is that we can have multiple copies of a source image in a PDF for top-quality 4-up printing without much overhead: eva_arnoldX4.pdf. You can't do that with a normal zip folder.

Bottom line is SumatraPDF would only support newer formats if they are adopted by Artifex, and I see you have opened a bug report, so adding a tag here: https://bugs.ghostscript.com/show_bug.cgi?id=703844

GitHubRulesOK commented 3 years ago

@kjk Feedback from MuPDF is currently "RESOLVED WONTFIX". In some ways this is an XY problem, since the core requirement for readers is a lower number of colours; thus "non glossy" images can be very small when the colour gamut is reduced to all that is necessary for efficient screen reading.

I suggest that, as SumatraPDF is tied heavily into MuPDF, we close until it becomes a PDF-supported format. Whilst a purist might decry my manipulations, this reduced PDF (70,724 bytes including PDF overhead, i.e. close to JXL + overheads) would be good enough for most readers (and that's without using the most aggressive compression option, which is chargeable): sts-119_eva1_arnold01-smallpdf converted-compressed.pdf

eddiezato commented 3 years ago

sts-119_eva1_arnold01.jxl - transcoded losslessly 😉

Name                        Length
----                        ------
sts-119_eva1_arnold01.jpeg   99055
sts-119_eva1_arnold01.jxl    62171

BTW, there is information about some steps being taken towards including support for JXL in PDF. At this point we can only wait. I will close the issue.

DejayRezme commented 2 years ago

I'm looking for a comic book reader that supports jxl too.

The real star of JXL is the visually lossless encoding in my humble opinion. They use a perceptual model so that at 'butteraugli distance' of 1.0 a human cannot see the difference in a flicker test. From my experiments it really works and even at higher distance it compresses so that artifacts are more mild and spread out and less noticeable.

One particular comic book I was able to compress from 113 MB to 93 MB mathematically lossless (d=0) and to 19.6 MB visually lossless (d=1).

Of course not all examples are that good, but the advantage of a "guaranteed" visually lossless compression is that you don't have to think and visually compare. And even at distance 2 I can't see much of a difference and get 12.1 MB.

eddiezato commented 2 years ago

@DejayRezme Try NeeView with JXL WIC codec. Works almost perfectly for me.

eddiezato commented 2 years ago

There is also YACReader with Qt JXL plugin, but they must be built in the same environment to work properly. I haven't checked the official builds, as I built them myself.

DejayRezme commented 2 years ago

Thank you @eddiezato, NeeView looks really nice!

I also saw your suggestion to add a lossless / lossy flag for libjxl :)

GitHubRulesOK commented 2 years ago

@DejayRezme @eddiezato Not sure if you're aware, but SumatraPDF now has HEIC plugin support (no need for more than the one portable exe) if it's available via either of the common Windows HEIC extenders, and if JXL gets similar OS support that too could be used, like TIFF, WebP, TGA etc.

The primary limitation is that SumatraPDF can only use what the platform (MuPDF or Windows) provides, and thus although WebP or HEIC are not usable inside a PDF they do work in a zip, cbz (or often .cbr). The most likely to be supported next is probably AV1 stills.

Also note SumatraPDF now previews suitable CBX file contents, here a HEIC in a CBZ: [image]

DejayRezme commented 2 years ago

Thanks for the clarification GitHubRulesOK. I guess once the underlying image library used gets support for JXL it would be trivial to add.

But JXL definitely still needs more platform support. Firefox and Chrome only support it behind flags atm. Maybe there is a kind of "image format fatigue" with the recent additions. On Windows, QuickLook supports it though, and with the thumbnail plugin it shows nicely in Explorer too.

I do believe jpegXL is significantly better than the other contenders for both lossy and lossless at high fidelity. Plus this "visually lossless" mode is priceless imho. I think where it isn't as good is for very low bitrates.

Some quick and dirty test results: against JPG, the mathematically lossless files come out at about 83% of the original size, and the visually lossless ones (d=1) at about 54%:

Name               Size       Size vs. original
----               ----       -----------------
comic.001.cbz      30.1 MiB
comic.001.d0.cbz   25.0 MiB   83%
comic.001.d1.cbz   17.7 MiB   58.8%
comic.001.d2.cbz   12.2 MiB   40.5%
comic.001b.cbz     42.5 MiB
comic.001b.d0.cbz  34.4 MiB   80.9%
comic.001b.d1.cbz  20.6 MiB   48.5%
comic.001b.d2.cbz  13.0 MiB   30.6%
comic.002.cbz      110.4 MiB
comic.002.d0.cbz   90.8 MiB   82.2%
comic.002.d1.cbz   19.2 MiB   17.4%
comic.002.d2.cbz   11.8 MiB   10.7%
comic.003.cbz      31.0 MiB
comic.003.d0.cbz   25.7 MiB   82.9%
comic.003.d1.cbz   17.4 MiB   56.1%
comic.003.d2.cbz   11.3 MiB   36.5%
comic.003.d3.cbz   8.5 MiB    27.4%
comic.004.cbz      39.4 MiB
comic.004.d0.cbz   33.1 MiB   84.0%
comic.004.d1.cbz   22.2 MiB   56.3%
comic.004.d2.cbz   14.5 MiB   36.8%
comic.005.cbz      50.0 MiB
comic.005.d0.cbz   41.8 MiB   83.6%
comic.005.d1.cbz   28.3 MiB   56.6%
comic.005.d2.cbz   18.4 MiB   36.8%
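
For anyone wanting to reproduce the percentage column, it appears to be the re-encoded archive size divided by the original archive size. A minimal Python sketch under that assumption follows; the function name `percent_of_original` is made up here, and only two of the comics from the table are included to keep it short.

```python
def percent_of_original(encoded_mib, original_mib):
    """Encoded size as a percentage of the original size, to one decimal."""
    return round(100.0 * encoded_mib / original_mib, 1)


# Sizes in MiB copied from the table above: (original, {distance: encoded}).
SIZES = {
    "comic.001": (30.1, {"d0": 25.0, "d1": 17.7, "d2": 12.2}),
    "comic.002": (110.4, {"d0": 90.8, "d1": 19.2, "d2": 11.8}),
}

if __name__ == "__main__":
    for name, (original, variants) in SIZES.items():
        for distance, encoded in variants.items():
            share = percent_of_original(encoded, original)
            print(f"{name}.{distance}.cbz  {encoded} MiB  {share}%")
```

Lower is better in this column: a value of 17.4 means the file shrank to roughly a sixth of its original size.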

GitHubRulesOK commented 2 years ago

@DejayRezme Whilst in the past, with floppy disks, compression was king, with today's storage speed is favored, thus decompression times are king.

DejayRezme commented 2 years ago

I believe jpegXL can actually be faster on multicore than jpeg but I haven't tested this myself. And it might depend on the image library used.

But I think the CPU can decompress faster than you can read from the hard drive anyway, at least for HDD. Then the smaller file size of jxl would again be an advantage in speed.
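
The trade-off being argued here can be written down as a very rough model: time to open a page is roughly the time to read the bytes from storage plus the time to decode them. The sketch below is only that model in Python. The function name and every number in the example are hypothetical placeholders, not measurements of JPEG or JXL decoders, so it shows how one might compare two options rather than which one wins.

```python
def load_time_seconds(file_mib, read_mib_per_s, decode_seconds):
    """Rough load time: storage read time plus decode time (no overlap assumed)."""
    return file_mib / read_mib_per_s + decode_seconds


if __name__ == "__main__":
    # All values below are made-up placeholders for illustration only.
    hdd_read_speed = 100.0  # MiB/s, hypothetical spinning disk
    larger_file = load_time_seconds(110.4, hdd_read_speed, decode_seconds=2.0)
    smaller_file = load_time_seconds(19.2, hdd_read_speed, decode_seconds=4.0)
    print(f"larger file, cheaper decode:  {larger_file:.2f} s")
    print(f"smaller file, costlier decode: {smaller_file:.2f} s")
```

Which side comes out ahead depends entirely on the read speed and decode cost plugged in, so real timings on the target machine are needed before drawing conclusions.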

In any case I'm looking into jxl because the size of many comic book archives is rather large for my taste.

GitHubRulesOK commented 2 years ago

@DejayRezme Case by case is unknown, and file size is relative to speed. A "bigger" file in image or PDF form (within reason) should be faster, since the whole file usually needs to be decoded and decompressed in disk-based memory, but every case can differ based on resources. Hence I always suggest that fewer pages/colours is better (if acceptable for chunking), but a bigger size for the same quality may also be better for speed.

canny[bot] commented 2 years ago

This issue has been linked to a Canny post: Support Jpeg XL image format :tada:

joskezelensky commented 4 weeks ago

@GitHubRulesOK I want you to take a look at a user-made benchmark of file sizes, decompression times and compression times, because I don't think it differs that much (at effort 1 you get a 4-second decompression time difference between lossy JPEG and lossless JXL). If you compared lossy JXL and JPEG, I think it would be even faster.

"@DejayRezme @eddiezato Not sure if you're aware, but SumatraPDF now has HEIC plugin support (no need for more than the one portable exe) if it's available via either of the common Windows HEIC extenders, and if JXL gets similar OS support that too could be used, like TIFF, WebP, TGA etc."

What do you mean by OS support? That every vanilla (Windows) OS supports it? Or that you add support by installing third-party software? For example, HEIC support needs to be downloaded before the files are viewable.

GitHubRulesOK commented 4 weeks ago

@joskezelensky I don't dispute that JXL has benefits, simply that human nature is to unnecessarily compress documents to the max (it WAS essential in the previous decade). At max level JXL is worse than PNG. Actually, specifically, there is no such thing as PNG inside a PDF, so basic "flate" [de]compression or JB2 (licensing issues) is the maximal compression allowed in native JPEG-compatible PDF.

For images, CBZ with PNG or JPEG is native to Windows, as it is simply a zip folder like DocX or ORA image containers.
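
Since a CBZ is just a zip of page images, building one needs nothing beyond a standard zip library. The following Python sketch assumes a folder of already-compressed pages; the function name `build_cbz` and the suffix list are choices made for this illustration, and `ZIP_STORED` (no second compression layer) is used on the assumption that re-deflating PNG/JPEG/WebP data gains little.

```python
import zipfile
from pathlib import Path

# Page formats treated as images in this sketch (an illustrative choice).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def build_cbz(image_dir, cbz_path):
    """Pack the images in image_dir into a CBZ, sorted by file name.

    Returns the list of page names in the order they were stored.
    """
    pages = sorted(
        (p for p in Path(image_dir).iterdir()
         if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES),
        key=lambda p: p.name,
    )
    # ZIP_STORED: the pages are already compressed, so store them as-is.
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for page in pages:
            archive.write(page, arcname=page.name)
    return [page.name for page in pages]
```

Readers generally show pages in file-name order, so zero-padded names such as 001.png, 002.png are the safer convention when preparing the folder.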

WebP (Copyright (c) 2010, Google Inc. All rights reserved.) is sporadically supported and hinted at as having potential for problems, so it never gained traction fast enough. (We shall see, as it is a base for a possible proprietary video format! YouPayTube?)

Apple HEIC needs a Windows plug-in and was also not so well received, as it is monetised by Microsoft.

JXL will need good uptake by Microsoft, and thus wide platform support, to become easy to implement. Unlike GIF/LZW, JP2, TIFF/Adobe, TGA etc., which were beset with license issues, the best thing going for JXL is its open license. It has still not reached version 1.0 and is currently at 0.10.

Feb 28, v0.10.1: fixed a significant speed regression present since 0.9.0