google / pik

A new lossy/lossless image format for photos and the internet
MIT License

PIK not exceptional for high resolution photographic sources -> use existing solutions? #36

Open wolfbeast opened 6 years ago

wolfbeast commented 6 years ago

Spun off from #33 to separate different points out as requested.

While WebP compresses well at low bit rates and doesn't exhibit block artefacts as bad as JPEG's, it becomes increasingly inefficient at higher bit rates (when one tries to compress without visual loss), and at high quality (somewhere above quality 80-90 in libjpeg) it becomes less efficient in compression density than JPEG as implemented in libjpeg. Also, some photographers have discussed the need to manually review WebP re-compression, while noting that an approach based on guetzli/butteraugli could possibly be fully automated. PIK is the guetzli/butteraugli approach refined to the best possible state.

Once again, the choice of format based on required bitrate is important. If the focus is on high resolution photographic sources, then the advances in compression from Guetzli can already be used in standard JPEG images (or one could consider JXR, which is already ISO-standardized, if things like wide gamut and transparency are desired on top of photographic-focused compression techniques). Since the comparison showcase focuses solely on high bitrate/high quality compression, what more does Butteraugli bring to the table there that makes the new image format needed beyond a doubt? Since Butteraugli only aids in finding the optimal psychovisual distance, why can't all of this just be put into a slow-but-accurate JPEG/JXR compressor instead of making an entirely new format? Especially with JXR offering broader dynamic color ranges and being designed from the ground up for HD photographic material, a highly-optimized compressor for that format would very likely result in images at least competitive with, if not superior to, what PIK could produce.

khavishbhundoo commented 6 years ago

> Since the comparison showcase focuses solely on high bitrate/high quality compression, what more does Butteraugli bring to the table there that makes the new image format needed beyond a doubt? Since Butteraugli only aids in finding the optimal psychovisual distance, why can't all of this just be put into a slow-but-accurate JPEG/JXR compressor instead of making an entirely new format?

Pik has the highest compression ratio plus minimal compression artefacts (better than libjpeg). Here is a spreadsheet showing the psychovisual scores at similar filesizes: https://docs.google.com/spreadsheets/d/1qHIdENHNYqKwuEqMR-_i39Jfz1TvxSEtsC2lJB-wnHI/edit?usp=sharing and a visual comparison: https://i.gyazo.com/cc09bd127ef098f96696a77477f69396.png

If you check my study at encode.ru you will notice that pik performs better than libjpeg according to both butteraugli and ssimulacra. The closer a codec is to the origin, the better it is.

https://encode.ru/threads/2814-Psychovisual-analysis-on-modern-lossy-image-codecs?p=55041&viewfull=1#post55041
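To make the "closer to the origin" reading concrete: both butteraugli and ssimulacra treat lower scores as better, so each codec becomes a point in that plane and the point nearest the origin wins. The sketch below illustrates this with made-up numbers, not figures taken from the study:

```python
import math

# Hypothetical (butteraugli, ssimulacra) scores for one image; on both axes
# lower means closer to the source, so the point nearest the origin wins.
scores = {
    "pik":     (0.95, 0.010),
    "libjpeg": (1.60, 0.018),
}

def distance_to_origin(point):
    return math.hypot(*point)

best = min(scores, key=lambda codec: distance_to_origin(scores[codec]))
print(best)  # -> "pik" with these made-up numbers
```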

The problem with JPEG is that it internally uses the PSNR metric, which, unlike butteraugli, doesn't correlate well with human vision.

Butteraugli is a project that estimates the psychovisual similarity of two images. It gives a score for the images that is reliable in the domain of barely noticeable differences. Butteraugli not only gives a scalar score, but also computes a spatial map of the level of differences.

One of the main motivations for this project is the statistical differences in location and density of different color receptors, particularly the low density of blue cones in the fovea. Another motivation comes from more accurate modeling of ganglion cells, particularly the frequency space inhibition.
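As a rough sketch of how one might drive it from a script: the wrapper below assumes a butteraugli binary with a `butteraugli <reference> <distorted> [heatmap]` command line that prints the score on stdout, which may differ from the build you have.

```python
import subprocess

def butteraugli_score(reference_png, distorted_png, heatmap_ppm=None):
    """Run a butteraugli binary on two images and return its scalar score.

    Assumes a CLI of the form `butteraugli <ref> <test> [heatmap]` that prints
    the score on stdout; adjust to match the actual binary.
    """
    cmd = ["butteraugli", reference_png, distorted_png]
    if heatmap_ppm:
        cmd.append(heatmap_ppm)  # optional spatial map of the differences
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return float(out.stdout.split()[0])

# Example: score = butteraugli_score("source.png", "decoded.png", "diff.ppm")
```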

wolfbeast commented 6 years ago

I think you missed my point. JPEG as an image format doesn't use PSNR. Encoders might, but not the image format itself -- as such, I suggested using a better encoder for existing formats, that can be decoded with industry-standard decoders. Of course a slow compressor is going to look better than a reference libjpeg encoder @ 77 quality because it's really not a very good encoder ;)

I don't think a higher compression ratio in the range of 2.5-2.7% warrants a new image format (especially not compared to Guetzli compression, which has a very low psychovisual deviation as well). The problem I have with the encode.ru thread is that it is not built on a solid scientific basis. It sets out to compare very specifically defined examples to "prove the theory right", instead of taking a neutral approach or trying to disprove it. For one, you are using the same technique (Butteraugli) in your compression as you are using in your reference comparison measurements; that will undoubtedly favor pik. Secondly, using a "best scenario" reference and then forcing other encoders to use similar file sizes (while wholly ignoring the fact that you can't make a statement about compression ratios in that case, because the file sizes are entirely under your control! The settings were clearly chosen to give similar-but-larger file sizes...) is stacking the odds in pik's favor as well. I don't think this was a fair comparison and it feels very heavily biased.

As such, I don't yet see a reason for a new, considerably slower-to-use, image format when a smart, accurate JPEG encoder could achieve the same or insignificantly-different results. The research so far is far from conclusive, seems to be very biased and subjective, and should be verified independently. Please do not make any decisions based on what has been shown so far.

I volunteer to provide this independent verification.

I'll be more than happy to put some time into making my own comparisons from varying source images with minimal bias, if I can be given the tools (Windows binaries) to compress/convert and to compare/get PV scores. For this, kindly supply:

  1. a pik compressor (preferably png -> pik)
  2. a pik decompressor (preferably pik -> png)
  3. a Guetzli jpeg encoder/decoder
  4. a Butteraugli scoring tool
  5. a Ssimulacra scoring tool

With those tools, I can provide verification of results using real-world images, with minimal bias, using various comparison techniques and sets.
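To be concrete, the kind of harness I have in mind would look roughly like the sketch below. The tool names (cpik/dpik, guetzli, butteraugli, ssimulacra) and command-line shapes are placeholders until the actual binaries are available:

```python
import subprocess
from pathlib import Path

# Placeholder command lines; the exact flags depend on the binaries supplied.
CODECS = {
    "pik": {
        "ext": ".pik",
        "encode": lambda src, dst: ["cpik", str(src), str(dst)],
        "decode": lambda src, dst: ["dpik", str(src), str(dst)],
    },
    "guetzli": {
        "ext": ".jpg",
        "encode": lambda src, dst: ["guetzli", "--quality", "90", str(src), str(dst)],
        "decode": lambda src, dst: ["convert", str(src), str(dst)],  # ImageMagick
    },
}

def score(tool, reference, decoded):
    """Run a scoring tool (butteraugli or ssimulacra) and parse its scalar output."""
    out = subprocess.run([tool, str(reference), str(decoded)],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.split()[0])

def compare(source_png: Path, workdir: Path):
    """Encode one source image with each codec, decode back to PNG, and collect
    file size plus butteraugli / ssimulacra scores against the source."""
    results = {}
    for name, codec in CODECS.items():
        encoded = workdir / (source_png.stem + codec["ext"])
        decoded = workdir / (source_png.stem + "." + name + ".png")
        subprocess.run(codec["encode"](source_png, encoded), check=True)
        subprocess.run(codec["decode"](encoded, decoded), check=True)
        results[name] = {
            "bytes": encoded.stat().st_size,
            "butteraugli": score("butteraugli", source_png, decoded),
            "ssimulacra": score("ssimulacra", source_png, decoded),
        }
    return results
```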

khavishbhundoo commented 6 years ago

> I suggested using a better encoder for existing formats, that can be decoded with industry-standard decoders

Guetzli is a better JPEG encoder.

> I don't think a higher compression ratio in the range of 2.5-2.7% warrants a new image format (especially not compared to Guetzli compression, which has a very low psychovisual deviation as well)

A higher compression ratio with much higher visual quality sure does. Pik isn't just a JPEG encoder with butteraugli as a perceptual metric; it is a whole new state-of-the-art image encoder in its own right.

> The problem I have with the encode.ru thread is that it is not built on a solid scientific basis. It sets out to compare very specifically defined examples to "prove the theory right", instead of taking a neutral approach or trying to disprove it. For one, you are using the same technique (Butteraugli) in your compression as you are using in your reference comparison measurements; that will undoubtedly favor pik.

The thread contains just a few examples where I took images from Jyrki Alakuijala's image corpus plus the Xiph Test Media corpus (high quality, 8 bpp+ source images). Butteraugli favoring pik is not a surprise since pik is optimized for it. Similarly, encoders that are optimized against ssimulacra or any other psychovisual metric will perform best on that metric. The important point here is which one of those two tools correlates best with human vision.

> The research so far is far from conclusive, seems to be very biased and subjective, and should be verified independently. Please do not make any decisions based on what has been shown so far.

I never claimed the results to be conclusive. I don't have the resources to run tests on 10k+ random high quality source images to reach any conclusions.

> Secondly, using a "best scenario" reference and then forcing other encoders to use similar file sizes (while wholly ignoring the fact that you can't make a statement about compression ratios in that case, because the file sizes are entirely under your control! The settings were clearly chosen to give similar-but-larger file sizes...) is stacking the odds in pik's favor as well. I don't think this was a fair comparison and it feels very heavily biased.

Pik 1.0 gave the highest visual quality at the smallest filesize, so it made sense to use it as a reference. Then I wanted to see how encoders for other formats perform at a similar filesize (equal to the pik size, or the next size up). I contacted the authors of the different encoders to get optimal settings (favoring quality over filesize) for each encoder. Of course you can hand-tune your encoder and use your own quantization tables tailored for each image, but that is not practical.

The Compression Rate (%) column gives the compression ratio w.r.t. the source image, and Reference Compression Rate (%) is the compression ratio w.r.t. libjpeg at quality 93, which is the quality most cameras use by default.
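Read literally, those two columns would be computed roughly as in the sketch below (the byte counts are illustrative only, and if the spreadsheet defines "rate" as the savings percentage instead, the figures are the complement of these):

```python
def compression_rate(compressed_bytes, reference_bytes):
    """Size of the compressed file as a percentage of a reference file.
    Assumed reading of the spreadsheet columns; the sheet may instead report
    the complementary savings figure, 100 minus this value."""
    return 100.0 * compressed_bytes / reference_bytes

# Illustrative numbers only, not taken from the spreadsheet:
source_size = 3_000_000        # bytes in the original source image
libjpeg_q93_size = 1_000_000   # bytes after libjpeg at quality 93
pik_size = 450_000             # bytes after pik

rate_vs_source = compression_rate(pik_size, source_size)          # "Compression Rate (%)"
rate_vs_reference = compression_rate(pik_size, libjpeg_q93_size)  # "Reference Compression Rate (%)"
```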

> I'll be more than happy to put some time into making my own comparisons from varying source images with minimal bias,

I don't have Windows binaries for the encoders because I run everything on a CentOS server. You can find the sources on GitHub and compile them for Windows. I have a bash script available:

https://github.com/khavishbhundoo/Psychovisual-analysis-on-modern-lossy-image-codecs/blob/master/compression_test.sh

jyrkialakuijala commented 6 years ago

About efficiency:

Moonchild/Wolfbeast, could you provide an example "high resolution photographic source" image where we don't get exceptional savings with pik? After that we can try compressing the image with a variety of methods to a level where we can barely see some artefacts -- and look at the file sizes. On our reference set we are indeed seeing exceptional performance. (When certain kinds of encoding/colorspace losses are already applied to a high resolution photographic source, we see less savings.)
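One way to run that comparison systematically (a sketch only, not how pik or guetzli work internally; the target distance, quality bounds, and CLI shapes are all assumptions) is to bisect each encoder's quality setting until the butteraugli distance against the source sits just at the visibility threshold, then compare the resulting file sizes:

```python
import os
import subprocess

def butteraugli_distance(reference_png, decoded_png):
    """Parse the scalar distance printed by an assumed butteraugli CLI."""
    out = subprocess.run(["butteraugli", reference_png, decoded_png],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.split()[0])

def size_at_threshold(src_png, encode, decode, target=1.0, q_lo=50, q_hi=100):
    """Bisect an encoder's quality setting until the butteraugli distance against
    the source sits just under `target` (around 1.0 is roughly the 'barely
    noticeable' level), then report the resulting file size.

    `encode(src, dst, quality)` and `decode(src, dst)` return command lists."""
    best_size = None
    while q_lo <= q_hi:
        q = (q_lo + q_hi) // 2
        encoded, decoded = f"q{q}.bin", f"q{q}.png"
        subprocess.run(encode(src_png, encoded, q), check=True)
        subprocess.run(decode(encoded, decoded), check=True)
        if butteraugli_distance(src_png, decoded) > target:
            q_lo = q + 1                      # too lossy: raise quality
        else:
            best_size = os.path.getsize(encoded)
            q_hi = q - 1                      # within threshold: try a smaller file
    return best_size
```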

About colorspace:

Pik's color space internally (roughly) matches that of non-subsampled LMS, which means that every color that can be perceived by a human can be stored and reproduced with Pik. There are some inefficiencies in converting existing lossy images into the LMS-derived color spaces, but those are in the range of 10% losses (well compensated by the technique's other gains) and will go away once more psychovisually compatible colorspaces are used for capturing images.
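For illustration only, a standard RGB to LMS chain looks like the sketch below (linear sRGB to CIE XYZ, then the Hunt-Pointer-Estevez XYZ to LMS matrix); pik's internal opponent colorspace only roughly matches non-subsampled LMS and is not this exact transform:

```python
import numpy as np

# Standard linear-sRGB (D65) -> CIE XYZ matrix.
RGB_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])

# Hunt-Pointer-Estevez XYZ -> LMS matrix.
XYZ_TO_LMS = np.array([
    [ 0.38971, 0.68898, -0.07868],
    [-0.22981, 1.18340,  0.04641],
    [ 0.00000, 0.00000,  1.00000],
])

def linear_rgb_to_lms(rgb):
    """Map a linear-light RGB triple (or Nx3 array) into an LMS-like space.
    Illustrative only; pik's internal colorspace is merely close to
    non-subsampled LMS, not identical to this chain."""
    return np.asarray(rgb) @ (XYZ_TO_LMS @ RGB_TO_XYZ).T
```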