Unqualified stats - Githubissues

monad0 commented 1 week ago

Under "Fast Encoding & Decoding" we're shown two dramatic graphs followed by equally dramatic summary claims. Unfortunately, the graphics reveal no specific details, nor is any process communicated that would encourage trust in the representations. Promisingly, the graphs seem to have been derived from Jon's prior work. The lossless graph shares the shape of this one in particular. Jon offers us nearly everything we'd wish to know about the methodology and the specific details of the findings, which arms us with the information to identify in what way the broad conclusions are misleading.

JPEG XL and one small slice of the Pareto front

For comparing lossless JXL to WebP, the game was heavily stacked in JXL's favor. First, cjxl utilized 8 available threads on sufficiently large images to make those threads count, while cwebp apparently used its default single threading (the article is not totally transparent about WebP's settings, but I would expect an MT label like the one AVIF has; cwebp only uses two threads with -mt anyway). Improvement in real-time performance from adding more threads is of practical importance in contexts dealing with a single image at a time, but it doesn't generalize well to contexts where images are small, parallel resources are limited, or very many images need to be processed. Second, this graph represents mostly photographic or similarly complex content, where lossless WebP is relatively weaker. This at least is demonstrated by the article's following chart on non-photo content, where WebP is much more competitive despite its threading disadvantage.

Not JPEG XL and the many slices of the Internet front

Thing is, other people can make graphs and divine stats too, and theirs are very likely to show less favorable results than JXL's best case. On a large and diverse corpus, I find this with otherwise default lossless settings:

[plot: _a_cjxl-multi-0 10 1_cwebp-single-1 3 2__x_1_real]

Here, cjxl has access to 20x as many threads as cwebp!

Here's the same, but plotting CPU time:

[plot: _a_cjxl-multi-0 10 1_cwebp-single-1 3 2__x_1_cpu]
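The gap between the real-time and CPU-time plots comes from threading: a multithreaded encoder can finish sooner on the wall clock while spending more CPU time in total. A minimal sketch of measuring both for one command follows. It is not the harness used for these plots; `time_command` is a hypothetical helper, and the child Python process is only a stand-in for a real encoder invocation.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (wall_seconds, cpu_seconds) for the child.

    CPU time is user + system time summed over all of the child's threads,
    so a multithreaded run can report more CPU seconds than wall seconds.
    Note: on Windows, os.times() reports zero for the children_* fields.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(
        argv,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    wall = time.perf_counter() - start
    after = os.times()
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu


# Stand-in workload (an assumption for illustration); replace with the
# actual encoder command line when benchmarking.
wall, cpu = time_command([sys.executable, "-c", "sum(range(10**6))"])
print(f"wall: {wall:.3f} s, cpu: {cpu:.3f} s")
```

Plotting `wall` rewards whichever encoder was given more threads; plotting `cpu` shows the total work done, which matters more when many images are encoded in parallel.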

I can make even less favorable graphs. Here's just UI and text, but with single threading for fairness:

[plot: _a_cjxl-single-0 10 1_cwebp-single-1 3 2__t_text_ui__T_film__x_1]

I hope you're not handling screenshots.

But if I measured only photographic stuff, I might conclude JPEG XL wins hands down.

[plot: _a_cjxl-single-0 10 1_cwebp-single-1 3 2__t_film_photo__x_1]

Corpus composition:

images  share  Mpx (mean)  Mpx share  colors (mean)  colors/px (mean)  e3/e1 (mean)  tag
807  100.00%  1.32  100.00%   87206  0.10  0.97  all
 65    8.05%  2.73   16.64%  168476  0.07  0.86  3d
 32    3.97%  0.25    0.75%   15063  0.06  0.88  albedo
 18    2.23%  1.57    2.65%  194320  0.19  1.03  album_art
 47    5.82%  1.67    7.37%   72339  0.06  1.33  algorithmic
 14    1.73%  3.63    4.77%   25988  0.01  0.96  chart
 12    1.49%  3.94    4.43%   80719  0.02  0.84  comic
 79    9.79%  1.85   13.71%  125425  0.08  0.79  digital_painting
 24    2.97%  1.73    3.90%   79792  0.10  0.84  edit
 21    2.60%  1.77    3.49%  135078  0.09  0.78  film
 10    1.24%  1.53    1.44%  126413  0.15  0.81  framed
 85   10.53%  2.23   17.83%  122613  0.05  0.89  game
 20    2.48%  1.48    2.78%   15462  0.02  1.04  game_asset
 23    2.85%  0.02    0.04%    5154  0.30  0.85  icon
 27    3.35%  1.14    2.89%  124026  0.10  0.75  jxl_art
 18    2.23%  1.71    2.88%    2846  0.00  1.93  larip
 56    6.94%  0.21    1.12%   27049  0.13  0.85  logo
 89   11.03%  1.09    9.09%   86607  0.11  0.85  lossy
 51    6.32%  1.75    8.37%  254917  0.23  0.83  machine_learning
 22    2.73%  1.21    2.50%   64714  0.09  0.98  misc
 22    2.73%  1.66    3.43%  140632  0.11  0.91  mixed
 34    4.21%  0.30    0.95%   49998  0.20  0.84  normal
 96   11.90%  0.84    7.53%  113560  0.16  0.85  photo
 21    2.60%  1.84    3.63%  251721  0.21  0.83  photo_like
 73    9.05%  0.23    1.55%     804  0.00  0.90  pixel_art
 34    4.21%  1.63    5.20%    9802  0.00  2.67  pixel_art_scaled
 11    1.36%  1.59    1.64%    5063  0.01  1.13  sprite_sheet
 24    2.97%  0.52    1.18%    3400  0.01  0.97  text
 68    8.43%  0.30    1.89%   33012  0.13  0.86  texture
 59    7.31%  1.27    7.05%   55049  0.04  0.99  ui
 60    7.43%  1.52    8.55%  123847  0.11  1.11  watermark
 18    2.23%  2.44    4.12%   93309  0.07  0.81  content_other
693   85.87%  1.38   89.99%   93789  0.09  0.98  visibly_opaque
114   14.13%  0.94   10.01%   47186  0.11  0.92  visibly_transparent
  6    0.74%  0.37    0.21%     800  0.01  1.06  DirectClass_Gray_Alpha
500   61.96%  1.47   69.19%  118676  0.12  0.90  DirectClass_sRGB
174   21.56%  1.22   19.93%   63344  0.11  1.14  DirectClass_sRGB_Alpha
 33    4.09%  1.94    6.01%     143  0.00  0.82  PseudoClass_Gray
 87   10.78%  0.54    4.44%      55  0.00  1.06  PseudoClass_sRGB
  7    0.87%  0.33    0.22%     102  0.00  1.19  PseudoClass_sRGB_Alpha
 83   10.29%  0.35    2.76%   20265  0.16  0.88  source_app->exiftool
 41    5.08%  0.27    1.02%   41699  0.17  0.82  source_gimp
 11    1.36%  1.37    1.42%   48831  0.04  0.94  source_gnome_screenshot->exiftool
534   66.17%  1.46   73.31%   89328  0.08  0.94  source_imagemagick->exiftool
 14    1.73%  0.99    1.30%   59180  0.08  1.09  source_other
 33    4.09%  1.69    5.25%  129706  0.11  0.95  source_unknown->exiftool
 31    3.84%  1.48    4.29%   52984  0.04  1.82  source_web
 60    7.43%  1.89   10.66%  199891  0.13  1.06  source_web->exiftool
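The two share columns weight the corpus differently: the first is a tag's share of the image count, and the second appears to be its share of total megapixels. A quick check against the 3d row, as a sketch that assumes the second share is (image count x mean Mpx) divided by the corpus total; the function names are illustrative and not taken from the original tooling.

```python
# Values copied from the "all" and "3d" rows of the corpus table.
ALL_IMAGES, ALL_MEAN_MPX = 807, 1.32
TAG_IMAGES, TAG_MEAN_MPX = 65, 2.73


def image_share(tag_images, all_images):
    """Percentage of images in the corpus that carry the tag."""
    return 100.0 * tag_images / all_images


def mpx_share(tag_images, tag_mean_mpx, all_images, all_mean_mpx):
    """Percentage of total megapixels contributed by the tag (assumed formula)."""
    return 100.0 * (tag_images * tag_mean_mpx) / (all_images * all_mean_mpx)


img_pct = image_share(TAG_IMAGES, ALL_IMAGES)
mpx_pct = mpx_share(TAG_IMAGES, TAG_MEAN_MPX, ALL_IMAGES, ALL_MEAN_MPX)
print(f"3d: {img_pct:.2f}% of images, {mpx_pct:.2f}% of megapixels")
```

The results land close to the listed 8.05% and 16.64%. The small gap in the second figure is presumably because the table's means are rounded to two decimals. Large-image categories such as 3d and game therefore carry about twice the weight by pixels that they do by count.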

jonsneyers commented 1 week ago

I agree that there are many caveats and nuances here.

Clearly there are still circumstances / use cases where lossless webp is more competitive than how it's depicted in that plot, and there are certainly images where libwebp is Pareto-better than libjxl.
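For readers unfamiliar with the term, an encoder setting is on the Pareto front when no other setting is both at least as fast and at least as small. A minimal sketch of extracting that front from (time, size) measurements follows. The `Result` type and the numbers are made up for illustration and are not benchmark results from this thread.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    label: str
    seconds: float  # encode time, lower is better
    bytes_out: int  # compressed size, lower is better


def pareto_front(results: List[Result]) -> List[Result]:
    """Return the results not dominated on (seconds, bytes_out)."""
    front = []
    best_size = float("inf")
    # Walk from fastest to slowest; a slower result only survives if it
    # is strictly smaller than everything faster than it.
    for r in sorted(results, key=lambda r: (r.seconds, r.bytes_out)):
        if r.bytes_out < best_size:
            front.append(r)
            best_size = r.bytes_out
    return front


# Hypothetical measurements for two encoders at two settings each.
measurements = [
    Result("A-fast", 1.0, 900),
    Result("A-slow", 4.0, 700),
    Result("B-fast", 2.0, 950),
    Result("B-slow", 3.0, 650),
]
front_labels = [r.label for r in pareto_front(measurements)]
print(front_labels)
```

Which encoder owns the front depends entirely on the measurements fed in, so it shifts with the corpus, the thread count, and whether the time axis is wall-clock or CPU time.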

For the front page high-level overview and "sales pitch", I don't think it's a good idea to specifically highlight the (current) weaknesses of (lib)jxl. I think it's better to add more in-depth subpages where things can be covered with more nuance, rigor and fairness. This will also require explaining the difference between a codec and an encoder etc.

For some specific circumstances (photographic images, encoding a single relatively large image at a time on a modern multi-core computer, lossless or high-quality lossy) — which I do think are pretty relevant circumstances for some of the main use cases of JPEG XL — these plots do paint a simplistic but roughly correct picture. For other circumstances (non-photo, single-core, very low quality, etc), the situation is indeed different and can be more favorable to webp / avif / heic.

I don't think we should hide the nuances, but I do think it makes sense on the front page to highlight mostly the strengths and the reasons for using jxl, and to keep the more complicated and nuanced analysis for subpages.

monad0 commented 1 week ago

"For the front page high-level overview and "sales pitch", I don't think it's a good idea to specifically highlight the (current) weaknesses of (lib)jxl."

Yes, I'm rather speaking to being honest in making claims. This gives the claims more presence and resilience. I didn't see a good way to just fix it directly, because I didn't know where those plots came from, but now that we know their meaning, we can just communicate that specifically. https://github.com/jxl-community/jxl-community.github.io/pull/44

jxl-community / jxl-community.github.io

Unqualified stats #41
