Despite what intuition may say, the newest format is not always the best. Some formats have advantages over each other in certain situations.
In my testing comparing JPEG 2000 against Jpegli-encoded JPEGs, quality and file size varied heavily between images, even at the same resolution and the same settings. While JP2 is usually smaller and higher quality, one test image came out an inexplicable 500 KB bigger, at the same settings that usually saved at least a few KB. There is no easy way to predict or account for this. And for something with more scientific data behind it than me messing around: although JPEG XL is rather consistently better than AVIF on photographic content, AVIF seems to consistently rank better than JXL on non-photographic content.[^1]
This can result in situations where automatic generation overrides itself with a worse image. Ideally, a newer format would only be used if it prevails over an older one. This absolutely needs to be specified per format, though: we wouldn't want to skip a WebP in favor of a JP2, since nothing would be able to use it! It'd also be desirable to have some leniency here: if the difference is within a few KB, the format might be worth keeping for higher quality, or to take advantage of JXL progressive decoding.
Knowing this information, here's my proposal for how this could work. Let's imagine a setting `skip_if_worse`, which could be defined like this:
Here, JXL is set to skip if it's more than 50 KB larger than any format, WebP is set to skip if larger than JPEG by any amount, and JP2 is set to skip if larger by any amount. I felt `true` is more intuitive, but you could set it to `0` instead (for 0 KB).
Now with these settings, maybe we generate a JXL of 420 KB, an AVIF of 400 KB, a WebP of 800 KB, a JP2 of 601 KB, and a JPEG of 600 KB. The JXL is 20 KB bigger than the AVIF, which is within the 50 KB allowance: we've deemed progressive decoding to be worth the extra size in this case, so we keep the JXL. The WebP is larger than the JP2, but we do not care about that comparison. What we do care about is whether it's bigger than the JPEG, and it is, so we discard the WebP. Finally, we see the JP2 is 1 KB larger than the JPEG, so we discard it as well. The end result is only using `[jxl, avif, jpg]`, as WebP or JP2 would be detrimental to the user here. That's at least true from a pure bandwidth standpoint; I imagine trying to factor in a quality metric would be too complex and out of scope, especially since the configurable limit alleviates this. You could also imagine a scenario where we get a JXL of 460 KB and an AVIF of 400 KB, and in this case, we'd only use `[avif, jpg]`.
[^1]: Page 17 (Encoder Results > Results by Image Category) of the Cloudinary Image Dataset ’22 paper: https://cloudinary-marketing-res.cloudinary.com/image/upload/v1682016636/wg1m99012-ICQ-AIC3_Contribution_Cloudinary_CID22.pdf. Alternatively, this interactive website visualizes the results of the paper: https://cloudinary.com/labs/cid22/plots