w3c / largest-contentful-paint

Specification for the LargestContentfulPaint API
https://w3c.github.io/largest-contentful-paint/

Improve unimportant image heuristic #86

Closed: npm1 closed this issue 8 months ago

npm1 commented 2 years ago

Currently we use a heuristic to exclude certain images from being considered LCP candidates. For instance, we exclude images that occupy the full viewport. This heuristic is imperfect, as it could exclude important content or fail to exclude irrelevant content. @DanShappir gave an example where adding a content bar at the top would change whether a background image is an LCP candidate or not, so we'd love ideas on improving this heuristic.

clelland commented 1 year ago

I've been experimenting with a simple filter here: comparing the image's transfer size (in bits) to its layout size (in pixels). It has been fairly effective at excluding placeholder images as well as non-contentful backgrounds (whether viewport-sized or not).

So far, even just a very low threshold (something less than 0.1 bit/pixel) tends to eliminate:

Setting the bar a bit higher (say 0.2 or 0.25 bpp) catches some legitimate images, although those are mostly monochrome logos or similar.
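The filter described above can be sketched in a few lines. This is an illustrative sketch, not Chromium's implementation; the function names are mine, and the 0.05 bpp default simply matches the min_bpp value tried later in this thread.

```python
# Sketch of the transfer-size-to-layout-size filter described above.
# Names and the default threshold are illustrative, not Chromium's code.

def bits_per_pixel(transfer_size_bytes: int, width_px: int, height_px: int) -> float:
    """Encoded transfer size in bits divided by layout area in pixels."""
    return (transfer_size_bytes * 8) / (width_px * height_px)

def is_low_entropy(transfer_size_bytes: int, width_px: int, height_px: int,
                   min_bpp: float = 0.05) -> bool:
    """Images below the threshold would be excluded as LCP candidates."""
    return bits_per_pixel(transfer_size_bytes, width_px, height_px) < min_bpp

# A tiny placeholder stretched to fill a 1920x1080 layout: a 43-byte file
# gives 43 * 8 / (1920 * 1080) ≈ 0.00017 bpp, far below any threshold,
# while a 200 KB photo at 800x600 sits comfortably above it.
```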

jonsneyers commented 1 year ago

A heuristic based on bpp seems like a reasonable way to catch 'low content' images, though of course there will be cases where it gets it wrong, such as https://jpegxl.info/, where the LCP image is 0.001 bpp (at least in a browser that decodes image/jxl; the fallback image is 2.66 bpp, so that would be fine). But such cases should be very rare and can be ignored for all practical purposes :)

It would be relatively easy to cheat such a heuristic by adding padding bytes to the image file (most image formats will simply ignore trailing bytes; alternatively, dummy metadata could be embedded), which would bump up a file-size-based bpp without increasing the transfer size: adding 100 KB of zero bytes at the end of a file is basically free when using gzip or brotli transfer encoding. So the bpp would have to be computed from the transfer size, not the file size. This also matters for SVG and some uses of PNG (sending 'uncompressed' PNG with brotli transfer encoding), where the compressed transfer size is likewise a better indication of the 'real' entropy of the image.
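A quick sketch (using an illustrative byte payload, not a real image file) shows why the file size can't be trusted: appending 100 KB of zero bytes inflates the file size by 100 KB but costs almost nothing on the wire once gzip is applied.

```python
# Demonstration of the padding trick described above: trailing zeros
# inflate the file size, but compress away almost entirely under gzip,
# so bpp must be computed from the compressed transfer size.
import gzip

payload = bytes(range(256)) * 40        # stand-in for ~10 KB of image data
padded = payload + b"\x00" * 100_000    # file size grows by a full 100 KB

original_transfer = len(gzip.compress(payload))
padded_transfer = len(gzip.compress(padded))

# The 100 KB of padding adds at most a few hundred bytes of transfer size.
wire_cost = padded_transfer - original_transfer
```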

One way to determine suitable thresholds would be to look at the web almanac data: https://discuss.httparchive.org/t/what-compression-ratio-ranges-are-used-for-lossy-formats/2464 The thresholds should probably be selected per image format, and e.g. "lower bpp than the 5th percentile for that format" could be a good indication that the image is likely not contentful.
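The per-format 5th-percentile idea could be sketched as below. The bpp samples here are invented for illustration; real cutoffs would have to come from HTTP Archive / Web Almanac data.

```python
# Hypothetical sketch: derive a per-format "low entropy" cutoff as the
# 5th percentile of observed bpp values. Sample values are invented.
import statistics

observed_bpp = {
    "jpeg": [1.2, 2.5, 0.9, 3.1, 1.8, 0.4, 2.2, 1.1, 0.7, 2.9],
    "png":  [4.0, 6.5, 3.2, 8.1, 5.5, 2.9, 7.0, 4.4, 3.8, 6.1],
}

# quantiles(n=20) returns 19 cut points; the first is the 5th percentile.
thresholds = {
    fmt: statistics.quantiles(vals, n=20)[0]
    for fmt, vals in observed_bpp.items()
}

def likely_non_contentful(fmt: str, bpp: float) -> bool:
    # "Lower bpp than the 5th percentile for that format", as suggested above.
    return bpp < thresholds[fmt]
```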

Besides improving heuristics to exclude placeholders, I think it would also be useful to work on this from the other side: make it less compelling to 'cheat' the LCP timing by making LCP reflect the actual user experience better. Low-quality image placeholders are essentially a poor man's implementation of progressive rendering using non-progressive image formats, and it feels wrong that such a not-so-great solution is currently 'rewarded' (and still will be if a bpp-based threshold is added; that will just bump up the filesize/quality needed for the placeholder) while the better solution of using a progressive image format gets 'punished'. We have already discussed this in the past (https://github.com/w3c/largest-contentful-paint/issues/71) but that discussion seems to have stalled.

Philosophically, I think the point of LCP is to measure the time it takes for the largest 'contentful' element above the fold to become 'usable'. Both 'contentful' and 'usable' are hard to define, especially in an algorithmic way. For text, the reasoning was that the font doesn't matter for text to be 'usable' and that text is always 'contentful'. That makes sense as a first approximation (there are counterexamples to both, but they are probably rare enough). For images, however, it's less clear how to define 'contentful': clearly some decorative background pattern is not contentful, but where exactly to draw the line between contentful and non-contentful is not clear at all, especially algorithmically.

Similarly, it is not clear how to define when an image is 'usable'. With text this is "as soon as you can read it", i.e. the text being rendered in any font makes it 'usable', and the font is just an 'aesthetic' thing that makes it look nicer but doesn't 'functionally' change anything. With images it's harder to distinguish the 'functional' aspect from the 'aesthetic' aspect, because there are many ways in which images are used and there are many kinds of image content. Images can contain text, in which case 'when the text becomes legible' is likely the point where the image becomes usable. Images can contain humans, but at what point is such an image usable? When the faces can be recognized? Where exactly to draw the line will depend on the context and purpose of the image.

One approach to better understand 'contentful' versus 'non-contentful' and 'usable' versus 'non-usable' is to look at it from the user behavior perspective: 'contentful' can be seen as a measure of to what extent the element is influential at all in determining user actions (in particular: page interaction/navigation), and the point where it becomes 'usable' is the point where the user goes from an idle "waiting for it to load" state to an active "absorbing information and acting on it" state.

I would argue that typically, images are already 'usable' when a low-quality preview is available, even if there are still significant 'aesthetic' differences between the preview and the final image. However, the quality of the preview has to be high enough: just a blurhash or other extremely simplified version of the image is not enough to be 'usable', and a generic placeholder or an image-dependent solid-color or gradient placeholder certainly isn't. Where exactly to draw the line between 'usable preview' and 'not usable preview' is hard, but I think as a conservative and simple approximation, we can assume that in codecs that do progressive rendering (e.g. progressive JPEG as encoded with default mozjpeg), a paint that happens after 50% of the image data is available will be a 'usable preview'.

I think that properly 'rewarding' progressive images in LCP will lead to lower usage of placeholder images, making it less likely that we end up in a 'heuristics arms race' where web devs feel compelled to improve the LCP time by using a placeholder that just barely gets accepted by the heuristic but is still useless.

clelland commented 1 year ago

One way to determine suitable thresholds would be to look at the web almanac data

I think Web Almanac is mostly based on HTTP Archive data; I ran through all of the LCP images reported there (about a year ago, so this is slightly out of date) and put the results in a spreadsheet here: https://docs.google.com/spreadsheets/d/1IGxO4J_81eGXAtd0j79YczC8Ax6gE0LM5QHItcXNvGU/edit?usp=sharing

There is a huge range in bpp, so it's most useful to look at log(bpp) instead, which is what that sheet does. In the wild, we see images ranging from roughly 10^-9 bpp at the extreme low end to more than 10^6 bpp at the high end.
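That log transform is trivial to reproduce; the sample values below are just the endpoint figures quoted above, for illustration.

```python
# Why log10(bpp) is the useful scale: observed values span roughly
# fifteen orders of magnitude, so raw-bpp histograms are unreadable.
import math

sample_bpp = [1e-9, 0.05, 2.5, 1e6]          # endpoints quoted above
log_bpp = [math.log10(v) for v in sample_bpp]  # ~[-9.0, -1.3, 0.4, 6.0]
```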

I also agree about the gameability here; it's always going to be possible to set your images just past any threshold. Hopefully something like this makes it easier to tell whether someone is deliberately trying to avoid the cutoff through padding or stuffing the image.

rik commented 1 year ago

I've given this a try with --enable-features=ExcludeLowEntropyImagesFromLCP:min_bpp/0.05 in Chrome Canary 111.0.5554.0 and I'm very happy to see that tricks such as https://github.com/w3c/largest-contentful-paint/issues/72 would be defeated by this heuristic.

tunetheweb commented 1 year ago

@clelland I think this can be closed now that the low-entropy change has reached stable? @jonsneyers's comment in https://github.com/w3c/largest-contentful-paint/issues/86#issuecomment-1354455854 seems to relate more to #71 than directly to this issue.

There are undoubtedly other heuristics that could be improved, but I would suggest a more specific issue for anyone who wants to raise one, rather than leaving this generic issue open forevermore.

WDYT?