whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Preventing downloading images or objects until they are visible in the viewport #2806

Closed JoshTumath closed 4 years ago

JoshTumath commented 7 years ago

See PR #3752

Problem

Many websites are very image heavy, but not all of those images are going to be viewed by visitors. This is especially true on mobile devices, where most visitors do not scroll down very much; it is mostly the content at the top of the page that is consumed. Most of the images further down the page will never be viewed, but they are downloaded anyway.

This slows down the overall page load time, unnecessarily increases mobile data charges for some visitors, and increases the amount of data held in memory.

Example workaround

For years, the BBC News team have been using the following method to work around this problem. Primary images at the top of the page are included in the HTML document in the typical way using an img element. However, any other images are loaded in lazily with a script. Those images are initially included in the HTML document as a div which acts as a placeholder. The div is styled with CSS to have the same dimensions as the loaded image and has a grey background with a BBC logo on it.

<div class="js-delayed-image-load"
     data-src="https://ichef.bbci.co.uk/news/304/cpsprodpb/26B1/production/_96750990_totenhosen_alamy976y.jpg"
     data-width="976" data-height="549"
     data-alt="Campino of the Toten Hosen"></div>

Eventually, a script will replace it with an img element when it is visible in the viewport.
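Today, such a script can be a small piece of IntersectionObserver wiring. A minimal sketch, assuming the placeholder markup above (the `imgAttrsFromDataset` helper and the 200px `rootMargin` are illustrative choices, not what the BBC actually ships):

```javascript
// Pure helper: turn a placeholder's data-* values into <img> attributes.
function imgAttrsFromDataset(dataset) {
  return {
    src: dataset.src,
    width: dataset.width,
    height: dataset.height,
    alt: dataset.alt || '',
  };
}

// Browser wiring: swap each placeholder for a real <img> as it nears the
// viewport. rootMargin starts the load slightly before the element is visible.
if (typeof IntersectionObserver !== 'undefined') {
  const observer = new IntersectionObserver((entries) => {
    for (const entry of entries) {
      if (!entry.isIntersecting) continue;
      const img = document.createElement('img');
      Object.assign(img, imgAttrsFromDataset(entry.target.dataset));
      entry.target.replaceWith(img);
      observer.unobserve(entry.target);
    }
  }, { rootMargin: '200px' });

  document.querySelectorAll('.js-delayed-image-load')
    .forEach((el) => observer.observe(el));
}
```

Note that this still has all the drawbacks listed below: it does nothing if the script never arrives, and nothing happens until the script has downloaded and executed.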

Doing this with a script is not ideal, because:

  1. If the visitor has scripts disabled, or the script fails to load, the images won't ever appear
  2. We don't know in advance the size of the visitor's viewport, so we have to arbitrarily determine which images to load in lazily. On a news article, visitors on small viewports will only initially see the News logo and an article's hero image, but larger viewports will initially be able to see many other images (e.g. in a sidebar). But we have to favour the lowest common denominator for the sake of mobile devices. This gives users with a large viewport a strange experience where the placeholders appear for a second when they load the page.
  3. We have to wait for the script to asynchronously download and execute before any placeholders can be replaced with images.

Solution

There needs to be a native method for authors to do this without using a script.

One solution to this is to have an attribute for declaring which images or objects should not be downloaded and decoded until they are visible in the viewport. For example, <img lazyload>.*

Alternatively, a meta element could be placed in the head to globally set all images and objects to only download once they are visible in the viewport.
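A hypothetical sketch of what such a meta element might look like (the name and value below are invented for illustration; nothing like this is specified anywhere):

```html
<!-- Hypothetical: neither this name nor this value is specified. -->
<meta name="resource-loading" content="lazy">
```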

* An attribute with that name was proposed in the Resource Priorities spec a few years ago, but it didn't prevent the image from downloading - it just gave a hint to the browser about the ordering, which is probably not as useful in an HTTP/2 world.

domenic commented 7 years ago

Hmm, this was previously discussed at https://www.w3.org/Bugs/Public/show_bug.cgi?id=17842, but GitHub is more friendly for people. Let me merge that thread into here, but please please please read all the contents of the discussion there, as this is very well-trod ground and we don't want to have to reiterate the same discussions over again.

wildlyinaccurate commented 7 years ago

I've just spent an hour reading the thread on the original bug report (which @JoshTumath actually reported). There was initially confusion between two features: (1) Being able to tag images as "not important" so that the browser can give priority to other resources. (2) Being able to opt in to loading specific images only at the point where they are in the viewport or just about to enter it. This issue is specifically for (2). I will refer to this as "lazy loading".

The thread goes around in circles and doesn't really have a clear outcome, although the implementations discussed still seem valid and relevant today (Jake's summary in comment 49 is a good point to start at if you don't want to read the entire thread). I'm going to try not to repeat too much from that thread, but it has been 5 years now and as far as I can see lazy loading images is still a relatively common pattern. On top of that, the profile of the average internet-connected device has changed drastically (under-powered Android devices on very expensive cellular connections) and in my opinion the argument for lazy loading images is stronger now than it was 5 years ago.

I'm going to provide some insight into a use case that I'm very familiar with: the BBC News front page. I'll do this in the hopes that it provides some real life context around why I think lazy loading images is important, and why doing it in JS is not good for users.

Loading the page in Firefox in a 360 x 640 viewport from the UK (important because the UK does not get ads, which skews the results), the browser makes the following requests:

We use lazysizes to lazy load all but the very first article image. Lazysizes makes up about half of our JS bundle size. I know it's overkill for our use case but it's a popular and well-tested library. We load our JS with a <script async> tag, so it can take some time before the JS is executed and the images are inserted into the document. The experience of seeing so many image placeholders for several seconds can be quite awkward. We actually used defer for a while but the delay was deemed too long on slower devices.

From our point of view the benefits of the UA providing lazy loading are:

Despite Ilya's arguments against lazy loading in general, we've been doing it for 5 years and we're going to continue doing it until cellular data is much cheaper. If we got rid of our lazy loading, two thirds of our mobile users would download 170kB of data that they never use. Keeping the next billion in mind, that's about 3 minutes of minimum wage work. At our scale (up to 50M unique mobile visitors to the site each week) 170kB per page load starts to feel uncomfortably expensive for our users.

So what do the WHATWG folk think? Is it worth having this conversation again? Is there still vendor interest? Mozilla expressed interest 5 years ago but it seems like nothing really happened.

jakearchibald commented 7 years ago

We literally halve the amount of JS in our bundle

Intersection observers means the JS for triggering loading on element visibility is tiny.

The UA can load images earlier, probably as early as DOMContentLoaded.

That's also possible with a small amount of JS.

The UA can decide whether to lazy load at all (e.g. only lazy load on cellular connections).

Yeah I think browser heuristics (along with no JS dependency) are the remaining selling points of something like lazyload. But is it enough to justify it?

wildlyinaccurate commented 7 years ago

Intersection observers means the JS for triggering loading on element visibility is tiny.

Yeah, fair call. If we drop our big ol' lazy loading JS for a lazyload attribute we may as well drop it for 10 lines of intersection observer wiring.

I guess the thing that appeals to me most about a lazyload attribute is that it's pretty much the minimum amount of friction you could have for implementing lazy loading, and it leaves all of the nuance up to the UA. In my experience developers don't really know about or care about the nuance of whether their JS is blocking or deferred, or whether it runs at DOMContentLoaded or load. If there was a big slider that controlled who did the most work (`UA o-----------|--o Devs`), I would shift it all the way to the UA, because devs often don't have the time to do things in a way that provides the best experience for users. I realise this kind of thinking goes against the Extensible Web Manifesto, though. 🙊

Zirro commented 7 years ago

I can see two more arguments in favour of an attribute. The first is that lazy loading mechanisms which depend on scripts have a significant impact for user agents where scripts don't execute. To prevent images from loading early, the images are only inserted into the DOM later on, leaving non-scripting environments without images at all. Few sites seem to think about the <noscript> element these days.
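As an illustration (not something the sites discussed above actually do), a script-driven placeholder can at least carry a `<noscript>` fallback so non-scripting agents still get the image:

```html
<div class="js-delayed-image-load" data-src="photo.jpg"
     data-width="976" data-height="549" data-alt="…"></div>
<noscript>
  <img src="photo.jpg" width="976" height="549" alt="…">
</noscript>
```

In practice few sites ship this, which is the point being made here.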

The second is that providing it through an attribute means that the user can configure the behaviour as they prefer to experience the web. Someone on a slow connection might want to make images start loading earlier than when the image enters the viewport in order to finish loading in time, while someone else with a lot of bandwidth who dislikes lazy loading can disable it entirely.

(In general, I believe it is important that common website practices are standardised in order to give some control of the experience back to the user, or we may eventually find ourselves with a web that is more of a closed runtime than a document platform which is open to changes by extensions, user scripts and userstyles.)

jakearchibald commented 7 years ago

@Zirro those arguments are the "browser heuristics" and "no JS dependency" benefits I already mentioned, no?

Zirro commented 7 years ago

@jakearchibald I suppose I understood the "no JS dependency" benefit as referring only to having to load less JavaScript rather than the content being available to non-scripting agents, and missed the meaning of "browser heuristics" in your sentence. Still, I hope that detailing the arguments and why they are important can help convince those who are not yet sure about why this would be useful.

domenic commented 7 years ago

In general non-scripting agents are not a very compelling argument to get browsers to support a proposal, given that they all support scripting :). (And I believe these days you can't turn off scripting in any of them without extensions.)

Zirro commented 7 years ago

@domenic I would hope that they see the value in having a Web that is accessible to all kinds of agents beyond their own implementations, much like a person without a disability can see the value of designing a website with accessibility in mind.

JoshTumath commented 7 years ago

In general non-scripting agents are not a very compelling argument to get browsers to support a proposal, given that they all support scripting :).

@domenic The issue is more whether these scripts fail to download, which does lead to an odd experience. It's becoming harder and harder these days to progressively enhance websites as we seem to depend on scripting more and more for the core functionality of our websites.

Yeah I think browser heuristics (along with no JS dependency) are the remaining selling points of something like lazyload. But is it enough to justify it?

I think both of these are big selling points for the reasons above. As I say, this is not something that's possible to progressively enhance. There is not any way to provide a fallback for those for whom the JS fails for whatever reason.

A few years ago, GDS calculated how many visits do not receive 'JavaScript enhancements', which was a staggering 1.1%. Like GDS, at the BBC, we have to cater to a very wide audience and not all of them will have stable internet connections. I have a good connection at home and even for me the lazyloading script can fail to kick in sometimes.

Additionally, I feel as though we haven't covered one of the main issues with this that I mentioned in my original comment:

We don't know in advance the size of the visitor's viewport, so we have to arbitrarily determine which images to load in lazily.

Because we're using a script, we've had to use placeholder divs for most images. While this is great for mobile devices whose viewports are too small to see many images at once, this is really unhelpful on large viewports. It creates an odd experience and means we can't benefit from having the browser start downloading the images as normal before DOMContentLoaded is triggered. Only a browser solution can know in advance the viewport size and determine which images to download immediately and which ones to only download once scrolled into view.

hartman commented 7 years ago

@domenic The issue is more whether these scripts fail to download, which does lead to an odd experience. It's becoming harder and harder these days to progressively enhance websites as we seem to depend on scripting more and more for the core functionality of our websites.

I completely agree with this. At Wikipedia/Wikimedia, we have seen that interrupted JS downloads in low quality bandwidth situations are one of the most common causes of various problems. And that's also exactly the user situation where you'd want lazy loaded images. I'd guess with service workers you could do lazy loaded images as well, and then at least you're likely to have them on your second successful navigation, but yeah:

It's becoming harder and harder these days to progressively enhance websites as we seem to depend on scripting more and more for the core functionality of our websites.

Only a browser solution can know in advance the viewport size and determine which images to download immediately and which ones to only download once scrolled into view.

addyosmani commented 7 years ago

A topic I would like to tease apart is whether lazy-loading of images alone is the most compelling use-case to focus on vs. a solution that allows attribute-based priority specification for any type of resource (e.g. <iframe lazyload> or <video priority="low">).

I know <img lazyload> addresses a very specific need, but I can imagine developers wanting to similarly apply lazyload to other types of resources. I'm unsure how much granular control may be desirable however. Would there be value in focusing on the fetch prioritization use-case?

JoshTumath commented 7 years ago

It would definitely be useful to have this for iframes, objects and embeds as well!

As for video and audio, correct me if I'm wrong, but unless the preload or autoplay attributes are used, the media resource won't be downloaded anyway until it's initiated by the user. However, if they are specified, it might be useful to be able to use lazyload so they don't start buffering until they are scrolled into view.

When you mention a more general priority specification, do you mean something like the old Resource Priorities spec? What kind of behaviour are you thinking of?

smfr commented 6 years ago

I believe Edge already does lazy image loading. For out-of-viewport images, it loads enough of the image to get metadata for size (with byte-range requests?), but not the entire image. The entire image is then loaded when visible to the user.

Would lots of small byte-range requests for out-of-viewport images be acceptable?
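The metadata-only fetch is easy to picture for a format like PNG, whose intrinsic dimensions sit in the first 24 bytes of the file, exactly the span a `Range: bytes=0-23` request would return. A sketch (PNG only, for illustration; a real engine handles every image format and does the HTTP itself):

```javascript
// Read intrinsic dimensions from just the first bytes of a PNG file,
// the kind of data a UA could fetch with "Range: bytes=0-23".
function pngDimensions(bytes) {
  const sig = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (!sig.every((b, i) => bytes[i] === b)) throw new Error('not a PNG');
  const view = new DataView(bytes.buffer, bytes.byteOffset);
  // IHDR chunk: width and height are big-endian uint32 at offsets 16 and 20.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// A fabricated 24-byte prefix describing a 976x549 image:
const prefix = new Uint8Array(24);
prefix.set([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]); // signature
const dv = new DataView(prefix.buffer);
dv.setUint32(8, 13);                       // IHDR chunk length
prefix.set([0x49, 0x48, 0x44, 0x52], 12);  // "IHDR"
dv.setUint32(16, 976);                     // width
dv.setUint32(20, 549);                     // height
console.log(pngDimensions(prefix)); // -> { width: 976, height: 549 }
```

With the dimensions in hand, the UA can lay out a correctly sized placeholder and defer the rest of the bytes until the image nears the viewport.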

annevk commented 6 years ago

@mcmanus I think the previous comment in this thread is of interest to you.

shallawa commented 6 years ago

I have two questions:

I can't think of any use for these cases:

- `async="on"` and `lazyload="off"`
- `async="off"` and `lazyload="on"`

If either of them is "on", the browser will be lazily loading or decoding the image. Either way, the user won't see the image drawn immediately. So shouldn't a single attribute be used to indicate laziness for both loading and decoding the image? For example:

`<img src="image.png" lazy>` and `.box { background-image: url("background.gif") lazy; }`

JoshTumath commented 6 years ago

Will the CSS background image use a similar attribute? .box { background-image: url("background.gif") lazyload; }

I guess that would be a separate discussion in the CSS WG, but at least in the case of BBC websites, the few background images that are used are visible at the top of the page, and therefore need to be loaded immediately anyway.

If any of them is "on", the browser will be lazy loading or decoding the image. In any case, the user won't see the image drawn immediately. So should not a single attribute be used to indicate the laziness for loading and the decoding the image?

It also depends on if these attributes would prevent the image from being downloaded entirely, or whether it would just affect the order in which the images are downloaded. (I think the latter would be much less useful.)

Malvoz commented 6 years ago

The content performance policy draft suggests <img> lazy loading. Although they only mention lazy loading of images and no other embeds, it seems that their idea is to enable developers to opt in to site-wide lazy loading.

othermaciej commented 6 years ago

For a complete proposal, we probably need not just a way to mark an image as lazy loading but also a way to provide a placeholder. Sometimes colors are used as placeholders but often it's a more data-compact form of the image itself (a blurry view of the main image colors seems popular). Placeholders are also sometimes used for non-lazy images, e.g. on Medium the immediately-visible splash images on articles briefly show a fuzzy placeholder.

Also: Apple is interested in a feature along these lines.

laukstein commented 6 years ago

@othermaciej in early 2014 I proposed a CSS placeholder (similar to the background property, only applied until the image has loaded or failed) https://lists.w3.org/Archives/Public/www-style/2014Jan/0046.html and there still hasn't been any progress on it.

domenic commented 6 years ago

Should we bring back lowsrc=""? https://books.google.com/books?id=QwFV3pXy4TkC&pg=PA56&lpg=PA56&dq=html4+lowsrc&source=bl&ots=y7z5CkGNvy&sig=Wj5TiqVdXPRGFW4Vmct4Vf5DA04&hl=en&sa=X&ved=0ahUKEwjx7bHu0aPaAhUpx1kKHbdjAToQ6AEIMzAB#v=onepage&q=html4%20lowsrc&f=false

bengreenstein commented 6 years ago

The Chrome team's proposal is a lazyload=”” attribute. It applies to images and iframes for now, although in the future we might expand it to other resources like videos.

“lazyload” has the following states:

In Chrome we plan to always respect on and off. (Perhaps we should make them always-respected in the spec too, instead of being strong hints? Thoughts welcome.)

Deferring images and iframes delays their respective load events until they are loaded. However, a deferred image or iframe will not delay the document/window's load event.

One possible strategy for lazyload="on", which allows lazily loading images without affecting layout, is to issue a range request for the beginning of an image file and to construct an appropriately sized placeholder using the dimensions found in the image header. This is the approach Chrome will take. Edge might already do something similar.

We’re also open to the idea of supporting developer-provided placeholder images, though ideally, lazyloaded images would always be fully loaded before the user scrolls them into the viewport. Note that such placeholders might already be accomplishable today with a CSS background-image that is a data URL, but we can investigate in parallel with lazyload="" a dedicated feature like lowsrc="" or integration into or similar.

Although we won’t go into the details here (unless you’re interested), we also would like to add a feature policy to flip the default for lazyload="" from auto to off. This would be used for example for a particularly important