whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

How is the list of available images used when parsing a document with multiple img nodes with the same src? #2465

Open LeonidVasilyev opened 7 years ago

LeonidVasilyev commented 7 years ago

The standard states the following regarding the list of available images:

It is not used to avoid re-downloading the same image while the previous image is still loading.

However, if you open an HTML document that contains multiple img nodes with the same src in current versions of Chrome, Firefox or IE 11, you will notice that these browsers make only a single network request for that image.

I checked this test case in several browsers:

| Test case | Chrome 56 | Firefox 50.0.1 | Firefox 31 | IE 11 |
| --- | --- | --- | --- | --- |
| HTML document with multiple sequential img nodes with same src URI | Single request | Single request | Single request | Single request |

Is this a deviation from the standard, or a side effect of some behavior specified by the HTML standard?

jdm commented 7 years ago

Those browsers are presumably relying on the HTTP cache which contains an in-progress response.

LeonidVasilyev commented 7 years ago

Interestingly, this does not happen if you simultaneously send multiple GET requests to the same URI using XMLHttpRequest. Besides, in my experience, requests that get a cached response still appear in the Network tab of the developer tools anyway.

LeonidVasilyev commented 7 years ago

I added Cache-Control: no-store, no-cache to the responses. The browser still performs only a single request.
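One way to check this from the server side is to count the requests that actually arrive, independent of what the developer tools display. The following is a minimal sketch, assuming a Node.js server. `makeCountingHandler`, the port, and the image path are hypothetical names chosen for illustration; they are not part of the test setup described in this thread.

```javascript
// Hypothetical helper: builds a request handler that counts hits per URL
// and marks every response as uncacheable.
function makeCountingHandler(body) {
  const counts = new Map(); // request URL -> number of requests received

  function handler(req, res) {
    counts.set(req.url, (counts.get(req.url) || 0) + 1);
    res.writeHead(200, {
      'Content-Type': 'image/png',
      'Cache-Control': 'no-store, no-cache',
    });
    res.end(body);
  }

  return { handler, counts };
}

// Usage with Node's built-in http module (port 8080 is an assumption):
//   const { handler, counts } = makeCountingHandler(pngBytes);
//   require('http').createServer(handler).listen(8080);
// After loading the test page, counts.get('/foo/bar.png') shows how many
// requests for that image reached the server.
```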

domenic commented 7 years ago

This is specified: https://html.spec.whatwg.org/#the-list-of-available-images

Closing but happy to continue discussing in the closed thread, and reopen if we missed something in the spec.

LeonidVasilyev commented 7 years ago

@domenic isn't it correct that, according to the standard, parsing the following piece of HTML should end up performing two network requests, given that the browser sees the src URI for the first time?

<img src="foo/bar.png" />
<img src="foo/bar.png" />

annevk commented 7 years ago

@LeonidVasilyev it's not correct, for images in particular, due to the map @domenic referenced.

LeonidVasilyev commented 7 years ago

@annevk my reasoning is based on two pieces of the HTML standard. Step 14 in 4.8.4.3.4 "Updating the image data" states that the image is added to the list of available images after it is fetched:

Furthermore, the last task that is queued by the networking task source once the resource has been fetched must additionally run these steps: ...

  1. Set image request to the completely available state.
  2. Add the image to the list of available images using the key key, with the ignore higher-layer caching flag set.

The first note in 4.8.4.3.3 "The list of available images" states that the list of available images is not used to avoid re-downloading an image while it is still loading:

It is not used to avoid re-downloading the same image while the previous image is still loading.

In my example, when the parser sees the second img tag, the list of available images doesn't contain the first image because it is still downloading (in the general case). If there is no image for the second img tag in the list of available images, the browser should perform a second request.
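The reading argued above can be modeled in a few lines. This is only a sketch of that interpretation, not spec text: `SpecOnlyImageLoader` is a hypothetical name, the list is keyed by URL alone (the standard's key also includes the CORS mode and, where applicable, the origin), and there is deliberately no in-flight request sharing.

```javascript
// Hypothetical model: the list of available images is populated only after
// a fetch has completed, and nothing else deduplicates requests.
class SpecOnlyImageLoader {
  constructor() {
    this.available = new Set(); // simplified key: URL only
    this.pending = [];          // fetches started but not yet finished
    this.networkRequests = 0;
  }

  // Called when the parser encounters <img src="...">.
  startImage(src) {
    if (this.available.has(src)) return 'reused';
    this.networkRequests += 1;
    this.pending.push(src);
    return 'fetching';
  }

  // Called when all outstanding fetches have finished successfully.
  completePending() {
    for (const src of this.pending) this.available.add(src);
    this.pending = [];
  }
}

const loader = new SpecOnlyImageLoader();
loader.startImage('foo/bar.png'); // 'fetching'
loader.startImage('foo/bar.png'); // 'fetching' again: the first fetch has not completed
loader.completePending();
loader.startImage('foo/bar.png'); // 'reused'
console.log(loader.networkRequests); // 2
```

Under this model the two-img example produces two requests, which is the discrepancy with observed browser behavior that the question is about.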

Please correct me if I am wrong or have missed something.

annevk commented 7 years ago

That's probably an error of sorts, or maybe the difference between Chrome/WebKit's memory cache and this HTML feature.

domenic commented 7 years ago

I think what happens here is that Chrome decides to not make a second image request to that URL while the first one is in progress. That isn't governed by the spec I guess, and might technically be against spec depending on how you read things.

Then when it comes time to make the second request, it goes through the logic to check the list of available images, and the spec takes over.

I believe @surma was doing some research on this?

surma commented 7 years ago

I was doing some research on behavior discrepancies between browsers when it comes to fetch() requests, Worker instantiation and iframes. Images have a somewhat special handling, but I assume similar patterns apply:

If you request resource A, Chrome does indeed block a 2nd request for resource A until the first request is resolved and re-uses the response if the caching headers in the first response allow it. If the headers turn out to disallow reuse, a second request is dispatched to the network.

In the context of fetch, setting {cache: 'no-store'} should make the 2nd request go to the network immediately and not wait for the first request to return, but as of now Chrome doesn’t support the cache option at all.

This behavior differs wildly across browsers – but none of them violate the HTTP spec for caching, they are just suboptimal at times.

Not sure this is necessarily helpful for this discussion – but I don't see stalling a second request to wait for the first one as a violation of the HTML spec.
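The pattern described above for Chrome (block the second request, then either reuse the first response or go to the network, depending on the caching headers) could be sketched as follows. This is an illustrative model, not Chrome's implementation: `CoalescingFetcher` is a hypothetical name, treating `no-store` and `no-cache` as the directives that disallow reuse is an assumption, and each blocked caller is assumed to get its own network request when reuse is not allowed.

```javascript
// Hypothetical model of "wait for the first response, then reuse or refetch".
class CoalescingFetcher {
  constructor() {
    this.networkRequests = 0;
    this.waiting = new Map(); // url -> number of callers blocked behind the in-flight request
  }

  // A caller asks for `url`. Returns 'network' if a request is dispatched,
  // or 'blocked' if an identical request is already in flight.
  request(url) {
    if (this.waiting.has(url)) {
      this.waiting.set(url, this.waiting.get(url) + 1);
      return 'blocked';
    }
    this.waiting.set(url, 0);
    this.networkRequests += 1;
    return 'network';
  }

  // The in-flight response for `url` arrived with the given Cache-Control value.
  // Blocked callers reuse it if allowed; otherwise each one goes to the network.
  onResponse(url, cacheControl) {
    const blocked = this.waiting.get(url) || 0;
    this.waiting.delete(url);
    const reusable = !/\bno-store\b|\bno-cache\b/i.test(String(cacheControl));
    if (!reusable) this.networkRequests += blocked;
    return reusable ? 'reused' : 'refetched';
  }
}

const fetcher = new CoalescingFetcher();
fetcher.request('/a.png');                    // 'network'
fetcher.request('/a.png');                    // 'blocked'
fetcher.onResponse('/a.png', 'max-age=3600'); // 'reused'
console.log(fetcher.networkRequests);         // 1
```

With `'no-store'` passed to `onResponse` instead, the same sequence ends with two network requests in this model.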

zcorpan commented 7 years ago

Isn't this behavior the same for other things, like fonts, stylesheets, scripts?

The intent as far as the spec for img goes is that the logic for reusing an ongoing fetch is the responsibility of the Fetch spec, and the "list of available images" is layered on top and only populated for completed fetches with decodable images. (There is an open bug about extending it to cover unsuccessful image fetches as well, to avoid retrying over and over.)

annevk commented 7 years ago

Isn't this behavior the same for other things, like fonts, stylesheets, scripts?

Only WebKit/Chrome have this so-called "memory cache" as I understand it. I've asked some folks to describe it and get it standardized, but not much activity thus far.

LeonidVasilyev commented 7 years ago

A couple more test cases:

| Test case | Chrome | Firefox | IE 11 |
| --- | --- | --- | --- |
| HTML document with multiple sequential stylesheet link nodes with same href URI | Single request | Single request | Single request |
| HTML document with multiple sequential script nodes with same src URI | Single request | One request for each node | Single request |
| HTML document with multiple sequential async script nodes with same src URI | Single request | One request for each node | Single request |

Although Firefox has a flag named browser.cache.memory.enable, it doesn't seem to affect browser behavior in the described scenarios.

surma commented 7 years ago

@LeonidVasilyev: What are the caching headers on those resources? According to my research, they do have an impact on how the browser behaves:

[Screenshot, 2017-03-26 15:01:32]

serialize = last request has to finish before next request is kicked off

parallelize = all requests are sent out at the same time and have a unique response

wait 1st + reuse = the response for the first request is reused for all remaining requests
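To make the three labels concrete, here is a rough sketch of how a server-side request log could be mapped onto them. `classifyPattern` and the log shape (`start` and `end` timestamps in milliseconds) are hypothetical. The function assumes a non-empty log and treats any overlap between requests as "parallelize", which is looser than "all sent at the same time".

```javascript
// Hypothetical classifier for the three patterns defined above.
// clientRequests: how many requests the page issued for the resource.
// serverLog: one { start, end } entry per request the server actually received.
function classifyPattern(clientRequests, serverLog) {
  // One network request served several client requests.
  if (serverLog.length === 1 && clientRequests > 1) return 'wait 1st + reuse';

  const sorted = [...serverLog].sort((a, b) => a.start - b.start);
  // Serialized: every request starts only after the previous one has ended.
  const sequential = sorted.every(
    (entry, i) => i === 0 || entry.start >= sorted[i - 1].end
  );
  return sequential ? 'serialize' : 'parallelize';
}

console.log(classifyPattern(3, [{ start: 0, end: 100 }]));
// 'wait 1st + reuse'
console.log(classifyPattern(2, [{ start: 0, end: 100 }, { start: 100, end: 200 }]));
// 'serialize'
console.log(classifyPattern(2, [{ start: 0, end: 100 }, { start: 5, end: 105 }]));
// 'parallelize'
```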

LeonidVasilyev commented 7 years ago

@surma, I got the same result for Chrome, Firefox and IE 11 with both Cache-Control: no-cache and Cache-Control: max-age=3600 response headers.

surma commented 7 years ago

Wow, that’s giving me a headache. I’ll take a closer look at this in the coming weeks. Whatever the underlying mechanism, this is not good in terms of developer ergonomics.

LeonidVasilyev commented 7 years ago

The Stack Overflow discussion "Refresh image with a new one at the same url" contains interesting information about the behavior of the in-memory cache or the list of available images. According to Aya's answer, you should use both the Cache-Control: no-store HTTP response header and a cache buster based on a random URI fragment to prevent image requests from hitting the cache.
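A cache buster of the kind mentioned above might look like the sketch below. `withCacheBuster` is a hypothetical helper, not code from the cited answer. The fragment is not sent to the server, so it can only affect how the browser looks the image up locally; the server still has to send Cache-Control: no-store itself. Whether this combination is sufficient in every browser is not verified here.

```javascript
// Hypothetical helper: returns `url` with a random fragment appended,
// replacing any fragment that is already present.
// `random` is injectable so the output can be made deterministic.
function withCacheBuster(url, random = Math.random) {
  const fragment = random().toString(36).slice(2); // drop the leading "0."
  const hashIndex = url.indexOf('#');
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  return `${base}#${fragment}`;
}

// Browser usage (assumes `img` is an existing HTMLImageElement):
//   img.src = withCacheBuster('foo/bar.png');
```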

processprocess commented 3 years ago

I'm testing this in Chrome using a local express server. I'm noticing some interesting behavior.

I have 1000 img elements in an HTML document.

It is weird that a delay on the res.download from the server ensures 1 request. Realistically, a network request wouldn't resolve as fast as a request to a local server, and the timeout could simulate a typical network request, so this might be good news for caching. Still, I wonder what is going on here.
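The test page and server code were not included in the comment above. As a rough illustration only, a page with 1000 identical img elements could be generated like this. `buildTestPage` and the image path are hypothetical, and `src` is assumed to contain no characters that need HTML escaping.

```javascript
// Hypothetical generator for a test document with `count` img elements
// that all point at the same `src`.
function buildTestPage(src, count) {
  const imgs = Array.from({ length: count }, () => `<img src="${src}" />`);
  return `<!DOCTYPE html>\n<html>\n<body>\n${imgs.join('\n')}\n</body>\n</html>\n`;
}

const page = buildTestPage('foo/bar.png', 1000);
// `page` can be written to disk or served as text/html by the local server.
```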