mdn / content

The content behind MDN Web Docs
https://developer.mozilla.org

Regarding cache priority, the content expression is not rigorous enough. #26064

Closed bigbigDreamer closed 1 year ago

bigbigDreamer commented 1 year ago

MDN URL

https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching

What specific section or headline is this issue about?

https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching#expires_or_max-age

What information was incorrect, unhelpful, or incomplete?

image

What did you expect to see?

ETag takes precedence.

I think whether ETag takes priority depends on how the server is implemented. For example, if I implement a static resource server in Node, I am free to give Last-Modified priority instead.

Do you have any supporting links, references, or citations?

No response

Do you have anything more you want to share?

No response

MDN metadata

Page report details

* Folder: `en-us/web/http/caching`
* MDN URL: https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching
* GitHub URL: https://github.com/mdn/content/blob/main/files/en-us/web/http/caching/index.md
* Last commit: https://github.com/mdn/content/commit/eb9eef29f1ccdaf1c8a464dbe4483c78f7a13b2a
* Document last modified: 2023-03-03T04:45:34.000Z

bigbigDreamer commented 1 year ago

In fact, I have read RFC 2616 on strong entity-tag validation. According to the specification's recommendations, it is true that `ETag` has higher priority, but I think the description here should be improved: the actual behavior still depends on how the cache server handles these headers.

hamishwillee commented 1 year ago

Thanks @bigbigDreamer. MDN documents the spec, which is how the cache server is supposed to implement things. By that measure the current text is correct.

Further, if you look at the note, it is only peripherally about priority. That's just a mechanism to explain that it is worth including Last-Modified even though it won't be used during cache revalidation:

So considering the overall HTTP ecosystem, it is preferable to provide both ETag and Last-Modified.

Upshot, this seems to be correct to me.

@teoli2003 - got any thoughts on this one? Perhaps I'm missing something.

bigbigDreamer commented 1 year ago

if both ETag and Last-Modified are present, ETag takes precedence.

I have my own view on the wording of this sentence. While reviewing the specifications, I also looked at the implementation in express, so I think the actual priority depends on how the server implements it, and the specification serves only as guidance.

image

@hamishwillee

hamishwillee commented 1 year ago

The way I read the relevant part of the spec, a server should send both an entity tag and a Last-Modified value unless it is not feasible to generate one, and sending a strong entity tag together with a Last-Modified value is preferred. A client must use an entity tag if one is provided, and should use both the entity tag and the Last-Modified value if both are provided.

If the server sends both, which is the only case where you can discuss preference/priority, how is it up to the server to decide the priority of these headers? It is up to the consumer - either a client or a caching server. To me that means the implementation of the origin server, such as express, is not relevant - only the consumer, and only in the case where both headers have been set.

But even aside from that, what does it matter/how will it change the point of the note? Which is that there is a preference for both headers to be sent.

I'm not an expert, which is why I'll defer to @teoli2003 !

hamishwillee commented 1 year ago

@teoli2003 If I'm missing the point, and I may be, it would be easy to remove the discussion of priority, because it really is irrelevant to the note:

Note: It is preferable for origin servers to provide both ETag and Last-Modified headers. While only one header is required for cache validation, and ETag is preferred in the specification, Last-Modified is used for other purposes such as: content-management (CMS) systems to display the last-modified time, crawlers to adjust crawl frequency, and so on.

teoli2003 commented 1 year ago

My understanding of the latest spec (and RFC9111 for the cache itself) is that:

bigbigDreamer commented 1 year ago

My understanding of the latest spec (and RFC9111 for the cache itself) is that:

  • servers should send both headers
  • clients control the caches and decide the priority. So it sounds like a good plan.

I think this should go both ways. When the response carries both headers, the client should also send both If-None-Match and If-Modified-Since. As for the validation itself, it happens on the server side (including cache servers), because they decide whether to send a 304 status code telling the client to keep reusing the cached resource or to reload it.

  • clients control the caches and decide the priority.

I don't quite understand which specific type of object the client you are referring to here represents.

The freshness check as I understand it, which is the 304 behavior, always occurs on the server (including cache servers). My question is about the priority between the ETag / If-None-Match pair and the Last-Modified / If-Modified-Since pair.


Thank you for your replies despite being busy. If there are any mistakes or omissions in my understanding, I hope to receive criticism and correction.

hamishwillee commented 1 year ago

The only message of relevance to most readers is "send both headers if you can". That is the only point of the note, and why I suggest reducing it.

That said, I think the note is probably misleading when it says "ETag takes precedence if both headers are sent.". The spec seems to indicate that if both are provided both should be used, which implies that if either fails the stale resource won't be used. There is no indication of a priority - i.e. a case where last-modified is stale and ETag is valid, so you still use the stale resource:

If both an entity tag and a Last-Modified value have been provided by the origin server, SHOULD use both validators in cache-conditional requests. This allows both HTTP/1.0 and HTTP/1.1 caches to respond appropriately.
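To make the client side of that sentence concrete, here is a minimal sketch of a cache attaching both validators to a revalidation request. The function name `build_revalidation_headers` and the idea of keeping the stored `ETag` and `Last-Modified` values as plain strings are assumptions for illustration only, not something taken from the spec or from any particular client.

```python
from typing import Dict, Optional


def build_revalidation_headers(
    stored_etag: Optional[str],
    stored_last_modified: Optional[str],
) -> Dict[str, str]:
    """Build conditional request headers from the validators of a stored response.

    Hypothetical helper: both arguments are the raw header values that the
    origin server sent earlier (or None if the header was absent).
    """
    headers: Dict[str, str] = {}
    if stored_etag is not None:
        # The stored ETag value is echoed back unchanged in If-None-Match.
        headers["If-None-Match"] = stored_etag
    if stored_last_modified is not None:
        # The stored Last-Modified value is echoed back in If-Modified-Since.
        headers["If-Modified-Since"] = stored_last_modified
    return headers
```

Sending both lets a recipient that understands `If-None-Match` use it, while an older intermediary that only understands `If-Modified-Since` can still answer.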

Again that's why I think talking about priorities is pointless. But perhaps we do need to make this clear (if it is correct)

Let's ask the big guns ...

@Jxck This is related to the note in https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching#expires_or_max-age

Jxck commented 1 year ago

In the latest HTTP spec (please make sure to base the discussion on the latest specs, RFC 9xxx):

In 200 responses to GET or HEAD, an origin server SHOULD send any available validator fields (Section 8.8) for the selected representation, with both a strong entity tag and a Last-Modified date being preferred. --- https://www.rfc-editor.org/rfc/rfc9110#section-15.3.1-5

But the spec also says that

A recipient MUST ignore If-Modified-Since if the request contains an If-None-Match header field; the condition in If-None-Match is considered to be a more accurate replacement for the condition in If-Modified-Since, and the two are only combined for the sake of interoperating with older intermediaries that might not implement If-None-Match. --- https://www.rfc-editor.org/rfc/rfc9110#section-13.1.3-5
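The rule quoted above can be sketched roughly as follows. This is an illustration only, not how any real server does it: the function names, the plain-dict header lookup with exact header names, and the naive comma split of `If-None-Match` are all assumptions made to keep the example short.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def _opaque(tag: str) -> str:
    """Strip an optional weak prefix so tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(
    request_headers: Mapping[str, str],
    current_etag: Optional[str],
    current_last_modified: Optional[str],
) -> bool:
    """Return True if a 304 response looks appropriate (hypothetical helper)."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match is present, so If-Modified-Since is ignored entirely.
        if if_none_match.strip() == "*":
            # Assumption: the resource exists, so "*" matches it.
            return True
        if current_etag is None:
            return False
        # Simplification: assumes no commas inside the quoted tags.
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        return _opaque(current_etag) in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None and current_last_modified is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            modified = parsedate_to_datetime(current_last_modified)
            return modified <= since
        except (TypeError, ValueError):
            # An unparseable or incomparable date is treated as "no condition".
            return False
    return False
```

Note that the date comparison is only reached when `If-None-Match` is absent, so a mismatching entity tag yields a full response even if the date would have matched.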

So if you focus only on caching, it seems there is no benefit for a server to send both ETag and Last-Modified, since If-Modified-Since is ignored when the client sends it back together with If-None-Match. But that is not the case: Last-Modified is not only for caching but also for other use cases that need to know when the resource was last modified.

As for server implementations, there are tons of them, and their spec compatibility is out of scope for MDN. MDN should be based on the spec (and sometimes mention browser implementations).

And finally, I think the note can be fixed up as below

Note: When evaluating how to use ETag and Last-Modified, consider the following: During cache revalidation, if both If-Modified-Since and If-None-Match are present, If-None-Match takes precedence for validator. Therefore, if you are only considering caching, you may think that Last-Modified is unnecessary. However, Last-Modified is not just useful for caching; instead, it is a standard HTTP header that is also used by content-management (CMS) systems to display the last-modified time, by crawlers to adjust crawl frequency, and for other various purposes. So considering the overall HTTP ecosystem, it is preferable to provide both ETag and Last-Modified. Therefore, RFC9110 prefers that Server should send both ETag & Last-Modified for 200 response if possible.
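As a rough illustration of "provide both ETag and Last-Modified" on the origin side, a server could derive both validators when it builds a 200 response. This is only a sketch: the function name `validator_headers`, the choice of a truncated SHA-256 digest as the entity tag, and the use of a file modification time are assumptions, not what MDN or RFC 9110 prescribe.

```python
import hashlib
from email.utils import formatdate
from typing import Dict


def validator_headers(body: bytes, mtime: float) -> Dict[str, str]:
    """Return ETag and Last-Modified headers for a 200 response.

    Hypothetical helper: `body` is the representation being sent and
    `mtime` is its modification time as seconds since the Unix epoch.
    """
    # Assumption: a content hash is used as a strong entity tag.
    digest = hashlib.sha256(body).hexdigest()[:16]
    return {
        "ETag": '"' + digest + '"',
        # usegmt=True produces an HTTP-date ending in "GMT".
        "Last-Modified": formatdate(mtime, usegmt=True),
    }
```

For example, `validator_headers(b"hello", 0)["Last-Modified"]` is `"Thu, 01 Jan 1970 00:00:00 GMT"`.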

bigbigDreamer commented 1 year ago

Thank you for the pertinent fix. I fully agree with the basis of MDN's content (based on the spec, and sometimes mentioning browser implementations), and changing ETag and Last-Modified to If-None-Match and If-Modified-Since seems more reasonable; at least it reduces some confusion in the behavior description.

Sorry, I was so focused on the accuracy of my own answer that I forgot about the source of MDN's content. Nonetheless, I am grateful for this discussion and have gained a lot from it. @hamishwillee @teoli2003

hamishwillee commented 1 year ago

Thanks very much @Jxck I have updated the note in https://github.com/mdn/content/pull/26108/ - This turns the note around a bit, because generally it is better in docs to provide instructions then reasons - in this case the instruction is "best to send both tags". Hope I got it right.

@bigbigDreamer @teoli2003 Thanks for this. I've learned a bit too :-).

bigbigDreamer commented 4 months ago

https://github.com/jshttp/fresh/pull/38

Looking back on this discussion: a pull request to "fresh" is also one of the changes that came out of it.