h5bp / server-configs-apache

Apache HTTP server boilerplate configs
MIT License

`Cache-Control: immutable` #148

Closed Malvoz closed 1 year ago

Malvoz commented 6 years ago

The Cache-Control header (which takes precedence over Expires if both are present) has been asked about before in #85 and #73.

I would like to raise this again because the header provides finer control than expires. Also with the addition of the immutable directive (see blog posts 1, 2, 3), we get a performance benefit but also no longer have to set long max-age directives for infinite caching.

LeoColomb commented 6 years ago

Thanks for your suggestion, @Malvoz!

I would like to raise this again because the header provides finer control than expires.

Cache-Control is already set by Apache through the ExpiresXX directives (mod_expires).
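As a minimal illustration of that point, mod_expires emits both headers from a single directive. The media types and lifetimes below are assumptions picked for the example, not necessarily the project's actual values.

```apache
<IfModule mod_expires.c>
    # Turn on generation of Expires and Cache-Control: max-age headers.
    ExpiresActive On

    # Fallback lifetime for anything not listed below (assumed value).
    ExpiresDefault "access plus 1 month"

    # Per-media-type lifetimes (assumed values).
    ExpiresByType text/html "access plus 0 seconds"
    ExpiresByType text/css  "access plus 1 year"
</IfModule>
```

With something like this in place, a CSS response should carry a Cache-Control max-age of one year alongside a matching Expires date, without any explicit Header directive.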

Also with the addition of the immutable directive

Generally I'm in favor of following new standards, but here I'm concerned about the potential downsides of the immutable directive:

Malvoz commented 6 years ago

Real infinite caching without revalidation must be used with care: Should not be used without SSL/TLS level. Must be used with really-definitive or really-well-managed files, the user must know what it means. This is not so trivial.

Yes good catch, it could be commented out with notes on TLS/SSL. The web is moving towards an "HTTPS first" web and there are other HTTP header fields that indeed require HTTPS. I would be surprised if H5BP does not move to an HTTPS-first approach in the future with HTTP configurations commented out instead.

immutable is relatively new and support for it does not cover all major browsers yet. But the fact that cache-control rolls out new directives, IMO, speaks in favor of it.

Are there equivalent approaches of expires to all cache-control's directives?

LeoColomb commented 6 years ago

Are there equivalent approaches of expires to all cache-control's directives?

No, but the Expires header is added by Apache automatically for backward compatibility only.

creopard commented 6 years ago

Here's also a nice read about Expires header vs Cache-Control and why Expires header is deprecated... https://www.fastly.com/blog/headers-we-dont-want

LeoColomb commented 6 years ago

Just to be clear here: Cache-Control is already the preference in the config. Expires is added by Apache, not explicitly by the config.

creopard commented 6 years ago

I guess I was confused by "Expires" and the "ExpiresActive" setting...

Malvoz commented 6 years ago

@LeoColomb

Generally I'm in favor of following new standards, but here I'm concerned about the potential downsides of the immutable directive:

... we need to use Header, which comes with a room for bad configurations.

The only benefit I see using mod_expires is that you can ExpiresByType <media type> which seems impossible using cache-control? Instead you need to FilesMatch every potential file which may be error prone. Is that what you are referring to?

... Must be used with really-definitive or really-well-managed files, the user must know what it means. This is not so trivial.

So unless I'm aware of that fact (I realize there is a note on this), this is already an issue with:

https://github.com/h5bp/server-configs-apache/blob/9481d537d60d0226667a2f9712018a1ff4d799d8/dist/.htaccess#L1047

https://github.com/h5bp/server-configs-apache/blob/9481d537d60d0226667a2f9712018a1ff4d799d8/dist/.htaccess#L1077-L1079

Malvoz commented 6 years ago

Self quote:

The only benefit I see using mod_expires is that you can ExpiresByType <media type> which seems impossible using cache-control? Instead you need to FilesMatch every potential file which may be error prone.

Maybe you could do something like:

Header set Cache-Control "<VALUE>" "expr=%{CONTENT_TYPE} =~ m#<MEDIA TYPE>|<MEDIA TYPE>#"
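Filled in with concrete values, that idea could look like the sketch below. The media types and the max-age value are placeholders chosen for illustration, and the condition syntax assumes Apache 2.4.

```apache
<IfModule mod_headers.c>
    # Set Cache-Control only when the response's Content-Type matches
    # one of the listed media types (illustrative selection).
    Header set Cache-Control "max-age=31536000" "expr=%{CONTENT_TYPE} =~ m#text/css|image/png#i"
</IfModule>
```

Because this matches on the media type rather than the file extension, it plays the same role as ExpiresByType without a FilesMatch block per extension.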

Malvoz commented 5 years ago

Now that this issue is about immutable - I've been looking into filename-based_cache_busting.conf and there are things I suggest we address:

https://github.com/h5bp/server-configs-apache/blob/5dc823c18e4a0ee163c2ee3b772060bce7d782e6/src/web_performance/filename-based_cache_busting.conf#L9-L11

In 2008 Steve Souders wrote about Squid not caching resources with query string parameters. But it's been around 10 years since Squid changed that behavior: http://www.squid-cache.org/Versions/v2/2.7/RELEASENOTES.html#s1

The default rules to not cache dynamic content from cgi-bin and query URLs have been altered. Previously, the "cache" ACL was used to mark requests as non-cachable - this is enforced even on dynamic content which returns cachability information. This has changed in Squid-2.7 to use the default refresh pattern. Dynamic content is now cached if it is marked as cachable [...]

Malvoz commented 5 years ago

Friendly bump :)

The immutable directive is really beneficial in terms of performance. More info on that here:

And it's backwards compatible: browsers that don't understand it just ignore it and use max-age instead.

Perhaps we can set an environment variable at:

https://github.com/h5bp/server-configs-apache/blob/5dc823c18e4a0ee163c2ee3b772060bce7d782e6/src/web_performance/filename-based_cache_busting.conf#L16

and respond to request within that environment with:

<IfModule mod_headers.c>
  Header merge Cache-Control "immutable, max-age=31536000"
</IfModule>

Now, I'm not comfortable with apache env variables so if you agree with this, you can PR or help me set it up :)
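One way the environment-variable idea might be wired up in .htaccess is sketched below. The variable name IMMUTABLE_ASSET is made up for this example, and the rewrite pattern is only assumed to resemble the one in filename-based_cache_busting.conf. In per-directory context the internal redirect triggered by the rewrite usually renames the variable with a REDIRECT_ prefix, so both spellings are checked.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} !-f
    # Strip the version segment and flag the request
    # (IMMUTABLE_ASSET is a hypothetical variable name).
    RewriteRule ^(.+)\.(\w+)\.(css|m?js|png|svgz?|webp)$ $1.$3 [E=IMMUTABLE_ASSET:1,L]
</IfModule>

<IfModule mod_headers.c>
    # Case where the variable keeps its original name.
    Header merge Cache-Control "immutable" env=IMMUTABLE_ASSET
    # Case where the internal redirect has prefixed the variable name.
    Header merge Cache-Control "immutable" env=REDIRECT_IMMUTABLE_ASSET
</IfModule>
```

This only appends immutable; the long max-age is assumed to come from the existing mod_expires rules.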

LeoColomb commented 5 years ago

Thanks @Malvoz. I'm ready to go. Thoughts @XhmikosR?

LeoColomb commented 5 years ago

OK, we can start thinking of an implementation.

Webhint suggests the following:

    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    # Where needed add `immutable` value to the `Cache-Control` header

    <IfModule mod_headers.c>

        # Because `mod_headers` cannot match based on the content-type,
        # the following workaround needs to be done.

        # 1) Add the `immutable` value to the `Cache-Control` header
        #    to all resources.

        Header merge Cache-Control immutable

        # 2) Remove the value for all resources that shouldn't have it.

        <FilesMatch "\.(appcache|cur|geojson|ico|json(ld)?|x?html?|topojson|xml)$">
            Header edit Cache-Control immutable ""
        </FilesMatch>

    </IfModule>

As we already did with other conditional headers, we may use MIME type expressions instead (as you also suggested):

Header merge Cache-Control "immutable" "expr=%{CONTENT_TYPE} =~ m#<MEDIA TYPE>|<MEDIA TYPE>#"

Perhaps we can set an environment variable at

I don't feel comfortable adding environment variables. It's hard to understand when they are evaluated, and they are hard to debug.

  • This advice [cache-busting with hash in filenames] is quite outdated

This is a different issue, but you are right. That said it can be hard to have a strong configuration on proxies or CDN when using query string. To be honest I don't have any precise opinion on this, except that webpack still uses the hash-in-name template by default, if I'm correct.

Malvoz commented 5 years ago

Webhint suggests the following:

As an aside, I've already opened an issue at webhint about Apache's ability to match based on content-type. The example also seems to have syntax errors, and they should use a long max-age as a fallback too; I can take these things up with them.


In the following example, I'm matching against every file that is not text/html and has v= in a query string.

Header set Cache-Control "max-age=31536000, immutable" "expr=%{QUERY_STRING} =~ m#v\=#i && %{CONTENT_TYPE} !~ m#text/html#i"

This would match e.g. /app.css?v=1.0.0.

To meet your want/requirement of having file-name based matching, can we then just apply some regex for %{REQUEST_FILENAME} instead of %{QUERY_STRING} to the example above?

Malvoz commented 5 years ago

This advice [cache-busting with hash in filenames] is quite outdated

This is a different issue, but you are right. That said it can be hard to have a strong configuration on proxies or CDN when using query string.

I've yet to find any up-to-date sources confirming that proxies/CDNs have issues with query strings on the modern web (again, Squid started caching URLs with query strings by default around 2008). But perhaps I haven't searched hard enough. ^^

LeoColomb commented 5 years ago

In the following example

Let's start with MIME-type only first. We'll see cache busting later.

And I think we should prefer merging over setting Cache-Control header to add the immutable attribute.
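To make the difference concrete, here is a rough sketch under the assumption that mod_expires has already put max-age=31536000 into Cache-Control for CSS. The media type is only an example.

```apache
<IfModule mod_headers.c>
    # merge: keeps the existing value and appends the token if it is missing,
    # so the result would be "max-age=31536000, immutable".
    Header merge Cache-Control "immutable" "expr=%{CONTENT_TYPE} =~ m#text/css#i"

    # set (shown commented out for comparison): replaces the whole header,
    # so every directive that should survive has to be repeated.
    # Header set Cache-Control "max-age=31536000, immutable" "expr=%{CONTENT_TYPE} =~ m#text/css#i"
</IfModule>
```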


But perhaps I haven't searched hard enough.

Lack of feature or correctness is never documented. 😆

Malvoz commented 5 years ago

I think we should prefer merging over setting Cache-Control header to add the immutable attribute.

I overlooked that in the example. However, I don't think merge is good enough either; see section 2.1 of RFC 8246:

[...] proxies SHOULD skip conditionally revalidating fresh responses containing the immutable extension unless there is a signal from the client that a validation is necessary (e.g., a no-cache Cache-Control request directive defined in Section 5.2.1.4 of [RFC7234]).

I don't know why a developer would, but if a developer uses no-cache or perhaps no-store with versioned files, then immutable (and max-age) would be ignored.

Malvoz commented 4 years ago

Revisiting this; reusing the same MIME-types as used in filename-based_cache_busting.conf (except for .webmanifest, since it shouldn't be versioned) to match the same cache-busting pattern:

<IfModule mod_headers.c>
  Header set Cache-Control "max-age=31536000, immutable" "expr=%{REQUEST_URI} =~ m#^(.+)\.(\w+)\.(bmp|css|cur|gif|ico|jpe?g|m?js|a?png|svgz?|webp)$#i"
</IfModule>

/cc @LeoColomb

Malvoz commented 4 years ago

A self-reminder to look into this more. The example above makes sure that other directives (such as no-cache and no-store) are overridden for versioned files matching the regex, which is necessary to preserve the behavior of long max-age and immutable (as described in https://github.com/h5bp/server-configs-apache/issues/148#issuecomment-519946513). However, it would also override no-transform, which it shouldn't...

Q: do transcoding intermediaries (proxies and others) only require Cache-Control to be sent for the document (text/html)? If so then this is not an issue, as immutable shouldn't be specified for HTML resources (and the proposed regex doesn't look for HTML).

Not sure if answer lies somewhere in
https://www.w3.org/TR/ct-landscape/
https://www.w3.org/TR/ct-guidelines/

https://support.google.com/webmasters/answer/6211428?hl=en says (emphasis mine):

Opting out of Web Light

If you do not want your pages to be transcoded, set the HTTP header "Cache-Control: no-transform" in your page response. If Googlebot sees this header, your page will not be transcoded.


Edit: I guess this could be solved by proper ordering in .htaccess, setting the Header merge of Cache-Control: no-transform after immutable... @LeoColomb is ordering of config snippets bad to rely on? Does H5BP do that already?
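A minimal sketch of that ordering idea, assuming both directives live in the same configuration section (where mod_headers applies them in the order written). It reuses the versioned-filename expression from the earlier comment as-is, with a shortened extension list for illustration only.

```apache
<IfModule mod_headers.c>
    # 1) Versioned assets: replace whatever Cache-Control was set before.
    Header set Cache-Control "max-age=31536000, immutable" "expr=%{REQUEST_URI} =~ m#^(.+)\.(\w+)\.(css|m?js|png|svgz?|webp)$#i"

    # 2) Afterwards, append no-transform so it survives the override above.
    Header merge Cache-Control "no-transform"
</IfModule>
```

If the two directives end up in different sections (server config, virtual host, .htaccess), the effective order depends on how Apache merges those sections, so the result would need to be checked against the actual response headers.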

LeoColomb commented 4 years ago

Does H5BP do that already?

To the extent needed to get things working, yes, but a perfect order is mostly impossible. Anyway, we can review the order if it helps.