mitodl / ocw-hugo-themes

A Hugo theme for building OCW websites
BSD 3-Clause "New" or "Revised" License
5 stars 4 forks source link

Set Cache-Control headers #1212

Closed ChristopherChudzicki closed 12 months ago

ChristopherChudzicki commented 1 year ago

Description/Context

Many of our static assets are currently missing a Cache-control header, so requests revalidate with the server and wait for a 304 Not Modified response before using cached resources. Usually the cached assets have not changed and could be used without revalidation. This includes course_v2.<hash>.css, which is hashed and is the only non-cached, render-blocking resource for course sites.

Screenshot 2023-07-20 at 11 02 41 AM Screenshot 2023-07-20 at 11 08 18 AM

Plan

We should set relevant cache-control headers. It seems to me that there are two cases:

  1. static resources with hashed filenames: These could use Cache-Control: max-age <very large>,
  2. assets that don't have a hashed filename

For assets that don't have a hashed filename (e.g., ocw_logo_white_v2.svg) we could either set a reasonable max-age like 1 week, or we could start hashing them (either with Webpack, or possibly with Hugo's fingerprint. Or something else.

pdpinch commented 1 year ago

Presumably, adding a cache-control header would have to be done in Fastly. Via the VCL?

ChristopherChudzicki commented 1 year ago

Presumably, adding a cache-control header would have to be done in Fastly. Via the VCL?

Based on the Fastly docs, I believe that's correct.

ibrahimjaved12 commented 1 year ago

Setting cache-control headers for static hashed resources is simple enough; in src/ol_infrastructure/applications/ocw_site/snippets/ttl_setup.vcl we'd just need to add the header with a pattern matching the resource names.

Possible solutions for non-hashed static assets:

Let's first look at how we are using these assets, eg. in a partial, <img src="/static_shared/images/mit-ol.png" /> And we're using them like this say around 35 times.

Using Webpack

We would have something like [contenthash] or [fullhash] here in webpack.common.ts: new CopyWebpackPlugin({ patterns: [ { from: "./base-theme/assets/static_shared/images/", to: "images/[name][contenthash][ext]" } ] }),

The problem we would face is, we are directly calling our assets with their names in the partials. If we do this, we would not know the name of our transpiled asset anymore. We'd have to loop through the assets, or search over them, maybe create a mapping of sorts. Would need to look more over possibilities of this.

Using Hugo's FingerPrint

This works directly, like this: <img src="{{ (resources.Getstatic_shared/images/LinkedIn.png| resources.Fingerprintmd5).Permalink }}" /> the resources.Get function will only look under directory assets/something and so we'd have a problem if we use: "assets/images" because then in server we'd have the hashed URL: localhost:3000/images/logo_img[hash][ext] and we don't want that. We want it for localhost:3000/static_shared/images/logo_img[hash][ext]

A solution to this is, moving our assets directory, say images, from/assets/images/to/assets/static_shared/images/ and making the appropriate change inside webpack.common.ts. Then our web server will have the correct URL for hashed assets.

Using either of the above approaches allow us to use a large max-age (eg. a month) for cache-control headers.

Not hashing assets

If we do not hash those assets, then we should use a small max-age (eg. a week) as mentioned in the ticket. This is to ensure that if we have stale content, we only have it for maximum of a week.

Implementing Approach

I'm currently implementing Hugo's Fingerprint solution. Making changes in all the files where we're using those assets to include a Fingerprint.

ibrahimjaved12 commented 1 year ago

@ChristopherChudzicki I wanted to confirm. Using the Hugo's Fingerprint makes the ticket wide-scoped. As mentioned above, this would require to modify every line of code where /static_shared/images/ or /static_shared/fonts/ is used in order to include the hash. The change itself is simple -- but it's in over 35 places. It's the testing that would require more time -- to ensure every change is working as expected.

However, I do also believe the same we would have to do in case if we go with webpack changes. We'd have to change every line of code where they are used. And the testing would be same.

@pdpinch if we go with this approach, implementing would take only few hours, but testing could take around 2 days (more or less depending on the nature of changes involved).

So I want to confirm should we go with this or just rely alone on setting max-age to 1 week (to unhashed assets).

ChristopherChudzicki commented 1 year ago

@ibrahimjaved12 In my opinion, if you change how we load static assets to something like

////////// Original
<image src="/static_shared/path/to/file.png" >

////////// Changed to
<image src="{{ partial "hashed_file.html" "/static_shared/path/to/file.png"  >

//////////  with partial hashed_file.html:
{{ $file := resources.Get "/static_shared/path/to/file.png" }}
{{ $hashed := $file | resources.Fingerprint "sha512" }}
{{ return $hashed }}

Notes:

Because the build will fail if a file is missing, you should, IMO, be very confident in the change after some initial smoke testing. I'm not sure it's worth checking each and every asset individually (though that might not be so hard.).

PS: Hugo's error messages suck. If /path/to/file doesn't exist, the error is something like Cannot call fingerprint on resource nil. It might be worth adding an explicit if resource is nil, panic("useful error message"). 🤷

ibrahimjaved12 commented 1 year ago

@ibrahimjaved12 In my opinion, if you change how we load static assets to something like

////////// Original
<image src="/static_shared/path/to/file.png" >

////////// Changed to
<image src="{{ partial "hashed_file.html" "/static_shared/path/to/file.png"  >

//////////  with partial hashed_file.html:
{{ $file := resources.Get "/static_shared/path/to/file.png" }}
{{ $hashed := $file | resources.Fingerprint "sha512" }}
{{ return $hashed }}

Notes:

  • Original is error prone: In the original case, if /path/to/file.png was missing or had a typo, etc, we would only catch the error by visual inspection. 😦😦
  • New way is better: In the new approach... We get the hashes we need, good. But we also get free error checking. The build will fail if the file is missing. Loud error. Yay. 😁

Because the build will fail if a file is missing, you should, IMO, be very confident in the change after some initial smoke testing. I'm not sure it's worth checking each and every asset individually (though that might not be so hard.).

PS: Hugo's error messages suck. If /path/to/file doesn't exist, the error is something like Cannot call fingerprint on resource nil. It might be worth adding an explicit if resource is nil, panic("useful error message"). 🤷

That's actually a clever way to check for errors. So now we would be relying on Hugo to report error -- otherwise it would be up to img tag and we'd just have a broken image visually. (or broken font)

pdpinch commented 1 year ago

Is the plan to turn every static file reference into a template? How many are there? Does it make sense to focus on large files, or is it all or nothing?

ChristopherChudzicki commented 1 year ago

Is the plan to turn every static file reference into a template?

No. My suggestion was for a reusable "hashed_file" partial:

<image src="{{ partial "hashed_file.html" "/static_shared/first_logo.png"  >
<!-- rendered as:
<image src="/static_shared/first_logo.png?HASH_1"  >
 -->

<image src="{{ partial "hashed_file.html" "/static_shared/another_file.png"  >
<!-- rendered as:
<image src="/static_shared/another_file.png?HASH_2"  >
 -->

<image src="{{ partial "hashed_file.html" "/static_shared/one_more.png"  >
<!-- rendered as:
<image src="/static_shared/one_more.png?HASH_3"  >
 -->

How many are there?

I'm not sure how many static file references we have— @ibrahimjaved12 said more than 35.

Does it make sense to focus on large files, or is it all or nothing?

I don't think filesize matters. It's all or nothing in the sense that anything that doesn't get hashed won't update until max-age expires.

ibrahimjaved12 commented 12 months ago

This ticket has been completed with the PRs: https://github.com/mitodl/ol-infrastructure/pull/1857 https://github.com/mitodl/ocw-hugo-themes/pull/1272 https://github.com/mitodl/ocw-studio/pull/2017