simonw / datasette

An open source multi-tool for exploring and publishing data
https://datasette.io
Apache License 2.0
9.39k stars, 668 forks

Sensible `cache-control` headers for static assets, including those served by plugins #1645

Status: open. curiousleo opened this issue 2 years ago

curiousleo commented 2 years ago

What I'm seeing

With default_cache_ttl = 86400, I see the following:

A table view returns Cache-control: max-age=86400:

[Screenshot: table view response headers]

A static asset returns no Cache-control header:

[Screenshot: static asset response headers]

What I expected to see

I expected the static asset to return a Cache-control header indicating that this response can be cached.

Why this matters

I'm productionising a Datasette deployment right now and was looking into putting it behind a Varnish instance. I was surprised to see requests for static assets being served by Datasette rather than Varnish, which is what led me to look more closely at the response headers.

While Datasette serves those static assets pretty quickly, I don't see why Datasette should have to serve them at all. By their nature, static assets like images and JS files are very cacheable, so it should be easy to serve them from a cache like Varnish.

(Note that Varnish can easily be configured to override this behaviour and cache static assets anyway. But it would be better if this override were not necessary.)

Discussion

It seems clear to me that serving static assets without a Cache-control header is not ideal.

I see two options here:

A. Static assets use the same logic as table / SQL views to set the Cache-control header based on default_cache_ttl.
B. An additional setting for static assets is introduced (default_static_cache_ttl, say).
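As a rough illustration, the two options differ only in where the TTL comes from. This is a minimal sketch assuming settings arrive as a plain mapping; `default_static_cache_ttl` is the hypothetical setting proposed in option B, `static_cache_control` is a hypothetical helper (not Datasette's API), and emitting `no-cache` for a zero TTL is an assumption of this sketch.

```python
from typing import Mapping


def static_cache_control(settings: Mapping[str, int]) -> str:
    """Pick a Cache-Control value for a static asset (hypothetical helper)."""
    # Option B: a dedicated (hypothetical) static setting wins if present.
    ttl = settings.get("default_static_cache_ttl")
    if ttl is None:
        # Option A: reuse the same TTL that table and SQL views use.
        ttl = settings.get("default_cache_ttl", 0)
    if ttl <= 0:
        # Assumption: a TTL of zero means "don't cache".
        return "no-cache"
    return f"max-age={ttl}"


# With default_cache_ttl = 86400 and no static-specific setting:
print(static_cache_control({"default_cache_ttl": 86400}))  # → max-age=86400
```

The appeal of option B is that static files can get a much longer TTL than table pages without the two being tied together.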

simonw commented 2 years ago

I agree: this is bad.

Ideally, content served from /static/ would follow best practices for static content serving, which to my mind means the following:

Datasette half-implemented the first of these: if you view source on https://latest.datasette.io/ you'll see it links to /-/static/app.css?cead5a, which in the template looks like this:

https://github.com/simonw/datasette/blob/dd94157f8958bdfe9f45575add934ccf1aba6d63/datasette/templates/base.html#L5

I had forgotten I had implemented this! Here is how it is calculated:

https://github.com/simonw/datasette/blob/458f03ad3a454d271f47a643f4530bd8b60ddb76/datasette/app.py#L510-L516
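The linked code is not reproduced here. As a minimal sketch of the same idea (content-hash cache busting), assuming a truncated SHA-1 hex digest like the six-character `cead5a` suffix above; `content_hash` and `cache_busted_url` are hypothetical helpers, not Datasette's API.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path, length: int = 6) -> str:
    """First `length` hex digits of the SHA-1 of the file's bytes."""
    return hashlib.sha1(path.read_bytes()).hexdigest()[:length]


def cache_busted_url(url: str, path: Path) -> str:
    """Append the content hash as a query string, e.g. /-/static/app.css?cead5a."""
    # Editing the file almost certainly changes the hash, hence the URL,
    # so stale cached copies are simply never requested again.
    return f"{url}?{content_hash(path)}"
```

Truncating to six characters keeps URLs short; a collision is possible but rare enough not to matter for cache busting.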

So app.css right now could be safely served with a far-future cache header... only it isn't:

~ % curl -i 'https://latest.datasette.io/-/static/app.css?cead5a' 
HTTP/2 200 
content-type: text/css
x-databases: _memory, _internal, fixtures, extra_database
x-cloud-trace-context: 9ddc825620eb53d30fc127d1c750f342
date: Sat, 05 Mar 2022 01:01:53 GMT
server: Google Frontend
content-length: 16178
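A far-future header is only safe when the URL changes whenever the content does. Here is a minimal sketch of that decision, assuming the query string carries the current content hash (as in `?cead5a`). `static_cache_headers` is a hypothetical helper, and the one-year `max-age=31536000, immutable` value is a common convention (`immutable` is defined in RFC 8246), not something Datasette is documented to send.

```python
# One year, plus the RFC 8246 "immutable" hint: browsers need not revalidate.
FAR_FUTURE = "max-age=31536000, immutable"


def static_cache_headers(
    query_string: str, current_hash: str, fallback_ttl: int = 3600
) -> dict:
    """Choose Cache-Control for a static file (hypothetical helper)."""
    if query_string == current_hash:
        # The URL is pinned to this exact content, so it can never go stale.
        return {"cache-control": FAR_FUTURE}
    # Unversioned or outdated URL: cache briefly so edits still show up.
    return {"cache-control": f"max-age={fallback_ttl}"}


print(static_cache_headers("cead5a", "cead5a"))
# → {'cache-control': 'max-age=31536000, immutable'}
print(static_cache_headers("", "cead5a"))
# → {'cache-control': 'max-age=3600'}
```

Under either option A or B, `fallback_ttl` would come from the configured TTL rather than the hard-coded default used here.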

The larger question, though, is what to do about other assets. I'm particularly interested in plugin assets, since visualization plugins like datasette-vega and datasette-cluster-map ship with large amounts of JavaScript and I'd really like that to be sensibly cached by default.

simonw commented 2 years ago

The existing app_css_hash already isn't good enough, because I built that before table.js existed, and that file should obviously be smartly cached too.
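One way to go beyond a single `app_css_hash`, sketched here under assumptions, is to compute a hash for any file in any static directory (core or a plugin's) and memoise it per file. `versioned_static_url` and `_hash_for` are hypothetical names; keying the cache on the file's modification time, so an edited file gets a fresh hash without a restart, is a design choice of this sketch rather than anything the thread specifies.

```python
import hashlib
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def _hash_for(path: str, mtime_ns: int) -> str:
    # mtime_ns is part of the cache key: a modified file misses the cache
    # and is re-hashed; an unchanged one is hashed only once.
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()[:6]


def versioned_static_url(url_prefix: str, static_dir: Path, filename: str) -> str:
    """Build e.g. /-/static/table.js?<hash> for any file in static_dir."""
    path = static_dir / filename
    digest = _hash_for(str(path), path.stat().st_mtime_ns)
    return f"{url_prefix}/{filename}?{digest}"
```

A plugin would call the same helper with its own static directory and URL prefix, so the JavaScript shipped by plugins like datasette-vega could get the same far-future treatment as app.css and table.js.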

simonw commented 2 years ago

It sounds like you can work around this with Varnish configuration for the moment, but I'm going to bump this up the list of things to fix. It's particularly relevant now because I'd like to get a solution in place before Datasette 1.0: it's likely to benefit plugins, so it should be part of the stable, documented plugin interface.

simonw commented 2 years ago

Hah, found a TODO about this: https://github.com/simonw/datasette/blob/c5791156d92615f25696ba93dae5bb2dcc192c98/datasette/app.py#L997-L999