Currently our apps send aggressive "don't cache anything ever" headers:
```
$ curl -v 'https://wellcomecollection.org' 2>&1 | grep cache
< cache-control: private, no-cache, no-store, max-age=0, must-revalidate
< x-cache: Miss from cloudfront
```
This header means that every request is forwarded to our apps, even when an identical request has been answered recently. This has several drawbacks:

- It's slower for users (because every request goes back to our app rather than being served from an edge node that's closer to them)
- It's more expensive for us (because the apps have to process more requests)
- It makes the site more fragile (because a spike in requests can knock the site over, even if the requests are all for the same pages)
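To illustrate the difference, here's a minimal sketch (not CloudFront's actual logic, and the header values are hypothetical) of how a shared cache decides whether it can reuse a stored response instead of contacting the origin. Directive parsing is simplified; real caches follow RFC 9111.

```python
def parse_cache_control(header: str) -> dict:
    """Parse a Cache-Control header into a directive -> value dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Bare directives like "no-store" are stored with value True
        directives[name.lower()] = value or True
    return directives

def can_serve_from_cache(header: str, age_seconds: float) -> bool:
    """Return True if a shared cache may reuse a stored response."""
    d = parse_cache_control(header)
    # Any of these directives forbids a shared cache from reusing the response
    if "no-store" in d or "no-cache" in d or "private" in d:
        return False
    # Otherwise the response is fresh until it's older than max-age
    if "max-age" in d:
        return age_seconds < int(d["max-age"])
    return False

# Our current header: every request goes back to the origin.
current = "private, no-cache, no-store, max-age=0, must-revalidate"
print(can_serve_from_cache(current, age_seconds=5))   # False

# A hypothetical short caching period: a 5-second-old copy can be reused.
proposed = "public, max-age=60"
print(can_serve_from_cache(proposed, age_seconds=5))  # True
```

Even a short `max-age` like this would have let the edge absorb a burst of requests for the same handful of URLs.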
We saw this today when the Wellcome Collection "What's On" newsletter went out. It landed in everyone's inboxes, and something like 500 email servers proceeded to scan every link in the newsletter, which knocked the site over. Because these requests were distributed across many different IP addresses, our WAF rules didn't protect us. But they scanned a relatively small number of distinct URLs – a caching period would probably have protected us from this spike.
This is a topic that's been discussed on multiple occasions previously: