iftechfoundation / ifarchive-unbox

IF Archive Unboxing service
https://unbox.ifarchive.org
MIT License

Unboxer has an unbustable cache, which can take weeks to clear out #64

Open dfabulich opened 1 month ago

dfabulich commented 1 month ago

Unboxer has multiple layers of caching.

When the IF Archive team updates a zip, they update it in place. In the worst case, this means:

This would be a very unlucky outcome, but it's still pretty much guaranteed that any update to a zip file won't be reflected on the unboxer for seven days, if only because of nginx caching.

EDIT: Updated because the nginx cache is only used for subdomains, so a request goes through either the nginx cache or the Cloudflare cache, but not both.

curiousdannii commented 1 month ago

The nginx cache is only used for subdomains, so a request goes through either the nginx cache or the Cloudflare cache, but not both.

dfabulich commented 1 month ago

Thanks; I updated my description with the full story.

Here are some options for what to do.

  1. We could serve HTML with Cache-Control: max-age=0 (or at least a very small max-age), so that the user's browser always contacts us for the latest HTML. That's not as bad as it seems, because the browser will make a conditional GET request, so nginx will likely just return a 304 Not Modified response with no body.
  2. Alternatively, it is possible to purge nginx's cache on a URL-by-URL basis.

    The way to do that is to make proxy_cache_bypass trigger on a secret request header: https://serverfault.com/questions/493411/how-to-delete-single-nginx-cache-file

    proxy_cache_bypass $http_x_b2ca678b4c936f905fb82f2733f5297f;
    curl -s -o /dev/null -H "X-b2ca678b4c936f905fb82f2733f5297f: 1" "https://23nwbwjk2e.unbox.ifarchive.org/23nwbwjk2e/www/index.html"

    Since the nginx configuration is currently generated by nginx.sh, we can keep the secret header name in the production options.json.

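    When it's time to purge nginx's cache, we can re-fetch each affected URL with the bypass header; nginx then skips its cached copy, fetches a fresh response from upstream, and stores that in the cache instead.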

    But this is much more complicated than just setting max-age=0, and if we keep sending max-age=604800 (7 days), the user's browser will still cache files for up to 7 days.

  3. As for Cloudflare, since the subresources are the target of a 301 redirect, we could add a cache-busting key as a URL parameter.

    For example, currently we redirect from https://23nwbwjk2e.unbox.ifarchive.org/23nwbwjk2e/www/audio/bgm/Theme1.ogg to https://unbox.ifarchive.org/23nwbwjk2e/www/audio/bgm/Theme1.ogg, with no Cache-Control header on the redirect.

    We could instead send a 302 redirect with Cache-Control: max-age=0 whose destination is https://unbox.ifarchive.org/23nwbwjk2e/www/audio/bgm/Theme1.ogg?hash=98487658efd7b63f3e3cf237522bcae7, and serve that destination with Cache-Control: max-age=31536000 (1 year). If Theme1.ogg ever changes, its hash changes, so we'll redirect to a different URL and the user is guaranteed fresh content.

  4. Alternatively, Cloudflare has an API to purge files from its cache: https://developers.cloudflare.com/api/operations/zone-purge

    It requires a secret key, and I think it might be complicated to compute exactly which URLs need to be purged, so I like this idea slightly less.
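Option 1 relies on HTTP revalidation. The sketch below shows the server-side logic of a conditional GET: the HTML goes out with Cache-Control: max-age=0 plus an ETag, and a later request whose If-None-Match matches the current ETag gets an empty 304. It is framework-agnostic and not the unboxer's actual code; the function names and the SHA-256-based ETag are assumptions for illustration.

```python
import hashlib


# Hypothetical helper: derive a strong ETag from the response body.
def make_etag(body: bytes) -> str:
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so ignore any W/ prefix.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond_html(body: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, body) for an HTML page that must always be revalidated."""
    etag = make_etag(body)
    headers = {"Cache-Control": "max-age=0", "ETag": etag}
    # Assumes this exact header casing; real servers match names case-insensitively.
    raw = request_headers.get("If-None-Match", "")
    candidates = [_opaque(t) for t in raw.split(",") if t.strip()]
    if "*" in candidates or _opaque(etag) in candidates:
        # The browser's cached copy is still current, so no body is needed.
        return 304, headers, b""
    headers["Content-Type"] = "text/html; charset=utf-8"
    return 200, headers, body
```

A 304 carries only headers, so revalidating on every page view costs one small round trip rather than a full download.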
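For option 2, here is a sketch of the two pieces: generating the proxy_cache_bypass line from a secret header name, and re-fetching affected URLs with that header, mirroring the curl command in option 2. nginx exposes a request header as $http_ followed by the header name lowercased with dashes turned into underscores, and it bypasses the cache when that value is non-empty and not "0". The function names and the use of urllib are assumptions; keeping the secret in options.json comes from the discussion.

```python
import secrets
import urllib.request


def make_bypass_header_name() -> str:
    # Unguessable header name like "X-b2ca678b...", to keep in the production options.json.
    return "X-" + secrets.token_hex(16)


def nginx_variable_for(header_name: str) -> str:
    # nginx maps request header "X-Foo-Bar" to the variable $http_x_foo_bar.
    return "$http_" + header_name.lower().replace("-", "_")


def bypass_directive(header_name: str) -> str:
    # Line that a generator such as nginx.sh could emit into the proxied location block.
    return f"proxy_cache_bypass {nginx_variable_for(header_name)};"


def build_refresh_requests(urls, header_name):
    # Any non-empty value other than "0" triggers the bypass; "1" matches the curl example.
    return [urllib.request.Request(u, headers={header_name: "1"}, method="GET") for u in urls]


def refresh(urls, header_name, timeout=30):
    # Network call: re-fetch each URL so nginx replaces its cached copy.
    for req in build_refresh_requests(urls, header_name):
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()  # discard the body, like curl -o /dev/null
```

The hard part remains choosing which URLs to pass to refresh; the header mechanics themselves are small.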
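Option 3 can be sketched like this: the subdomain answers with an uncached 302 whose Location carries a content fingerprint, and the fingerprinted destination can then be cached for a year. The discussion doesn't say how the hash would be computed; a truncated SHA-256 of the file bytes is an assumption here (it gives 32 hex characters, the same length as the example hash), as are the function names. Any existing query string on the subresource is ignored for simplicity.

```python
import hashlib
from urllib.parse import urlencode, urlsplit, urlunsplit

ONE_YEAR = 365 * 24 * 60 * 60  # 31536000 seconds


def content_hash(data: bytes) -> str:
    # Assumption: a truncated SHA-256 of the file's bytes serves as the fingerprint.
    return hashlib.sha256(data).hexdigest()[:32]


def redirect_for(subdomain_url: str, main_host: str, data: bytes):
    """Build the 302 from the subdomain URL to a fingerprinted URL on the main host."""
    parts = urlsplit(subdomain_url)
    query = urlencode({"hash": content_hash(data)})
    location = urlunsplit((parts.scheme, main_host, parts.path, query, ""))
    # The redirect itself must not be cached, or a stale fingerprint would stick around.
    return 302, {"Location": location, "Cache-Control": "max-age=0"}


def headers_for_fingerprinted_asset() -> dict:
    # Safe to cache for a long time: new content means a new URL.
    return {"Cache-Control": f"max-age={ONE_YEAR}"}
```

If Theme1.ogg changes, redirect_for yields a different ?hash= value, so (assuming Cloudflare keeps the query string in its cache key, which is its default) both Cloudflare and the browser treat it as a new URL.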
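On the concern in option 4 about working out exactly which URLs to purge: one possible approach is to diff the old and new zip's central directories by CRC-32, which flags added, changed, and removed entries without extracting anything. Each path can then be mapped to its main-host URL and sent to Cloudflare's purge-by-URL endpoint (a POST to /zones/{zone_id}/purge_cache with a JSON "files" list). The URL layout in url_for, the batch size of 30 URLs per call, and the function names are assumptions; check the current Cloudflare API reference for per-plan limits.

```python
import json
import urllib.request
import zipfile
from urllib.parse import quote

PURGE_ENDPOINT = "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache"
BATCH_SIZE = 30  # assumption: check the per-call limit for your plan


def changed_entries(old_zip, new_zip) -> list:
    """Paths whose CRC-32 differs, including entries that were added or removed."""
    with zipfile.ZipFile(old_zip) as old, zipfile.ZipFile(new_zip) as new:
        old_crcs = {i.filename: i.CRC for i in old.infolist() if not i.is_dir()}
        new_crcs = {i.filename: i.CRC for i in new.infolist() if not i.is_dir()}
    names = old_crcs.keys() | new_crcs.keys()
    return sorted(n for n in names if old_crcs.get(n) != new_crcs.get(n))


def url_for(zip_id: str, path: str) -> str:
    # Hypothetical layout, following https://unbox.ifarchive.org/23nwbwjk2e/www/...
    return f"https://unbox.ifarchive.org/{zip_id}/{quote(path)}"


def purge_requests(zone_id: str, api_token: str, urls: list) -> list:
    """Build (but do not send) purge-by-URL requests, BATCH_SIZE URLs at a time."""
    batches = []
    for start in range(0, len(urls), BATCH_SIZE):
        body = json.dumps({"files": urls[start:start + BATCH_SIZE]}).encode("utf-8")
        batches.append(urllib.request.Request(
            PURGE_ENDPOINT.format(zone_id=zone_id),
            data=body,
            method="POST",
            headers={"Authorization": f"Bearer {api_token}",
                     "Content-Type": "application/json"},
        ))
    return batches
```

A matching CRC-32 is not proof that two files are byte-identical, but it is a cheap and usually adequate signal for deciding what to purge.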