magnars / optimus

A Ring middleware for frontend performance optimization.
364 stars 23 forks source link

Cache-busting url replacements ignore :base-url #26

Open favila opened 10 years ago

favila commented 10 years ago

My ultimate goal is to get the cache-busting part of the path to not be the root, because that arrangement makes it difficult to do path filtering on web servers and such. (E.g., you can't rely on all static assets being in a common directory, e.g., /static, after cache-busting.)

I was trying to work around this issue using :base-url, but add-cache-busted-expires-header doesn't work like link/full-path and ignores the :base-url.

This issue is closely connected with #9, but more specific because it only concerns assets rewritten after cache-busting optimizations.

I'm thinking any of these solutions might be better in add-cache-busted-expires-header:

  1. Include :base-url as in link/full-path, i.e. "<base-url>/<cache-buster><original-path>"
  2. Allow injection of a function to determine the shape of the cache-busted url. Inputs to the function are asset and sha1-hash, output is new path.
magnars commented 10 years ago

Thanks for bringing this issue to my attention. I think the second option gives us the most flexibility. Just for my own understanding; what exactly do you mean by path filtering? Would you be generating entirely different paths based on eg. the type of asset? I take it this means relative paths would not solve the issue?

favila commented 10 years ago

By path filtering, I mean it's common to have a setup like this:

  1. If path starts with /static (or host is static.example.org, etc), have the webserver attempt to serve a static file directly from some tree of assets.
  2. Otherwise have some dynamic service handle the request (e.g. a servlet).

I can't think of a scenario offhand where relative paths would not solve this issue, but they're harder to reason about and an absolute-to-relative-path rewriter would be trickier to implement.

The general problem is that the cache-busting urls produce unknowable paths on the root making it difficult to isolate concerns by path segment. After cache busting I need to be aware of every possible root path segment that may be served by anything static or dynamic to ensure that there are no conflicts, vs just making a policy for dynamic handlers "only the /static/* path is reserved."

There are other approximations of this behavior but they are not as bulletproof. Workarounds:

  1. Attempt to serve all paths statically, then fall back to dynamic service on 404.
  2. If webserver can do regex matches (not google app engine), then serve files for paths like ^/[0-9a-f]{12}/static/, and route to dynamic webapp otherwise. (Although the webapp could potentially want to handle paths with the same pattern, so this is a bad solution.)
magnars commented 10 years ago

How about rewriting the cache busters so they don't mess with the path, adding instead a cache buster just prior to the file name? That wouldn't be a breaking change, and sounds to me like it would fix your issue?

favila commented 10 years ago

Yes that would work perfectly for me and I think it is a more sensible default anyway. (It's what I would have done if I could have injected a function to rewrite the path.)

However even if only the filename is changed you still need to account for :base-url in the paths written into the css, as people could be using it for other reasons. (e.g., :base-url "http://my.cdn/v3").

It might still be a breaking change to any deployment configurations or scripts which assume the current path structure. (I have a hacky regex which would break, but I it only exists because of this issue--I would be able to get rid of it.)

magnars commented 10 years ago

The pieces of the puzzle that I would like to have implemented now are:

Another transformation (also optional) would be rewriting absolute asset links to complete URLs (including base-url). Since neither of us have any need for that yet, let's not spend any time on it.

Do you want to give any of this a shot?

favila commented 10 years ago

Sure, I think I understand enough to do path-preserving cache busting very easily. Expect a pull-request in a day or two. This patch will be extremely modest: it will just outright change where the hash is spliced into the full path, and nothing else. (No configurable callback, relative paths, etc.)

The absolute-to-relative transformation I'm not sure about. I suppose the best approach is, given an asset, create a relative path for each :references asset. Relative path is from asset :original-path to reference-asset :original-path. This may be better as a separate optimization. Thoughts? I can work on this, too, but I am a bit unsure of the correct design/behavior.

magnars commented 10 years ago

I'll give the relative-path transformation some thought. Your plan for a modest path-preserving cache buster sounds excellent. :-)