sproutcore / build-tools

SproutCore Build Tools

Framework hash in the URL to enable aggressive caching #2

Closed dcporter closed 10 years ago

dcporter commented 11 years ago

Abbot currently includes an MD5 (I assume) hash of a framework's contents in the framework's directory structure, in order to support aggressive caching.

If a built file changes, then its hash will change, meaning that its URI will change, causing previously-cached versions to be dropped from use automatically. Since we're confident that new versions are at new URIs, we can reliably cache files "forever", improving application load performance substantially.
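As a rough illustration of the mechanism described above, here is a minimal Python sketch (not the build tools' own language). The names `content_hash` and `hashed_url`, the `/static/<framework>/<hash>/<file>` layout, and the use of SHA-1 are all assumptions for the example, not Abbot's actual scheme.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Return a hex digest that changes whenever the content changes."""
    return hashlib.sha1(content).hexdigest()


def hashed_url(framework: str, filename: str, content: bytes) -> str:
    # Hypothetical URL layout: the hash is a directory segment, so changed
    # content is always served from a new URI.
    return f"/static/{framework}/{content_hash(content)}/{filename}"


old = hashed_url("my_app", "javascript.js", b"var a = 1;")
new = hashed_url("my_app", "javascript.js", b"var a = 2;")
unchanged = hashed_url("my_app", "javascript.js", b"var a = 1;")
```

Here `old` and `new` differ, so a cached copy of the old file is simply never requested again, while `unchanged` equals `old`, so an identical rebuild keeps hitting the cache.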

It's very slick, and losing this feature would be a major regression for production performance, and probably open up a ton of caching headaches that we just don't have to worry about in abbot. This is a must-have feature.

I would think that all text files in the built folder would be hashed, so that the hash would change if any JS or CSS changed. I'm less worried about images, though that may be naive.

geoffreyd commented 11 years ago

I'm not sure whether this is MD5 or SHA1. But even though we don't need any security from this, SHA is the more accepted hashing algorithm these days. So since it's just as easy to use SHA, I think that would be a good way to go for this feature.

mauritslamers commented 10 years ago

A first implementation of SHA1 content hashing is now in, as a computed property on a file and on a combined file. This might not be exactly what is wanted, however, and it raises questions about its proper place in the architecture, so I will leave the issue open for now.

dcporter commented 10 years ago

Nice! I'm not sure exactly at what point the hash is made, but I think if you change something as small as an image filename it will update the hash. Worth testing.

mauritslamers commented 10 years ago

The hash is recomputed as soon as the computed content of a file or combined file changes. However, having the hash in the directory name would require it to also be a computed property on the framework. That raises the question of whether the hashing should be a mixin, supporting library functionality, or just the (almost) identical computed property in three or more spots. Linked to this is an architectural choice: to what extent should the typical build-to-disk functionality be separated from the dev server, and how can multiple strategies be included? (My idea would be to use mixins.)

dcporter commented 10 years ago

Seems like a great place for a mixin. Or a well-structured "ContentProcessor" superclass?
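To make the mixin option concrete, here is a small Python sketch (the build tools themselves are not written in Python). The class names `ContentHashMixin`, `File`, and `CombinedFile` are hypothetical stand-ins for the objects discussed above; a framework class could pick up the same property in the same way.

```python
import hashlib
from typing import List


class ContentHashMixin:
    """Hypothetical mixin: any class exposing `content` gains `content_hash`."""

    @property
    def content_hash(self) -> str:
        # Recomputed on access, so it always reflects the current content.
        return hashlib.sha1(self.content.encode("utf-8")).hexdigest()


class File(ContentHashMixin):
    def __init__(self, path: str, content: str) -> None:
        self.path = path
        self.content = content


class CombinedFile(ContentHashMixin):
    def __init__(self, files: List[File]) -> None:
        self.files = files

    @property
    def content(self) -> str:
        # Join members in a stable (path-sorted) order so the hash does not
        # depend on the order in which files were registered.
        ordered = sorted(self.files, key=lambda f: f.path)
        return "\n".join(f.content for f in ordered)


a = File("a.js", "var a = 1;")
b = File("b.js", "var b = 2;")
combined = CombinedFile([a, b])
```

The hashing logic lives in one place, and each class only has to say what its `content` is.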

mauritslamers commented 10 years ago

After implementing the SHA1 framework hash and trying to use it, a different kind of problem has arisen: the combination of hashing with sc_static / static_url. If the framework hash is calculated from the actual parsed content of the framework, it will include the URLs generated by sc_static, which in turn needs the framework hash in order to provide a proper URL to dependent resources inside the framework. This causes an endless loop: inserting the proper URL changes the content, which changes the hash, which updates the URL, which changes the content, which changes the hash, and so on.

One solution I see is that instead of the parsed content, the hash is based on the raw content, the source. This prevents any endless loops by default, but the hash cannot be used for integrity checking. The other solution would be to base the hash on something completely different, say the number of files in the framework plus a time string. Thoughts?
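The raw-content option can be sketched as follows, in Python for illustration only. The `sc_static('...')` pattern, the `/static/<hash>/...` URL layout, and the choice to include file paths in the hash are assumptions made for the example; the real parser and URL scheme are not shown in this thread.

```python
import hashlib
import re
from typing import Dict, Tuple

# Simplified, hypothetical pattern for sc_static('resource') calls.
SC_STATIC = re.compile(r"sc_static\('([^']+)'\)")


def framework_hash(raw_sources: Dict[str, str]) -> str:
    """Hash the unprocessed sources, so the hash never depends on itself."""
    digest = hashlib.sha1()
    for path in sorted(raw_sources):  # sorted for a stable result
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(raw_sources[path].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


def build(raw_sources: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    # Step 1: the hash comes from the raw input only.
    fw_hash = framework_hash(raw_sources)

    # Step 2: URLs are resolved with that hash. The parsed output changes,
    # but it is never fed back into the hash, so there is no loop.
    def resolve(match: "re.Match[str]") -> str:
        return f"url('/static/{fw_hash}/{match.group(1)}')"

    parsed = {path: SC_STATIC.sub(resolve, src) for path, src in raw_sources.items()}
    return fw_hash, parsed


sources = {"style.css": "a { background: sc_static('logo.png'); }"}
fw_hash, parsed = build(sources)
```

Because the hash is a function of the source alone, it identifies the input rather than the output, which is why it cannot double as an integrity check on the built files.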

topherfangio commented 10 years ago

If the approach with the number of files and a time string will work, then the time string alone will work, because whatever we use has to account for changes to the files themselves, not just the addition of new files.

That said, if it's just based on time, then you run into the issue that rebuilding it will always produce a different hash (because it's a different time), meaning that it would be updated even if nothing actually changed.
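A small Python sketch of the trade-off, with hypothetical function names and made-up timestamps: a token built from the file count and a time string changes on every rebuild, whereas a token built from the source changes only when the source does.

```python
import hashlib
from typing import Dict


def time_token(file_count: int, build_time: str) -> str:
    # Hypothetical "number of files plus time string" scheme.
    return hashlib.sha1(f"{file_count}:{build_time}".encode("utf-8")).hexdigest()


def source_token(sources: Dict[str, str]) -> str:
    # Hypothetical source-based scheme: hash paths and contents in sorted order.
    digest = hashlib.sha1()
    for path in sorted(sources):
        digest.update(f"{path}\0{sources[path]}\0".encode("utf-8"))
    return digest.hexdigest()


sources = {"core.js": "var a = 1;", "main.js": "var b = 2;"}
edited = {"core.js": "var a = 1;", "main.js": "var b = 3;"}

# Same source, rebuilt five minutes later: the time-based token changes anyway.
rebuild_changes_time_token = (
    time_token(2, "2014-01-01T10:00") != time_token(2, "2014-01-01T10:05")
)
# Same source, rebuilt: the source-based token is stable.
rebuild_keeps_source_token = source_token(sources) == source_token(dict(sources))
# Edited file, same file count: the source-based token picks up the change.
edit_changes_source_token = source_token(sources) != source_token(edited)
```

All three flags evaluate to `True`. Note also that the file count contributes nothing the time string does not already provide.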

Is there any drawback to doing a hash of the source tree? I mean, in theory, building identical source should always produce the same output, unless we update the meaning of sc_static or static_url to do something different. Then I guess the user's source wouldn't change, but the output would...

Sorry I didn't give you a really clear answer there, but I hope the thoughts/questions are useful :-)

dcporter commented 10 years ago

Negative re: time string. The same source should result in the same hash.

I don't think we care about integrity checking, or at least not with this hash. (We can't, since as you point out this hash is used in the output code whose integrity we would be checking.) If integrity hashes want to pop up later on then wonderful, but it's a separate consideration.

mauritslamers commented 10 years ago

The framework hash is currently calculated from the raw content of the framework. If this is insufficient, please reopen.