michaelficarra / commonjs-everywhere

:rainbow: minimal CommonJS browser bundler with aliasing, extensibility, and source maps
BSD 3-Clause "New" or "Revised" License
158 stars 21 forks source link

Improve build cache #83

Closed tarruda closed 10 years ago

tarruda commented 10 years ago

Hi, you've done some great work in this project, if I knew about it before I wouldn't have written grunt-coffee-build

I've taken a quick look at the code and would like to add a few improvements if its ok with you:

What do you think? I've used these techniques in the grunt-coffee-build task and it provides near-instant builds even for big projects. Another advantage is that the cli can be reused in grunt-driven workflows(eg: grunt-contrib-watch)

michaelficarra commented 10 years ago

you've done some great work in this project

Thanks! I'm very happy with it.

Use the modified timestamp of the file instead of the md5 of its contents

An md5 determines if the file has changed, a timestamp only tells us it has been touched.

Cache the processed file contents in options.cache (eg: compile-to-js intermediate results)

Isn't that what we do already?

Serialize 'options.cache' to disk so the build cache will be reused even across cli restarts(use process exit events to flush the cache back to disk)

I think that's a great idea. I believe it's been requested before, too. The only problem would be serialising/deserialising regexps if you want to represent the AST as JSON.

near-instant builds

That's surprising, since escodegen is such a large portion of the build time. See #63 for more discussion about that.

tarruda commented 10 years ago

An md5 determines if the file has changed, a timestamp only tells us it has been touched.

You're right, but its much more expensive than simply checking file metadata(not to count the small extra cost taken to copy the data from the filesystem cache to the v8 process) . Cryptographic hashes like md5 are only used when security is necessary, but on the filesystem of a dev station if a file has been touched you can be 99% sure that it was modified. Timestamps is how unix tools like 'make' or 'rsync' check if a file was modified.

Isn't that what we do already?

Yes but it seems traverse-dependencies function will only detect modified files if the entire dependency graph leading up to that file is also modified. So if entry point is 'a' and it requires 'b' and only 'b' was modified it wont be processed because 'worklist' only starts with 'a' and since it wasnt modified 'b' will never be added to it(unless I misread the code). To fix that it would be necessary to cache a 'deps' array(done in my PR) containing every dependency of a file, but that would require every file to have its md5 recalculated in the current code.

I think that's a great idea. I believe it's been requested before, too. The only problem would be serialising/deserialising regexps if you want to represent the AST as JSON.

I also got stuck into that but found a simple hack to work around it, check my PR for details. I say 'hack' because it uses details of escodegen implementation to work and may break for code that needs to do regexp processing in the AST. I dont believe minification tools do that so it should be file 99% of the times. A more correct approach would be to use JSON stringify/parse reviver/replacer arguments to serialize regexps, but that greatly increases parsing/stringifying the cache at the beginning/end of the process(It wont affect much if using the 'watch' option)

That's surprising, since escodegen is such a large portion of the build time. See #63 for more discussion about that.

Its true but my tool didnt use escodegen(I didnt knew about it or mozilla AST at the time). I guess that problem can also be fixed by using escodegen on the individual files and then combining the source map like I did in grunt-coffee-build. If you accept my PR I intend to send more improvements, including this one. Even with escodegen taking most of the time, my PR still improves the build time(you can benchmark with the 'time' unix command)