tauren / tmpl-precompile

Precompile jade templates into executable functions
MIT License
25 stars 7 forks source link

Create post processor to shrink script output size #2

Closed tauren closed 13 years ago

tauren commented 13 years ago

Post processor would take the compiled javascript function and significantly reduce the size by removing the duplication of functions. Here are some ideas on how to accomplish this.

  1. There is no reason to include the rethrow(), attr() and escape() functions with every template. Pull them out with a post-processor and only include them once per output file. Wrap the entire output file in a closure so these scripts don't pollute the root namespace.
  2. For production, could remove all of the debug code. The post processor could remove all lines that start with __.lineno =.
  3. For production, replace the rethrow function to output a simple error message that includes a link to a help page in the tmpl-precompile github project wiki. This help page would explain that tmpl-precompile has removed redundant and debugging code and to compile in non-production mode if you want regular template errors to show.
  4. For production, there isn't really a need to keep all the information in the __ variable. In fact, the entire __ might be able to be removed. It appears to only be used for debugging purposes.
  5. Whenever unescaped output is used h1!= title in a template, extra code is output to check for null and output an empty string. This could be optimized by pulling the current output of var __val__ = body; buf.push(null == __val__ ? "" : __val__); into another helper function and calling it:

    function print(val) { return null == val ? "" : val; }

    buf.push(print(body));

If all of this was completed, an output script file like https://github.com/tauren/tmpl-precompile/blob/master/examples/output/single-namespace.js could be reduced to something smaller like the following:

function() {

  function rethrow(err){
    err.message = 
      "Jade debug info removed. See http://goo.gl/somewhere\n\n" 
      + err.message;
    throw err;
  }

  function print(val) {
    return null == val ? "" : val;
  }

  function attrs(obj){
    var buf = []
      , terse = obj.terse;
    delete obj.terse;
    var keys = Object.keys(obj)
      , len = keys.length;
    if (len) {
      buf.push('');
      for (var i = 0; i < len; ++i) {
        var key = keys[i]
          , val = obj[key];
        if ('boolean' == typeof val || null == val) {
          if (val) {
            terse
              ? buf.push(key)
              : buf.push(key + '="' + key + '"');
          }
        } else {
          buf.push(key + '="' + escape(val) + '"');
        }
      }
    }
    return buf.join(' ');
  }

  function escape(html){
    return String(html)
      .replace(/&(?!\w+;)/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;');
  }

  window.singleNamespace = window.singleNamespace || {};
  window.singleNamespace.layout = function anonymous(locals) {
    try {
      var buf = [];
      with (locals || {}) {
        var interp;
        buf.push('<!DOCTYPE html>');
        buf.push('<html>');
        buf.push('<head>');
        buf.push('<title>');
        buf.push('</title>');
        buf.push('</head>');
        buf.push('<body>');
        buf.push('<h1>');
        buf.push('Content goes here');
        buf.push('</h1>');
        buf.push('<div');
        buf.push(attrs({ terse: true, 'id':('container') }));
        buf.push('>');
        buf.push(print(body));
        buf.push('</div>');
        buf.push('</body>');
        buf.push('</html>');
      }
      return buf.join("");
    } catch (err) {
      rethrow(err);
    }
  };

  window.singleNamespace.root = function anonymous(locals) {
    try {
      var buf = [];
      with (locals || {}) {
        var interp;
        buf.push('<h2>');
        buf.push('Hello');
        buf.push('</h2>');
        buf.push('<p>');
        buf.push('World!');
        buf.push('</p>');
      }
      return buf.join("");
    } catch (err) {
      rethrow(err);
    }
  };
}();

Note that this really only helps if you have lots of templates in a single file. One additional optimization could be to pull the helper scripts out completely. There would be an option that could be enabled in tmpl-precompile.json that would cause global helper functions to be used instead of embedding the helpers in the output file. There would be a helper option with a default value of true, which means "embed helpers in script file". If the value is false, then the helpers would not be embedded and the developer would have to include them separately in the application.

If this is implemented, it would be good to put the helpers into a namespace such as _j_ and add a tiny bit of logic to the beginning of a template to check if it exists. Any use of escape() or attrs() would need to change to _j_.escape(), etc.

conancat commented 13 years ago

Good idea! :) I think that we can probably do this by doing a find/replace with all the compiled JS from Jade before writing to the combined file. I think you have already have the output for the final compiled function, I guess it will be easier? Since the part for the shared functions are gonna be always the same, we could just use it as a fixed template as something to append to all combined JS files.

Anything you need help with? :D

tauren commented 13 years ago

@conancat I've been having a conversation with @visionmedia (TJ) in this jade issue which you should take a look at: https://github.com/visionmedia/jade/issues/261

TJ's opinions on changing Jade

TJ's willing to pull the helper methods out of the compiled functions, but that doing so means a "runtime" would need to be provided with those helpers. I haven't yet considered all of the implications -- depending upon TJ's approach, this issue may or may not become moot. Regardless, I do still like the solution I describe above, although TJ has the following to say about it:

  1. "removing the linenos is kinda over optimizing" -- I disagree and actually think this is a great optimization. There is no need for all the debugging code when using these templates in production, and to me it looks like close to 1/2 of the generated function is purely for debugging purposes (not counting the helpers).
  2. "the helper functions we can definitely adjust like you mention, though jade wont have anything to do with bundling up / serving the templates and the wrapper etc." -- it would be great if he removes the helpers from the generated templates.
  3. "the null check thing is over optimization as well, you'll gain a bit of space, but slow things down" -- I didn't consider the speed cost of this, I was only considering the file size. To keep the templates fast, it might be best to avoid additional function calls and leave this alone.
  4. "the major benefit will be moving those helpers out, but then we need to ensure you expose those to the client so it adds an extra layer which is kinda annoying but oh well" -- I'm in agreement with him here.

What we can do now

As far as what we should do now, I see the following options:

Contribute to Jade

We could help out the Jade project by forking visionmedia/jade and provide TJ with a pull request. I suggest we do the following:

  1. Move the helper methods out of the compiled functions.
  2. Make sure that normal use of jade still can find the helper functions.
  3. Add a new function call to the Jade API that will retrieve the helper functions such as var helpers = jade.getHelpers(). We could iterate through the collection of helpers and do helpers[key].toString() to include the helpers into our compiled templates.

The main thing is that I don't want tmpl-precompile to have Jade code lying around in a file. As soon as Jade is updated, it will become obsolete. It would be much better to pull the functions in realtime from Jade itself.

Figure out how to walk the UglifyJS AST

I think using UglifyJS's AST is the most promising approach that doesn't require Jade modifications. It would also be useful for many of the things listed above.

I do not think doing a regex or some sort of find/replace is the best way to solve this. A parser is needed to more effectively locate structured information like code. Using a regex approach is much more fragile and more likely to break with any changes to Jade.

I asked a question on SO and the overall consensus is to use a parser and an AST, which is what made me think of UglifyJS. I can't believe I didn't think of that first since I'm already using UglifyJS. http://stackoverflow.com/questions/6588977/how-to-to-extract-a-javascript-function-from-a-javascript-file

The problem is that working with the AST is not very straightforward. It will take some work to figure out the best approach to walking the AST and finding the nodes we want to extract or modify.

If you update to the latest tmpl-precompile I committed earlier, there is now a debug option for a group. If you set it to true, it will print the AST up to 10 levels deep. This will give you an idea of the AST structure. Here's a snippet of the output:

[ 'toplevel',
  [ [ 'stat',
      [ 'assign',
        true,
    [ 'dot',
          [ 'dot', [ 'name', 'SZ' ], 'templates3' ],
          'AppTemplate' ],
        [ 'function',
          'anonymous',
          [ 'locals' ],
          [ [ 'var',
              [ [ '__',
                  [ 'object', [ [Object], [Object], [Object] ] ] ] ] ],
            [ 'defun',
              'rethrow',
              [ 'err', 'str', 'filename', 'lineno' ],
              [ [ 'var',
                  [ [ 'context', [Object] ],
                    [ 'lines', [Object] ],
                    [ 'start', [Object] ],
                    [ 'end', [Object] ] ] ],
                [ 'var', [ [ 'context', [Object] ] ] ],

That AST should be able to be modified to remove rethrow, attr, and escape. When those nodes are extracted, they could probably be turned back into javascript as well.

I posted a question on the UglifyJS group related to this: http://groups.google.com/group/uglifyjs/browse_thread/thread/6235a3825f109b47#

Next steps

I'm not going to have time to get to either of these anytime soon. If you want to tackle one or the other, please do so! I've fleshed out a very basic starting point in the repository, but feel free to hack it however you see fit.

By the way, I do think that removing debugging information (__) should be an option that users of tmpl-precompile can turn off. In fact, all of these optimizations should be optional. The default should be to fully optimize a group of files and include the helpers just once. But there should also be an option to not include the helper functions in an output file at all and assume the user will load them separately or in another template file.

tj commented 13 years ago

I'm fine with pretty much everything as long as it's optional. as a compromise we could sniff NODE_ENV and strip the debugging info accordingly. The helpers were half-broken for the client-side anyways because rethrow() and possibly a few others were never inlined, the client-side support was somewhat of an after-thought, but I'm all for getting that to become more robust.

tauren commented 13 years ago

@visionmedia Thanks for your input. I'm curious if you have some idea of a timeline for improving the client-side support, or if it is just on your list of things you'd like to do. That will have an effect on how I move forward.

@conancat It really looks like UglifyJS is the way to go. It can be used to create an AST, and then a walker can be created to walk it and manipulate the AST. Here's an example: https://github.com/mishoo/UglifyJS/blob/master/tmp/instrument.js

Further suggestions by Mihai (@mishoo) are here: http://groups.google.com/group/uglifyjs/browse_thread/thread/6235a3825f109b47

tj commented 13 years ago

it's not a priority, but modifying via AST is a huge hack, a fork with patches until things are merged in makes way more sense

conancat commented 13 years ago

hi there! been busy with work here too, lol. I'll try and see if I can fish something out by this weekend or so!

I think we really could reduce the file sizes drastically by moving out the helper functions, personally I think they probably take up most of the space lol as they seem to repeat in every template.

IMO for the debugging, if they are already compiling the templates on the frontend if there are any errors it should already be thrown at compile time, any syntax errors shouldn't make it to the client-side anyway. The whole idea of precompiling is that the templates are already ready to use when it reaches client-side. :)

@visionmedia when you say forking with patches do you mean forking to the main jade repo? meaning to merge the template pre-compiling function to jade?

tj commented 13 years ago

yeah they definitely take a large chunk, specially for smaller templates, which client-side ones usually are.

yup, I think it makes more sense to fork jade, make the necessary changes and then use that for the now, until they are reviewed / merged into jade itself. the AST thing is a massive hack that's like parsing node's code to make a few changes instead of trying to get it in core or forking

tauren commented 13 years ago

@visionmedia @conancat I completely agree with TJ that using the AST to modify jade code is a hack that we should avoid. My absolute preference is to fork jade and make the modifications ourselves. This is why I listed it first in my comment above: https://github.com/tauren/tmpl-precompile/issues/2#issuecomment-1509747

If we did not have TJ's support or willingness to work with us in accepting our Jade pull request, then the next best thing would have been to use the AST approach. The point I was trying to make (that did not come across correctly) is that the AST approach is much better than attempting to remove the code via regex or having a copy of jade's helpers sitting around in a file within tmpl-precompile.

As far as I can tell, TJ does not want to include tmpl-precompile features into Jade. Jade should be kept simple and just a template engine. However, he doesn't mind if we modify Jade in ways that makes it easier for us to use. This could include:

  1. Move the helper functions outside of jade's compiled templates to that they are not including when using toString(). This means modifying code around the following lines:

    https://github.com/visionmedia/jade/blob/master/jade.js#L742 https://github.com/visionmedia/jade/blob/master/jade.js#L777

  2. Make sure the helper methods attr(), escape(), and rethrow() are accessible so we can do something like jade.helpers.toString() on them and include them in output generated by tmpl-precompile. This way tmpl-precompile can always get the most current version of those methods. We could then include a copy of these helpers in each group file, or create a "runtime" that could be downloaded separate from the group template files.
  3. Add additional option flags to Jade that can exclude debugging code from the compiled template output. This should remove all instances of __ usage from the compiled template. We will have to consider how to best accomplish this.

The main thing is that in no way do we change the default behavior of Jade. We don't want to break anything for existing users. Anything specialized we need to do should be optional and not enabled by default.

tj commented 13 years ago

yeah

tauren commented 13 years ago

Not that I'm recommending the AST approach, but if we find using an AST to be necessary, it looks like 'burrito' simplifies walking the AST:

http://substack.net/posts/eed898 https://github.com/substack/node-burrito

tj commented 13 years ago

i dont get the point of burrito, it's super trivial to implement an ast visitor

conancat commented 13 years ago

Hi guys!

@tauren After forking on Jade and the few little changes, I have managed to shrink a file to almost half the size compared to before. However I haven't pushed the changes over here for tmpl-precompile yet because @visionmedia haven't approved the fork yet... But here's what I was testing out with, and the template file sizes shrinked from 9kb to 4kb. :)

I have uploaded the files here for a little comparison, please check it out ya :D

conancat commented 13 years ago

Hello again guys! :)

Thanks to @visionmedia who merged the pull request yesterday, now we can disable compiling debugging at the functions, run Jade's functions without needing to include the whole Jade file and even get Jade's helpers under jade.runtime, yay!

So with that I have bumped tmpl-precompile to 0.1.3, which can now include helper functions as shared functions and remove debugging code. So tmpl-precompile can concatenate multiple template files, add namespacing, include runtime functions, minify, and the client just needs to include the compiled template files.

Thanks @visionmedia for your help! :)

@tauren I have pushed tmpl-precompile to npm, we can now npm install tmpl-precompile -g and run tmpl-precompile anywhere as a shell command. However because Jade's latest changes are not updated to NPM yet, so to test out at the moment we need to checkout the latest version of Jade instead of using NPM's version. Once it's updated to NPM then tmpl-precompile can start rocking! :) Of course, plenty of refactoring needs to be done lol, but at least the basic functionality works. :P

tauren commented 13 years ago

Awesome work @conancat, and thanks so much @visionmedia for promptly updating jade to support this!

And I'm glad tmpl-precompile is in npm now. I've never put something into npm before, so I'm going to check out what you did.

tj commented 13 years ago

no problem. I figured I would let everyone play with things a bit before releasing to make sure everything's ok :D seems fine though, definitely lightens up the templates

conancat commented 13 years ago

Hey @tauren... I have kinda moved things around, added in tests, added in Javascript callback option so that people can run the function in node environment (and so that it's easier for us to run tests, lol). I hope it's okay...... :) I think I'll go poke around something else after this. :P

I'm really glad that now we have a way to create JS templates on the client side with Jade that doesn't cost too much KBs!

tauren commented 13 years ago

@conancat -- i'm on vacation and haven't had a chance to look at things yet. but of course it's ok to improve the project! I'm glad that you've invested your time and resolved this issue. It sounds like it produces much more usable templates now.

@visionmedia -- could you clarify if these latest enhancements to jade are available in npm yet? Hopefully I'll have some time to play with things later tonight or tomorrow.