Feature suggestion: optimizing `var` definitions

chrisaljoudi commented 11 years ago

Hi! I came across a case where there's a possible trick that can reduce size significantly. Take the following piece of code:

var x = 12345678, y = 12345678;

It compresses to:

var x=12345678,y=12345678;

Can't it be optimized to the following:

var x=12345678,y=x;

Or is there something I'm missing? I think it should be available as an unsafe optimization.

Thanks!

mal commented 11 years ago

Also, if you could determine that x was read only and never used in scope after y had been mutated, or vice versa (or even that both were read only) you could potentially optimise one of the two variables away completely.

Have to be careful of inner closures, probably pretty unsafe, could be fun though! :stuck_out_tongue:

ForbesLindesay commented 11 years ago

The optimisation above would be completely safe though

staabm commented 11 years ago

The optimization is only safe when the vars contain primitives, isn't it?

chrisaljoudi commented 11 years ago

@staabm I think it's safe when the expression-node being assigned to the variables has .has_side_effects() false.

ForbesLindesay commented 11 years ago

In general it is safe to optimise:

var x = E, y = E;

to

var x = E, y = x;

For all E provided that E:

has no side effects
returns the same value every time

e.g. The following would not be safe to optimise even though Math.random() has no side effects:

var x =  Math.random(), y =  Math.random();

The following would be though:

function get() { return 5; }
var x =  get(), y =  get();

The follwoing would not because {} === {} is false

function get() { return {}; }
var x =  get(), y =  get();

but the following would be:

var val = {};
var x = val, y = val;

chrisaljoudi commented 11 years ago

@ForbesLindesay You're definitely right, sorry for being inaccurate.

ForbesLindesay commented 11 years ago

Interestingly it would be safe with certain very specific side effects, but this is probably more in depth analysis than we'd want to do:

var got = false;
function get() {
  return got = true;
}
var x =  get(), y =  get();

because although get has side effects, it doesn't have a side effect if you run it a second time having not reset got to false.

ForbesLindesay commented 11 years ago

No worries @abody these things are tricky at the best of times, my initial intuition which I started writing down was completely false.

chrisaljoudi commented 11 years ago

@ForbesLindesay Yeah that's too deep an analysis -- you'd expect the person who's coding the above would optimize it himself/herself. I initially just thought about this optimization because I had initialized lots of variables to null.

chrisaljoudi commented 11 years ago

@ForbesLindesay There are also cases where defined getters might have side effects, but it's too hard to spot those. That's why I suggested to just make the optimization disabled by default/unsafe.

ForbesLindesay commented 11 years ago

You wouldn't want it to run for anything that could be a getter, because that's way too likely to be unsafe. Just code the optimisation so it only does it when we know it's safe (i.e. expressions that don't contain property accessors, function calls, object literals, array literals or regexp literals).

Perhaps more significantly, it slows runtime performance if not done carefully: http://jsperf.com/optimizing-var-definitions (needs more runs to be statistically significant).

I suspect it will have negligible or no affect on file size once gzip has done it's thing. It may ultimately not be worth optimising, unless you want to handle the case of a call to a function which has no side affects and always returns the same value. e.g. the performance gain of optimising the following would be huge:

function factorial(n) {
  if (n === 0) return 1;
  else if (n > 0) return n * factorial(n - 1);
}
var a = factorial(50), b = factorial(50);

How often such code actually appears though I'm not sure.

Do we know whether UglifyJS already tracks which functions are 'pure' functions (i.e. no side effects and always return the exact same value for the same input)? If it does, this may be doable, if not then it's a fair bit of work for minimal gain.

chrisaljoudi commented 11 years ago

@ForbesLindesay Oh interesting benchmark; I'm running Chrome 23 and actually the separate declaration is slower than either the remaining two. You can never really know if there's a getter defined; and it needn't be a property accessor to be unsafe (think about the case where you do window.__defineGetter__("something", someFunctionThatDoesEvilStuff)).

Perhaps you're right about the negligible difference in file size. I'm not sure whether UglifyJS keeps track of such 'pure' functions, perhaps @mishoo can clarify?

ForbesLindesay commented 11 years ago

OK, anything that can be a getter then, which should include all property accessors and any variable that's not found in the current scope (and therefore could be a property on the window)

ForbesLindesay commented 11 years ago

this.defineGetter wouldn't make any difference, I've already said we should exempt property accessors and 'global' variables, this.name is a property accessor.

ForbesLindesay commented 11 years ago

It might need to also be disabled inside with blocks or where there's evals and new Functions, but virtually all optimisations are disabled inside with blocks etc.

chrisaljoudi commented 11 years ago

@ForbesLindesay Oh. Never mind about my previous comment. I completely missed the point. Sorry!

ForbesLindesay commented 11 years ago

For anyone else reading:

function test() {
    this.__defineGetter__("x", function() {
        alert("This is evil.");
        return 5;
    });
    return x;
}
console.log(new test());

just throws a reference error and if you're in stict mode then so does the above code. The above code in non-strict mode is just assigning x as a property of the global object so the following is also fine:

(function () {
  var x = 10;
  function test() {
    this.__defineGetter__("x", function() {
        alert("This is evil.");
        return 5;
    });
    return x;
  }
  console.log(test()); //logs 10
}())

mishoo commented 11 years ago

This might seem simple enough to wonder “is there a good reason not to do it?”, but it's in fact quite complicated in the general case.

Doing it only for the case var i = XXX, j = XXX is fairly simple, indeed, but won't probably save one byte in 99.9% of cases. Doing it for the case var i = XXX; ...; var j = XXX requires data flow analysis, and there's some serious theory behind this term. I investigated the possibilities but for the near future I don't think we'll have this (it's complicated by two issues: (1) closures, and (2) JS is a complex beast; normally compilers do DFA on a very simple intermediate language, not on the original source).

This comment also applies to mishoo/UglifyJS2#68 and mishoo/UglifyJS2#27 — data flow analysis would help dealing with them too.

chrisaljoudi commented 11 years ago

@mishoo I see. Well, I just tested with the Google Closure Compiler (it actually took me some time to work around it optimizing away the variables and replacing them with their values). It appears it doesn't do such an optimization, either. Thanks for your explanation!

SoniEx2 commented 9 years ago

This could be easily doable by using register-based bytecode, instead of AST, as IR. You'd lose comments but you could always add a bytecode for comments if you wanted to :P

michaelficarra commented 9 years ago

@SoniEx2 Unless you're very careful with your design, transformations on that bytecode would result in behaviour that cannot be translated back to JavaScript.

SoniEx2 commented 9 years ago

@michaelficarra You can turn Lua bytecode back and forth, at least... altho slightly inefficiently for some things... Maybe optimize based on bytecode and minify based on JS? (JS -> bytecode -> optimize -> JS -> minify)

michaelficarra commented 9 years ago

Sure, it can be done, but that optimisation step is going to be tricky. You have to make sure the bytecode's overall structure remains representable in JavaScript, or at least can be filled in to be representable. As an example, a bytecode with arbitrary jumps can jump into the middle of a "loop", but that behaviour cannot be represented in JavaScript; at least, not easily.

SoniEx2 commented 9 years ago

I see what you mean. But if you want to optimize efficiently an IR is usually the way to go.

Mrgaton commented 1 month ago

this isnt fixed yet?

mishoo / UglifyJS

Feature suggestion: optimizing `var` definitions #75