overlookmotel / livepack

Serialize live running code to Javascript
MIT License

Only serialize properties of objects which are accessed #169

Open overlookmotel opened 3 years ago

overlookmotel commented 3 years ago

Input:

const obj = { x: 1, y: 2, z: 3 };
export default () => obj.x;

Current output:

export default ( obj => () => obj.x )( { x: 1, y: 2, z: 3 } );

As x is the only property of obj which is accessed, this could be reduced to:

export default ( x => () => x )( 1 );

Optimizations

The following optimizations can be applied:

1. Omit unused properties

Where only certain properties of an object are accessed, any other properties can be omitted:

// Input
const obj = { x: 1, y: 2, z: 3 };
export default () => obj.x + obj['y'];

// Output
export default ( obj => () => obj.x + obj.y )( { x: 1, y: 2 } );

Note property z has been discarded in output.

2. Break object properties apart with scopes

Where object is never used as a whole (only individual properties accessed by name), each property can be split into a separate scope var.

// Input
const obj = { x: 1, y: 2, z: 3 };
export default {
  getX: () => obj.x,
  setX: v => obj.x = v,
  getY: () => obj.y,
  setY: v => obj.y = v
};

// Output
const scope1 = ( x => [ () => x, v => x = v ] )( 1 ),
  scope2 = ( y => [ () => y, v => y = v ] )( 2 );
export default {
  getX: scope1[0],
  setX: scope1[1],
  getY: scope2[0],
  setY: scope2[1]
};

getX() + setX() can be code-split into a separate file from getY() + setY().

3. Break object properties apart with object wrappers

Where object is never used as a whole (only individual properties accessed by name), each property can be wrapped in a separate object.

// Input - same as (2)
const obj = { x: 1, y: 2, z: 3 };
export default {
  getX: () => obj.x,
  setX: v => obj.x = v,
  getY: () => obj.y,
  setY: v => obj.y = v
};

// Output
const objX = { x: 1 },
  objY = { y: 2 };
export default {
  getX: ( objX => () => objX.x )( objX ),
  setX: ( objX => v => objX.x = v )( objX ),
  getY: ( objY => () => objY.y )( objY ),
  setY: ( objY => v => objY.y = v )( objY )
};

Using 2 wrapper objects is slightly more verbose than output from optimizations (1) or (2), but more code-splittable than either. getX(), setX(), getY() and setY() could each be in separate files with objX and objY split into separate common files.

4. Reduce to static values

Where a property is read only (never written to in any functions serialized), the property can be reduced to a static value.

// Input
const obj = { x: 1, y: 2, z: 3 };
export default {
  getX: () => obj.x,
  getY: () => obj.y
};

// Output
export default {
  getX: ( x => () => x )( 1 ),
  getY: ( y => () => y )( 2 )
};

This is completely code-splittable. It's more efficient than any of the other approaches above, but only works if obj.x and obj.y are read-only.
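The read-only requirement can be seen by applying the same reduction to a property which is written elsewhere. A minimal sketch (the names `original` and `reduced` are illustrative only, not Livepack output):

```javascript
// Original: both functions share the live object
const obj = { x: 1 };
const original = {
  getX: () => obj.x,
  setX: v => { obj.x = v; }
};

// Reduced as in optimization (4): getX captures a copy of the value.
// This is only valid if nothing ever writes to the property.
const reduced = {
  getX: ( x => () => x )( 1 ),
  setX: original.setX
};

original.setX( 5 );
const liveValue = original.getX(); // 5
const staleValue = reduced.getX(); // 1 - the write was not seen
```

So where any serialized function writes to the property, one of optimizations (1) to (3) would have to be used instead.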

Optimization killers

None of these optimizations can be used if:

  1. Object used standalone e.g. const objCopy = obj; or fn( obj )
  2. Object properties accessed with dynamic lookup e.g. obj[ name ]
  3. Object passed as this in a method call e.g. obj.getX() and .getX() uses this
  4. Property is a getter/setter
  5. Property is not defined, so access will fall through to object's prototype
  6. Property may be deleted by code elsewhere (delete obj.x) so a later access may fall through to object's prototype
  7. An eval() has access to object in scope (no way to know ahead of time how the object will be used)
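Killers (5) and (6) can be illustrated with a small sketch. It assumes a hypothetical optimized form of getX() where `x` has been split out as a scope var (`getXOptimized` is an illustrative name, not actual Livepack output):

```javascript
const proto = { x: 'inherited' };
// `obj` has its own `x`, shadowing the one on its prototype
const obj = Object.assign( Object.create( proto ), { x: 1 } );

const getX = () => obj.x;
// Hypothetical optimized form: `x` captured as a scope var
const getXOptimized = ( x => () => x )( obj.x );

// Code elsewhere deletes the own property
delete obj.x;

const actual = getX(); // 'inherited' - falls through to the prototype
const optimized = getXOptimized(); // 1 - no longer matches
```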

It's tempting to think optimization (3) could still be applied in cases of undefined properties by defining the object wrapper as objX = Object.create( originalObjectPrototype ). However, this won't work, as it's possible the object's prototype is altered later with Object.setPrototypeOf().
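A sketch of how a prototype-based wrapper would go out of sync (`protoA`, `protoB` and `getXOptimized` are illustrative names only):

```javascript
const protoA = { x: 'from A' };
const protoB = { x: 'from B' };
const obj = Object.create( protoA ); // `x` is not an own property of `obj`

const getX = () => obj.x;
// Hypothetical wrapper object sharing the original prototype
const objX = Object.create( Object.getPrototypeOf( obj ) );
const getXOptimized = ( objX => () => objX.x )( objX );

// Prototype of the original object is changed later
Object.setPrototypeOf( obj, protoB );

const actual = getX(); // 'from B'
const optimized = getXOptimized(); // 'from A' - wrapper still has the old prototype
```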

It's impossible to accurately detect any changes made to the object with Object.defineProperty() - which could change property values, or change properties to getters/setters. However, this isn't a problem - the call to Object.defineProperty( obj ) would involve using the object standalone, and so would prevent optimization due to restriction (1) above.

ESM

These optimizations would also have the effect of tree-shaking ESM (#53).

ESM is transpiled to CommonJS in Livepack's Babel plugin, prior to being run or serialized:

// Input
import { createElement } from 'react';
export default () => createElement( 'div', null, 'Hello!' );

// Transpiled to (before code runs)
const _react = require('react');
module.exports = () => _react.createElement( 'div', null, 'Hello!' );

Consequently, when this function is serialized, the whole of the _react object is in scope and is serialized, whereas all we actually need is the .createElement property.

Optimization (4) (the most efficient one) would apply, except in case of export let where the value of the export can be changed dynamically in a function (pretty rare case).
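As a rough sketch of what this would mean for the example above, using a stand-in object in place of the real `react` module (`_lib` and both of its functions are hypothetical, and `createElement` here is assumed not to use `this`):

```javascript
// Stand-in for the module object produced by `require()`
const _lib = {
  createElement: ( type, props, ...children ) => ( { type, props, children } ),
  unusedExport: () => 'never called'
};

// Current behaviour: the whole module object is in scope
const current = ( _lib => () => _lib.createElement( 'div', null, 'Hello!' ) )( _lib );

// With optimization (4): only the accessed property is captured
const optimized = ( createElement => () => createElement( 'div', null, 'Hello!' ) )( _lib.createElement );

const sameResult = JSON.stringify( current() ) === JSON.stringify( optimized() );
```

In the second form `unusedExport` is not referenced at all, so it would not need to be serialized.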

Difficulties

I can foresee several difficulties implementing this:

overlookmotel commented 3 years ago

Tempting to think that functions accessing a read-only object property can be optimized to access only that property even if another function accesses the object whole. However, that's not possible, as the function accessing the object whole could use Object.defineProperty() to redefine that property - so it's actually not read-only at all.

const O = Object,
  d = O['define' + 'Property'];
function write(o, n, v) {
  d( o, n, { value: v } );
}

const obj = { x: 1, y: 2 };
export default {
  getX() {
    return obj.x;
  },
  setX(v) {
    write(obj, 'x', v);
  }
};

setX() writes to obj.x but it's not possible through static analysis to detect that it will do this. So can't know if obj.x is read only or not.

You can optimize getX() if the only other use of obj is via dynamic property lookup (obj[ name ]) and not within an assignment (obj[ name ] = ...). i.e. Optimization killer (2) won't apply if the lookup is read-only.
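A sketch of that case, assuming obj has only plain data properties (so killer (4) doesn't apply). The shape of `optimized` is a guess at what output could look like, not what Livepack produces:

```javascript
const obj = { x: 1, y: 2 };
const fns = {
  getX: () => obj.x,
  get: name => obj[ name ] // dynamic lookup, but read-only
};

// Hypothetical output: getX() reduced to a static value,
// while get() keeps access to the whole object
const optimized = ( obj => ( {
  getX: ( x => () => x )( 1 ),
  get: name => obj[ name ]
} ) )( { x: 1, y: 2 } );

// Reads through get() cannot change `x`, so both versions stay in agreement
const sameLookups = [ 'x', 'y', 'z' ].every( name => optimized.get( name ) === fns.get( name ) );
const sameX = optimized.getX() === fns.getX();
```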

overlookmotel commented 3 years ago

Detecting read-only properties will also need to spot assignment via destructuring e.g.:

( { a: obj.x } = { a: 123 } );
[ obj.x ] = [ 123 ];
[ { a: { b: obj.x } } ] = [ { a: { b: 123 } } ];
( { ...obj.x } = {} );
[ ...obj.x ] = [];
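
Each of these patterns is a write to obj.x, which can be checked by running them in turn (the right-hand side values are changed here only so that each write is distinguishable):

```javascript
const obj = { x: 1 };
const seen = [];

( { a: obj.x } = { a: 123 } );
seen.push( obj.x ); // 123

[ obj.x ] = [ 456 ];
seen.push( obj.x ); // 456

[ { a: { b: obj.x } } ] = [ { a: { b: 789 } } ];
seen.push( obj.x ); // 789

// Object rest assigns a new object to `obj.x`
( { ...obj.x } = { k: 1 } );
seen.push( obj.x );

// Array rest assigns a new array to `obj.x`
[ ...obj.x ] = [ 1, 2 ];
seen.push( obj.x );
```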

overlookmotel commented 3 years ago

Another optimization killer:

If object can be accessed via arguments, needs to be treated as a whole object access. e.g.:

function fn(obj, obj2) {
  return {
    getX: () => obj.x,
    deleteProp: n => delete arguments[n].x
  };
}
export default fn( { x: 1 }, { x: 2 } );

It's not possible to optimize access to obj.x in getX() because impossible to know if deleteProp() will delete obj.x property or not.

If deleteProp() contained arguments[1] then it would be possible to deduce that obj.x is unaffected, however this is a pretty niche case so probably not worth the complication. Any use of arguments should be conservatively treated as potential whole object access of every variable in the function's arguments.

NB In sloppy mode, assigning to arguments[0] has the side effect of also changing the value of obj, so obj should not be considered read-only either.
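A small demonstration of the sloppy mode behaviour. The functions are created with the Function constructor so that they are sloppy mode regardless of the mode of the surrounding code (`sloppy` and `strict` are illustrative names):

```javascript
// Functions created with the Function constructor are sloppy mode
// unless their body opts in to strict mode
const sloppy = new Function( 'obj', `
  arguments[0] = { x: 2 };
  return obj.x;
` );
const strict = new Function( 'obj', `
  'use strict';
  arguments[0] = { x: 2 };
  return obj.x;
` );

const sloppyResult = sloppy( { x: 1 } ); // 2 - `obj` is aliased to `arguments[0]`
const strictResult = strict( { x: 1 } ); // 1 - no aliasing in strict mode
```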

overlookmotel commented 3 years ago

Concerning when to serialize function scope vars:

  1. Properties used can be serialized immediately.
  2. Serialization of the whole object will have to be delayed until later.

The problem with (2) is determining circularity. This is quite simple for the object itself (which is the value of the external var). Could just check if a record exists for the value and, if so, whether it has a .node property. If so, it is circular and this can be recorded at time of serializing the function.

However, the problem is where the object has properties which are circular. e.g.:

let obj = { x: 1 };
const methods = {
  getX: () => obj.x
};
obj.methods = methods;
export default {
  methods,
  setObj: v => obj = v
};

obj is in scope of getX() and has methods as one of its properties. This is a circular reference at the time getX() is serialized.

When getX() is serialized, obj.x is identified as being accessed and so obj.x serialized on assumption this access can be optimized. It's also checked if obj is a circular reference. It's not.

When setObj() is serialized, it becomes clear that obj is not read-only, and so it's necessary to deoptimize getX() to access the whole obj. obj does contain what would have been a circular reference if it had been serialized at the time getX() was serialized, but there's no way to know this at the later point, as the stack has already unwound.

Solutions

1. Record stack

At the time getX() is serialized, record the stack and store it to be used later to identify circular references if getX() is deoptimized and obj needs to be serialized in that context.

This would require keeping a record of the stack at all times when serializing (currently no such facility exists) and a mechanism for swapping the stack for an older one when a deoptimization occurs.

2. Two-pass serialization

Serialization could be performed in 2 phases:

  1. Trace: Trace dependencies of values, to create a graph.
  2. Serialize: Serialize values to JS based on this graph.

Currently these two phases occur together in one pass.

Functions would initially be traced on assumption object property accesses can be fully optimized. After phase 1 is complete, all the information necessary to decide which functions need to be deoptimized has been gathered. Any scope vars which need to be deoptimized can then be traced.

It's only during the 2nd phase - as JS code is written out - that it's determined where there are circular references.
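To make the split concrete, here is a toy sketch of the two phases for plain objects with primitive properties only. It is not how Livepack works at present, and `trace()`, `serialize()` and the record shape are all hypothetical:

```javascript
// Phase 1: trace. Walk the object graph and record each object's properties.
// No JS is produced and no circularity is determined here.
function trace( root ) {
  const records = new Map(); // object -> { name, props }
  const visit = ( val ) => {
    if ( typeof val !== 'object' || val === null || records.has( val ) ) return;
    const record = { name: `v${ records.size }`, props: Object.entries( val ) };
    records.set( val, record );
    for ( const [ , child ] of record.props ) visit( child );
  };
  visit( root );
  return records;
}

// Phase 2: serialize. Circular references are only discovered here,
// as code is written out, and become late assignments.
function serialize( root, records ) {
  const lines = [],
    lateAssignments = [],
    done = new Set(),
    inProgress = new Set();
  const emit = ( val ) => {
    if ( typeof val !== 'object' || val === null ) return JSON.stringify( val );
    const record = records.get( val );
    if ( done.has( val ) ) return record.name;
    inProgress.add( val );
    const parts = [];
    for ( const [ key, child ] of record.props ) {
      if ( inProgress.has( child ) ) {
        // `child` is still being written out, so this is a circular reference
        lateAssignments.push( `${ record.name }[${ JSON.stringify( key ) }] = ${ records.get( child ).name };` );
      } else {
        parts.push( `${ JSON.stringify( key ) }: ${ emit( child ) }` );
      }
    }
    inProgress.delete( val );
    done.add( val );
    lines.push( `const ${ record.name } = { ${ parts.join( ', ' ) } };` );
    return record.name;
  };
  const rootName = emit( root );
  return [ ...lines, ...lateAssignments, `return ${ rootName };` ].join( '\n' );
}

// Usage: a circular structure similar to `obj` / `methods` above
const obj = { x: 1 };
const methods = { owner: obj };
obj.methods = methods;
const code = serialize( obj, trace( obj ) );
const copy = new Function( code )();
```

The point of the sketch is that `trace()` never needs a stack. Everything it gathers is available before `serialize()` makes any decision about ordering or late assignment.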

This would be a major rewrite - the way Livepack is written at present doesn't support this at all. Every serialization function would need to be split into two - a trace method and a serialize method.

However, it would have some other advantages:

Once collapsing top-level scopes into external vars is implemented (#81), I think it will allow avoiding unnecessary late assignments, as things that would have been circular references cease to be so. In example above, whether obj.methods property definition needs to be assigned late, or whether it can be defined inline as part of obj's definition depends on whether getX() is defined as a function inline referencing external vars, or whether it's created within a scope. Which it is cannot be known until scopes are constructed.

i.e. output could be either:

// getX defined inline
const methods = {
  getX: () => obj.x
};
let obj = { x: 1, methods };
export default {
  methods,
  setObj: v => obj = v
};

// getX defined with scope function
let obj = { x: 1 };
const methods = {
  getX: (obj => () => obj.x)(obj)
};
// This extra assignment is required, as `obj` must be defined before `methods` due to its use above
obj.methods = methods;
export default {
  methods,
  setObj: v => obj = v
};

It may be possible to still achieve this without the change to 2 passes, as long as obj is serialized late, at which point it's known whether getX() needs to be defined with a scope function or not.

However, I think this case makes that impossible, as it's only when obj is serialized and its .getX2 property is encountered that it becomes apparent that getX() does need to be defined with a scope function, as there are two instances of it:

function create(obj) {
  const methods = {
    getX: () => obj.x
  };
  obj.methods = methods;
  return {
    methods,
    setObj: v => obj = v
  };
};

const obj = { x: 1 };
export default create( obj );
obj.getX2 = create( {} ).methods.getX;

Other advantages:

  1. It would make #182 and #75 easier to solve (though they don't require this change - could be solved by other methods).
  2. Output JS can be created on the fly without storing .node properties for each record and then later linking them all up.
  3. JS for code splitting can be created on the fly too during output, rather than injecting imports / require()s later on.
  4. In the long run, it may make the codebase easier to understand.

I also thought it would help select the most appropriate optimization (optimization 2 or 3 in top post above) to apply depending on what code-splitting needs are. Currently this is not known at serialization time, but actually it'd be possible even without this change as scopes are created after all dependencies are traced anyway.

overlookmotel commented 1 year ago

Two-phase serialization now has its own issue: #426