plasma-umass / Ninia

Python interpreter in JavaScript

Settle on a representation for generic Python objects #18

Closed jvilk closed 9 years ago

jvilk commented 9 years ago

Proposal: Representation of Generic Python Objects

Overview: Python and JavaScript Objects

Let's review how Python and JavaScript objects function.

Python Objects

Python objects are strikingly similar to JavaScript objects:

I've broken the proposal into several sections below.

Basic Python Class Representation

Create a JavaScript constructor for each Python class. Prefix Python-visible properties with $ to avoid collisions with special JS object properties.

Each Python class will have exactly one JavaScript constructor associated with it that generates a JavaScript object that contains all of the Python class properties.

Example:

class Py_Dict {
  // Note: This is *not* static.
  $__getitem__ = new Py_SyncNativeMethod(function(t: Thread, f: Py_Frame, self: Py_Dict, args: IPy_Object[]): IPy_Object {
    // etc
  });
}

Note that we will not use TypeScript inheritance to inherit from Python classes! Py_Dict will be mixed into other classes' prototype chains. This is necessary because of Python's multiple inheritance (see below).

Because of the way CPython bytecode works, we know all of the initial properties of program-created classes when they are first constructed, which makes this possible. A later section covers what happens if the properties change after initialization.
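
As a minimal sketch of why the $ prefix matters, the snippet below stores Python attributes named constructor and toString on a plain JavaScript object without disturbing the built-in JS properties of the same name. The helpers jsName, setPyAttr, and getPyAttr are hypothetical names made up for this illustration; they are not part of Ninia.

```typescript
// Hypothetical helpers for illustration only; not Ninia's actual API.
type PyAttrs = { [name: string]: unknown };

// Map a Python-visible attribute name to the JS property that stores it.
function jsName(pyName: string): string {
  return "$" + pyName;
}

function setPyAttr(obj: PyAttrs, pyName: string, value: unknown): void {
  obj[jsName(pyName)] = value;
}

function getPyAttr(obj: PyAttrs, pyName: string): unknown {
  const key = jsName(pyName);
  // `in` walks the prototype chain, so inherited Python attributes are found too.
  if (!(key in obj)) {
    throw new Error("AttributeError: " + pyName);
  }
  return obj[key];
}

// Python code may legally use names that are special in JavaScript.
const instance: PyAttrs = {};
setPyAttr(instance, "constructor", 42);
setPyAttr(instance, "toString", "a Python value");

// The Python attributes live under $constructor and $toString, so the
// built-in JS `constructor` and `toString` remain untouched.
const pyConstructor = getPyAttr(instance, "constructor");
```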

Python Class Inheritance

Use the linearization of the class's type hierarchy as the prototype chain.

Due to Python supporting multiple inheritance, we need to be able to have multiple versions of a class prototype for different prototype chains. When we encounter a new subclass, we will use the constructor mentioned above to generate a new JavaScript object with the appropriate class properties, and slam it into an appropriate prototype chain.

Thus, given that Py_Dict is a subclass of object, we would construct a Py_Dict object prototype as follows:

// Py_Dict inherits from object; temporarily set the prototype of the class constructor function to object.
Py_Dict.prototype = new Py_Object();

function Py_Dict_Instance() {
  // Constructor for an *instance* of Py_Dict
}
// Now each instance of Py_Dict has the prototype chain:
// Py_Dict -- Object
Py_Dict_Instance.prototype = new Py_Dict();

Note that common prototype chains can be recycled, so long as they appear at the end of a subclass's prototype chain.
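
The same idea can be sketched for a class with multiple bases. This is only an illustration under assumptions: PyClassSketch, populate, and buildPrototypeChain are hypothetical names, and the sketch uses Object.create instead of the new-based prototype assignment shown above. Each class in the linearization gets a fresh link populated with that class's properties, so the same class can appear in several different chains.

```typescript
// Hypothetical types for illustration; Ninia's real classes differ.
interface PyClassSketch {
  name: string;
  // Copies this class's own Python-visible properties onto `target`.
  populate(target: Record<string, unknown>): void;
}

// Build one prototype chain from a linearization (MRO), most specific class first.
function buildPrototypeChain(mro: PyClassSketch[]): Record<string, unknown> {
  let proto: Record<string, unknown> | null = null;
  // Walk from the most generic class up to the most specific one.
  for (let i = mro.length - 1; i >= 0; i--) {
    const link: Record<string, unknown> = Object.create(proto);
    mro[i].populate(link);
    proto = link;
  }
  if (proto === null) {
    throw new Error("MRO must contain at least one class");
  }
  return proto;
}

// class A(object): who = "A"
// class B(object): who = "B"; only_b = True
// class C(A, B): pass          -> linearization: C, A, B, object
const ObjectCls: PyClassSketch = { name: "object", populate: (t) => { t["$__doc__"] = "base"; } };
const ACls: PyClassSketch = { name: "A", populate: (t) => { t["$who"] = "A"; } };
const BCls: PyClassSketch = { name: "B", populate: (t) => { t["$who"] = "B"; t["$only_b"] = true; } };
const CCls: PyClassSketch = { name: "C", populate: () => { /* no own properties */ } };

const cPrototype = buildPrototypeChain([CCls, ACls, BCls, ObjectCls]);
const cInstance: Record<string, unknown> = Object.create(cPrototype);

// Lookup follows the linearization: A comes before B, so A's `who` wins,
// while properties defined only on B are still reachable.
const who = cInstance["$who"];
const onlyB = cInstance["$only_b"];
```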

The type object for a class is merely the class object with type on the prototype, and a few special properties.

Every class in Python has a corresponding type object (if it's a new-style class). These type objects inherit from type, which inherits from object.

We can construct this object like so:

Py_Type.prototype = new Py_Object();
Py_Dict.prototype = new Py_Type();

function Py_Dict_Type() {
  // Special type object property: Tuple that contains the linearization of the inheritance hierarchy.
  this.$__mro__ = new Py_Tuple_Instance();
  this.$__mro__.append(this, Type_Py_Type, Type_Py_Object);
  // Constructor function. Py_Type objects are callable
  this.$__call__ = undefined; // Placeholder: the real constructor function goes here.
}
Py_Dict_Type.prototype = new Py_Dict();

// We'll keep these type objects around as singletons.
var Type_Py_Dict = new Py_Dict_Type();

Bound Methods

To support Python's bound methods, all class methods will be implemented as Python properties that return a bound method when accessed through an object. CPython does the same thing.

While this sounds expensive (as binding the method occurs every time a method property is accessed on an object), PyPy's bytecode compiler, which emits CPython 2.7 bytecode, has an option to emit a new bytecode instruction that removes this overhead. CPython lacks this optimization, and creates a newly bound method each time the property is accessed.
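
A rough sketch of binding on access, assuming a JavaScript getter is used to model the Python property. BoundMethodSketch and defineMethod are hypothetical names for this illustration, not Ninia's implementation. As in CPython, every read produces a new bound method object.

```typescript
// Hypothetical names; a sketch of binding a method each time it is read.
type PyFunctionSketch = (self: object, ...args: unknown[]) => unknown;

class BoundMethodSketch {
  constructor(public readonly self: object, public readonly func: PyFunctionSketch) {}

  // Calling the bound method supplies `self` automatically.
  call(...args: unknown[]): unknown {
    return this.func(this.self, ...args);
  }
}

// Install `func` on a class prototype as a getter that binds on every access.
function defineMethod(classProto: object, pyName: string, func: PyFunctionSketch): void {
  Object.defineProperty(classProto, "$" + pyName, {
    // Inside a getter, `this` is the object the property was read through.
    get: function (this: object): BoundMethodSketch {
      return new BoundMethodSketch(this, func);
    },
    configurable: true,
    enumerable: true,
  });
}

// class Counter(object):
//     def add(self, n): self.total += n; return self.total
const counterProto = {};
defineMethod(counterProto, "add", (self, n) => {
  const counterSelf = self as { total: number };
  counterSelf.total += n as number;
  return counterSelf.total;
});

const counter = Object.create(counterProto);
counter.total = 0;

const firstRead: BoundMethodSketch = counter.$add;
const secondRead: BoundMethodSketch = counter.$add;
const afterAdd = firstRead.call(5);
```

The cost the paragraph above describes is visible here: firstRead and secondRead are two distinct objects even though they wrap the same function and the same self.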

Special __dict__ property

By default, Python objects have a __dict__ property that lets programs access object properties as a Python dictionary. We can support __dict__ by making a Py_Dict class that encapsulates the JavaScript object corresponding to a Python object and presents it as a dictionary. For example, dict.get('foo') would return obj_instance['$foo'], so long as obj_instance.hasOwnProperty('$foo') (since __dict__ does not contain any inherited properties, unless you're a builtin type).

Like generic Python dictionaries, these allow arbitrary keys. However, other implementations, like PyPy, restrict the keys to be strings only, which seems like a reasonable restriction.
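
A minimal sketch of such a wrapper, assuming string-only keys as in the PyPy restriction mentioned above. InstanceDictViewSketch is a hypothetical name, and its get returns null where Python would return None.

```typescript
// Hypothetical wrapper; presents a JS object's own $-prefixed properties as a dictionary.
class InstanceDictViewSketch {
  constructor(private readonly target: Record<string, unknown>) {}

  private static key(name: string): string {
    return "$" + name;
  }

  // Only own properties count: __dict__ does not expose inherited attributes.
  has(name: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.target, InstanceDictViewSketch.key(name));
  }

  get(name: string, fallback: unknown = null): unknown {
    return this.has(name) ? this.target[InstanceDictViewSketch.key(name)] : fallback;
  }

  // Writes go straight through to the underlying object.
  set(name: string, value: unknown): void {
    this.target[InstanceDictViewSketch.key(name)] = value;
  }

  keys(): string[] {
    return Object.keys(this.target)
      .filter((k) => k.charAt(0) === "$")
      .map((k) => k.slice(1));
  }
}

// `inherited` lives on the class link, `foo` on the instance itself.
const classLink: Record<string, unknown> = { $inherited: 1 };
const objInstance: Record<string, unknown> = Object.create(classLink);
objInstance["$foo"] = "bar";

const dictView = new InstanceDictViewSketch(objInstance);
const foo = dictView.get("foo");
const inherited = dictView.get("inherited");
dictView.set("x", 3);
```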

Changing Class Properties

Unfortunately, Python programs can change and add properties to classes post-initialization. This poses a problem, as we have multiple versions of the class object in memory: at most one for every subclass.

Thankfully (or weirdly?), Python classes track their subclasses via the __subclasses__() method. We can simply apply the change across all unique class objects. I assume that this is an uncommon operation, making the performance tradeoff acceptable.
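
One way this propagation could look, as a sketch under assumptions: each class record keeps the list that __subclasses__() would return, plus a table of the per-chain copies used in its own prototype chain. PyClassRecordSketch, links, and setClassAttr are hypothetical names for this illustration.

```typescript
// Hypothetical bookkeeping; not Ninia's actual data structures.
interface PyClassRecordSketch {
  name: string;
  // What __subclasses__() would return for this class.
  subclasses: PyClassRecordSketch[];
  // The copies that make up this class's own prototype chain, keyed by class name.
  links: { [className: string]: Record<string, unknown> };
}

// Set a class attribute on every copy of `cls`, including the copies that
// live inside the prototype chains of its (transitive) subclasses.
function setClassAttr(cls: PyClassRecordSketch, pyName: string, value: unknown): void {
  const pending: PyClassRecordSketch[] = [cls];
  for (let i = 0; i < pending.length; i++) {
    const current = pending[i];
    const copy = current.links[cls.name];
    if (copy !== undefined) {
      copy["$" + pyName] = value;
    }
    for (const sub of current.subclasses) {
      // Skip classes already queued (diamond-shaped hierarchies).
      if (pending.indexOf(sub) === -1) {
        pending.push(sub);
      }
    }
  }
}

// class A(object), class B(object), class C(A, B)
const aOwnCopy: Record<string, unknown> = {};
const bOwnCopy: Record<string, unknown> = {};
const aCopyInC: Record<string, unknown> = {};
const bCopyInC: Record<string, unknown> = {};

const recC: PyClassRecordSketch = { name: "C", subclasses: [], links: { C: {}, A: aCopyInC, B: bCopyInC } };
const recA: PyClassRecordSketch = { name: "A", subclasses: [recC], links: { A: aOwnCopy } };
const recB: PyClassRecordSketch = { name: "B", subclasses: [recC], links: { B: bOwnCopy } };

// Python: A.greeting = "hi"
setClassAttr(recA, "greeting", "hi");
```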

Overriding mro()

Python programs can override the special mro() method to change the linearization of the inheritance hierarchy, but only at particular moments.

This is a highly unlikely and uncommon scenario, but we can actually support it in all modern browsers using Object.setPrototypeOf(), or a polyfill of that method, to rearrange the prototypes of all dependent classes during runtime. JavaScript engines hate this function, as it deoptimizes any code that these objects touch, but that's what you get for using so much dynamism.

This approach does not work in versions of IE prior to 11. If the mro() change does not add additional classes to the inheritance hierarchy, then we can simply, and terribly, mangle every object in each prototype chain to make Foos into Bars in those browsers.
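
For browsers that do have Object.setPrototypeOf(), the relinking could be sketched as follows. relinkChain is a hypothetical helper for this illustration. It relinks from the end of the new order backwards, so that no intermediate state forms a prototype cycle (which Object.setPrototypeOf rejects with a TypeError).

```typescript
// Hypothetical helper: rearrange existing chain links to match a new linearization.
// `linksInNewOrder` lists the links most specific first.
function relinkChain(linksInNewOrder: object[]): void {
  // Work backwards: the tail that has already been relinked ends in null and
  // never contains the link being updated, so no cycle can appear.
  for (let i = linksInNewOrder.length - 1; i >= 0; i--) {
    const next = i + 1 < linksInNewOrder.length ? linksInNewOrder[i + 1] : null;
    Object.setPrototypeOf(linksInNewOrder[i], next);
  }
}

// Original linearization: C, A, B
const bLink: Record<string, unknown> = Object.create(null);
bLink["$who"] = "B";
const aLink: Record<string, unknown> = Object.create(bLink);
aLink["$who"] = "A";
const cLink: Record<string, unknown> = Object.create(aLink);

const existingInstance: Record<string, unknown> = Object.create(cLink);
const whoBefore = existingInstance["$who"];

// New linearization after an mro() override: C, B, A
relinkChain([cLink, bLink, aLink]);
const whoAfter = existingInstance["$who"];
```

Objects created before the change, such as existingInstance, pick up the new lookup order because they still point at the same cLink.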

jvilk commented 9 years ago

I also wonder how we'll handle when the value at a property is a JavaScript method. We'll probably need to create on-the-fly reflection objects for them.

jvilk commented 9 years ago

OK, another note: When Python loads an object function via e.g. LOAD_ATTR, it appears that that function is bound to the object it belongs to? Seems wrong, but I've hacked that to be the way Ninia does it.

geremih commented 9 years ago

I did not understand the special-case list accesses part. What do you mean by generic object property accesses? Also, unlike in JavaScript, the values of keys for Python mappings/sequences are not the same as the corresponding object attributes. The opcode used to access them is also different: BINARY_SUBSCR is used for the former, and LOAD_ATTR for the latter. So for mappings and sequences, can we simply have an internal Array or Object for subscript accesses, which can be accessed through an internal subscr() function, instead of any special cases?

jvilk commented 9 years ago

Oh? I actually did not know that. Full disclosure: I haven't done much Python programming, and I'm learning its bytecode as I go.

It looks like I should just call the methods described here rather than special case it for built-in types.

jvilk commented 9 years ago

Updated with more details.

But those details could be invalidated, depending on how Python inheritance works!

jvilk commented 9 years ago

Alright, I think I finally have a handle on all this. Currently working on an implementation.

jvilk commented 9 years ago

More details. I believe this is the final design, barring any mistakes in the examples!

perimosocordiae commented 9 years ago

Seems like things have stabilized. How about we move this to the wiki and close the issue?

jvilk commented 9 years ago

Done