puffnfresh / roy

Small functional language that compiles to JavaScript.
http://roy.brianmckenna.org/
MIT License
834 stars 74 forks source link

Sound type system for objects #139

Open taku0 opened 12 years ago

taku0 commented 12 years ago

Current type system for objects are not sound.

There are several type systems with type inference algorithms for objects.

Some are powerful but awkward. Others are simple but restricted.

Here are some interesting type systems. We may need more research, and then we must define the type system for Roy.

RC-Types

Jonathan Eifrig, Scott Smith, and Valery Trifonov, Sound polymorphic type inference for objects

Tian Zhao, Polymorphic type inference for scripting languages with object extensions

Both adopt recursively constraint types (a.k.a. rc-types).

Types are represented with conventional types with subtype constraints.

It is extremely powerful. Intersection, union, and recursive types are representable.

Objects are extensible, i.e. fields can be added to objects after they are created.

Types can be huge, hard to read or write.

Complicated error messages.

Some errors are deferred. They are reported only at new expressions or function application expressions but not at class/function definitions.

Polymorphic Record Calculus

Atsushi Ohori, A Polymorphic Record Calculus and Its Compilation

Type variables are bounded with record kinds.

The type inference is a natural extension of ML-like let polymorphism.

Records are monomorphic while functions taking records are polymorphic. For example, λr. (r.x + 1) has a type ∀σ::{{x: int}}. σ → int, where σ::{{x: int}} requires the type σ to have an integer field “x”.

Objects can be compiled into arrays, so that field accesses is just index accesses. This may be irrelevant to Roy since objects in Roy are compiled into JavaScript objects.

Cannot type λf. (f {a: 1, b: 1}) + (f {a: 1, c: 1}) without rank-2 polymorphism, since the type of the f cannot be instantiated with {a: int, b: int} → int and {a: int, c: int} → int simultaneously.

Cannot represent object extension.

Adopted by SML#, with rank-1 polymorphism.

BTW, rank-1 polymorphism itself is also interesting.

Row Variables

Didier Rémy, Type inference for records in natural extension of ML

They adopt row variables, which represent infinite unknown fields.

The type inference is a natural extension of ML-like let polymorphism.

Objects are extensible.

Objects are polymorphic. For example, {name: "foo"; age: 1} has a type {name: 'u.string; age: 'v.num; abs.α}, which means that they can be instantiated to record types with or without a string field “name”, and with or without a number field “age”, but without other fields. “abs” represents absence of other fields.

They may requires explicit forget operations, which forget existence of fields.

λf. (f {a: 1; b: 1}) + (f {a: 1; c: 1}) is typed with {a: 'u.num; b: abs.num; c: abs.num; abs.α} → num, which means f may or may not require a number field “a”, may not require number fields “b” or “c”, and may not require other fields. Note that the fields “b” and “c” are typed num in spite of their absence.

puffnfresh commented 12 years ago

Thanks for the awesome summary! You've made the different approaches very clear.

A few weeks ago I started work on generalising Hindley-Milner to a constraint based system (using Typing Haskell in Haskell as a guide). I'll then be able to add structural subtype constraints.

I'll try to push the WIP branch later on today. Hopefully you can take a look and let me know what you think.

I know you tweeted that a polymorphic record calculus could make sense for Roy. Do you still think this is the case?

puffnfresh commented 12 years ago

Here's the branch: https://github.com/pufuwozu/roy/commits/constraints

This is the new type checker: https://github.com/pufuwozu/roy/blob/constraints/src/typeinference.js

Very incomplete.

taku0 commented 12 years ago

The generalized type inferer would be a good foundation for the new type system, no matter which type system you choose or create new one. I will look the branch.

I know you tweeted that a polymorphic record calculus could make sense for Roy. Do you still think this is the case?

Since the main advantage of the polymorphic record calculus—that is, objects compiled into arrays—is irrelevant to Roy, the type systems with row variables (the system Π* or the system Π) are attractive since they can extend objects (that means we can represent mixins!). However, the syntax of the system Π* is complicated a bit. We need to improve the syntax. For example, writing num? instead of 'u.num, num instead of pre.num, and !num instead of abs.num whenever possible. Another option is the system Π, not as powerful as the system Π* but more powerful than the polymorphic record calculus. It is sufficient to represent object extensions without complicating the syntax.

P.S. Here is a list of touchstone expressions. Some type systems can type them, others not.

simple object literal: {a: 1, b: 1}

object literal with generic function: {a: λx. x, b: λx. x}

simple function taking object: λx. x.a + 1

function taking a function taking object: λf. f({a: 1, b: 1})

function taking a function taking objects with different labels: λf. f({a: 1, b: 1}) + f({a: 1, c: 1}) (not typed with the system Π or the polymorphic record calculus)

function taking a function taking objects with same labels but different types: λf. f({a: 1, b: 1}) + f({a: 1, b: "a"}) (not typed with the system Π or the polymorphic record calculus, the system Π* requires explicit forget operator)

list of objects with different labels: [{a: 1, b: 1}, {a: 1, c: 1}] (not typed with the system Π or the polymorphic record calculus)

list of objects with same labels but different types: [{a: 1, b: 1}, {a: 1, b: "a"}] (not typed with the system Π or the polymorphic record calculus, the system Π* requires explicit forget operator)

object extension (a fresh copy of x with a new field f): λx. {x with f = 1} (not representable in the polymorphic record calculus)

object merging (a fresh copy of x with all fields from y): λx y. {x with y} (not representable in the any type systems mentioned above)

field forgetting (a fresh copy of x without field f): λx. {x without f} (not representable in the polymorphic record calculus)

taku0 commented 12 years ago

More complex examples:

{a: bool, ...} → {a: bool}, {a: bool} → {a: bool}, or type error? λx. if x.a then x else {a: false}

{a: bool, b: bool, ...} → {b: bool}, {a: bool, ...} → {}, or type error? λx. if x.a then x else {b: false}

{a: bool, ...} → {} or type error? λx. if x.a then x else {a: 1}

({a: num} → num) → {a: num, ...} → num? λf x. f x + f({a: 1})

({a: bool, b: num} → bool) → ({a: bool, b: num} → bool) or ({a: bool, b: num} → bool) → ({a: bool, b: num, ...} → bool)? λf. if f({a: true, b: 1}) then f else λx. x.a

({a: num, b: num} → bool) → ({a: bool, b: num, c: bool} → bool), ({a: num, b: num, c: bool} → bool) → ({a: bool, b: num, c: bool} → bool), ({a: num, b: num} → bool) → ({a: num, b: num, c: bool, ...} → bool), or type error? λf. if f({a: 1, b: 1}) then f else λx. x.c

taku0 commented 12 years ago

I have read the new type checker. It is a nice functional code :), much readable than current code.

Currently, the typing rules for objects are like this:

          A_1, C_1 ⊢ e_1: t_1 ... A_n, C_n ⊢ e_n: t_n
------------------------------------------------------------------
∪A_i, ∪C_i ⊢ {l_1: e_1, ..., l_n: e_n}: {l_1: t_1, ..., l_n: t_n} 
         A, C ⊢ x: τ
--------------------------------
A, C ∪ {τ ≡ {l: β}} ⊢ x.l: β

Both rule use equality constraints and monomorphic object types that have only known fields but not unknown fields. This is, as you know, inappropriate. Even {x: 1, y: 1}.x is not typeable. We must adopt either subtyping constraints or polymorphic types that represent some known fields and other unknown fields.

We need to clarify the requirements for objects. How will objects be used in Roy? If we need full featured object library like Scala, we might need rc-types. If objects are supplemental like OCaml, we should keep the type system simple.