Union Types: Inference and Checking.

keean commented 7 years ago

Here's a problem with (anonymous) union types, given the following:

f x = (x "ABC", x 123)

what is the type?

shelby3 commented 7 years ago

I wrote something 3 years ago which may or may not be relevant (and may or may not have been correct, as I've learned a lot since I wrote that).

keean commented 7 years ago

@shelby3 There is more than one way to type this, which is generally seen as bad:

f : (forall a . a -> a) -> (String | Int)
f : ((String | Int) -> a) -> a
f : ((String -> String) & (Int -> Int)) -> (String | Int)

The problem is we don't know which type to give to f x = (x "ABC", x 123) as they are all valid types.

keean commented 7 years ago

@shelby3 I read the quiz, but I don't think it is relevant as we are not considering widening here, but which variant above should be used. In some regards, the third type seems to be the correct one, as given:

f : ((String -> String) & (Int -> Int)) -> (String | Int)

because we would need a subsumption rule that allowed:

id : forall a . a -> a

To be passed to an argument where (String -> String) & (Int -> Int) is expected.

The problem is we will need expansion variables in the types to do it properly.

shelby3 commented 7 years ago

@keean wrote:

this paper seems relevant: http://arxiv.org/pdf/1606.01106v1.pdf

Edit: Reading the paper, it looks like they run into the usual problem of needing to have whole-program compilation.

The issue is I presume related to principal typings¹.

Remember we had a related discussion before in private messages in May on the Rust forum, about how to infer types that end up being recursive (like two mirrors pointing at each other).

And my comment then was that if we require the containing function to have declared types, then we should be able to always resolve the locally inferred types within the containing function. Afaik, ML and Haskell allow the programmer to elide the types from top-level functions, thus making inference much more difficult when first-class, higher-order features are added moving away from the origin of the Lambda Cube. Seems we won't have that problem.

Trevor, Jim. What are principal typings and what are they good for? http://trevorjim.com/papers/principal-typings.ps.gz

keean commented 7 years ago

The paper you refer to only allows rank1 intersection types, and does not specifically deal with union types. To allow full intersection types will require expansion variables. I have written type inference with expansion variables before, but it was quite a while ago. Maybe we restrict ourselves to rank1 for now, infact I will probably start with rank0 that was the most recent thing I was working on. This allows intersection/union types in the context.

The way I was thinking of doing this, because only exported functions need type signatures, is that we have a type-signature for a module much like an interface, which is then implemented. Something like:

module Stack<a>:
    push : a -> ()
    pop : () -> a

push x = ...
pop = ...
local_fn = ...

This makes me think we need to sort the function syntax before doing more on inference. I am going to start a new issue.

shelby3 commented 7 years ago

@keean wrote:

I don't think this example shows any of the potential problems, so I think its okay. However I think things get more complicated if f and g are not top-level functions:

h<A : Print>(x: Boolean, g: (x: A), f: (y: Boolean) -> Int | String) = g(f(x))

which is okay with full type annotations but we would run into trouble trying to infer this from:

h(x, g, f) = g(f(x))

Afaics the lack of sufficient constraints will never exist, i.e. those cases as principal typings, if all top-level lexical scope functions arguments are required to have their arguments annotated with type declarations. We agreed that all "exported" functions would be required have their arguments annotated with type declarations, but we haven't yet discussed requiring that for all top-level lexical scope functions.

Functions declared within the lexical scope of functions wouldn't need to be annotated, because they can only access instances within their top-level enclosing lexical scope, so the necessary constraints will exist for the inference to be decidable. However, I am not knowledgeable about if there is an existing inference algorithm for us to use.

keean commented 7 years ago

@shelby3 I have an novel inference algorithm that I like, that is capable of supporting intersection and union types with full expansion variables. The current version check into git here https://github.com/keean/Compositional-Typing-Inference It has some novel properties, it can type any code fragment (even with free-variables) independently and the types are composable, meaning two fragments can be combined without retyping the fragments themselves. It is HM like, but has principal typings. I will try and find the old version with expansion variables, and then I can port from C++ to JavaScript, which should remove all the memory management boilerplate.

shelby3 commented 7 years ago

@keean wrote:

So I think where I need the unions is in something like a parser:
y = if x:
        new Add(new Num(1), new Num(2))
    else:
        new Mul(new Num(1), new Num(2))
y.print()
So y would have the type Add<Num, Num> | Mul<Num, Num> and y.print() would have to be passed both dictionaries.

But with a parser its all going to go horribly wrong because we will need recursive types, and at every parse decision we will double the number of terms in type.

Consider the type signatures of the parser functions
parse_num(in : Stream) : Num
parse_add(in : Stream) : Add<Num, Num>
parse_mul(in : Stream) : Mul<Num, Num>
parse_expr(in : Stream) : Num | Add<Num, Num> | Add<Add<Num, Num>, Num> | Add<Add<Add<Num, Num>, Num>, Num> ... etc forever.
So we will also need recursive types, and be able to do something like:
type Expr = Num | Add<Expr, Expr> | Mul<Expr, Expr>
parse_expr(in : Stream) : Expr
Is that going to work? Can we infer the recursive types? What does it mean for dictionary passing?

Again for readers' benefit, I will to link to what sum, product, and recursive types are.

You are asking me if we will have recursive types in ZenScript and if so, what are the implications especially on inferring non-annotated types.

For data types (not typeclasses aka Rust trait), we could choose to allow expression of recursive types and it won't require us to add subclassing and virtual methods. For example, here is your recursive data type example expressed with class:

sealed abstract class Expr { ... }
class Num extends Expr { ... }
class Add<A:Expr, B:Expr> extends Expr { ... }
class Mul<A:Expr, B:Expr> extends Expr { ... }

You have expressed polymorphism. If we want to pass an Expr to a function taking a trait bound, then the compiler will have to check that implementations of said trait exist for all the sealed subtypes of Expr and automatically generate the type-case boilerplate per my explanation for my complete solution to the Expression Problem. So the problems of subclassing and virtual methods do not apply, even though we have a subtyping relationship expressed. You have provided an example of what I was explaining before that polymorphism arises naturally in programming if there is no dependent typing.

However, the above has the problem that it can't be extended with new variants because it is necessarily sealed, but this loses the desirable attribute of being a solution for the Expression Problem. NFG! (no fscking good) :stuck_out_tongue_winking_eye:

If we prefer to have that remain open to extension of new variants (i.e. not sealed), then we have to specifically tell the compiler all of the variants which are in the type some other way, so we can't use Expr if it is not sealed! Let's not forget that. We have different situation than subclassing with virtual methods.

So the other way to do that without sealed is to write the unnamed structural union type Num | Add<*, *> | Mul<*, *> every where we need to use the value (instead of writing Expr because Expr is no longer a class but is instead a trait bound as follows). In that union type declaration, * means any type in the containing union.

trait Expr { ... }
class Num { ... }
class Add<A:Expr, B:Expr> { ... }
class Mul<A:Expr, B:Expr> { ... }
impl Num for Expr { ... }
impl Add<A,B> for Expr { ... }
impl Mul<A,B> for Expr { ... }

So with the second form I think the inference is much easier?

keean commented 7 years ago

Yes, the second form is what I would want, although we need to decide on the keywords to use, and also whether to give the self type special status.

For example Rust uses struct where you have class, Haskell would use class where you have trait, data where you have class, and instance where you have impl.

Personally I think we should avoid class as Haskell and Java use it for completely different things. I quite like data.

keean commented 7 years ago

@shelby3 I have started a new topic to discuss function syntax, we should probably have other separate threads for class/data syntax and trait/impl syntax.

Discussion of function syntax: https://github.com/keean/zenscript/issues/6

shelby3 commented 7 years ago

@shelby3 wrote:

So the other way to do that without sealed is to write the unnamed structural union type Num | Add<*, *> | Mul<*, *> every where we need to use the value (instead of writing Expr because Expr is no longer a class but is instead a trait bound as follows). In that union type declaration, * means any type in the containing union.

So the other way to write that is with an alias so it can written in shorter form every where:

type ExprSet = Num | Add<*, *> | Mul<*, *>

I don't know we should allow the following?

type ExprSet = Num | Add<ExprSet, ExprSet> | Mul<ExprSet, ExprSet>

Is it confusing?

Edit: is our general rule that type parameters can only be bounded by typeclasses, not by data types?

keean commented 7 years ago

The way I have written these types is like this:

(Num | Add a a | Mul a a) as a

This is a recusive type, usable anywhere like this:

f(x :  (Num | Add a a | Mul a a) as a)

Or you can have a type alias which is just a short way of writing the same thing:

type exp = (Num | Add a a | Mul a a) as a
f(x : exp)

I think the general rule should be types are constrained by typeclasses, however I think set-subsumption is fine, so that we can see (Int | String) as being a set of types {Int, String} such that passing a String where a {Int, String} is okay.

shelby3 commented 7 years ago

@keean wrote:

(Num | Add a a | Mul a a) as a

It is more verbose, but it is more clear. I opt for readability (clarity) when the additional verbosity is minimal. So I agree with your idea.

If we use (Java/C++ angle < > or Scala square [ ]) brackets for type parameters any where, then we should use them every where for consistency. I use similar logic as for functions, to argue for adopting bracketed syntax.

I was actually confused for a few moments when I saw the a a. It isn't second nature for me. I don't code in Haskell.

I think the general rule should be types are constrained by typeclasses, ...

Not all types. We still need to pass data types around. A union of data types for example.

...however I think set-subsumption is fine, so that we can see (Int | String) as being a set of types {Int, String} such that passing a String where a {Int, String} is okay.

That doesn't apply to typeclasses as bounds on type parameters. Remember we decided upthread that unions of typeclasses don't make any sense because they aren't tagged at runtime.

Agreed for unions of data types.

keean commented 7 years ago

@shelby3 I have written a lot in both styles, my only objection to '<' and '>' is that they are not really brackets and don't often get bracket matching in UI, and cause problems with cut and paste into HTML. I think there are four choices:

Java style: Add<a, a> Scala style: Add[a, a] Haskell style: Add a a Prolog style: Add(a, a)

So which is the best option? I think we should move this sub-topic to a new discussion: https://github.com/keean/zenscript/issues/7

keean / zenscript

Union Types: Inference and Checking. #5