koka-lang / koka

Koka language compiler and interpreter
http://koka-lang.org
Other
3.16k stars 151 forks source link

Structural Types & Newtype #534

Open TimWhiting opened 1 month ago

TimWhiting commented 1 month ago

It would be nice to have unnamed extensible structural types, and unnamed extensible sum types. Of course these should be canonicalized in the backend (if needed).

Maps do not fit this because the type of the values must be the same, but I'm also not talking about maps in other languages which could have a dynamic number of keys. I'm just talking about non-nominal canonical types.

Additionally a newtype annotation would be very welcome, alias doesn't cover some situations that would be nice (e.g. more refinely typed strings and numbers - to give added discipline on top of an existing type & api).

The current type & struct annotations could then be desugared to a combination of the structural types and a newtype annotation. And by adding implicit methods we could extract the structural type from the type and use that to implement interfaces.

Of course this introduces a form of subtyping, which might not be desired as it would make the type system and inference more complex. (Unless we make all subtyping coercion explicit with implicits)

i.e.

fun get-x(value: a, ?coerce: (a -> {x: string, y: int})): string
  match value.coerce
    {x: s} -> s

If ordering of fields are canonical than it is easy to extend a type in different ways:

fun either-default(value: either<a,b>, ?a/coerce: (a -> {x: string}), ?b/coerce: (b -> {y: int})): {x: string, y: int}
  match value
    Left(a) -> {...a.coerce(), y: 0}
    Right(b) -> {x: "", ...b.coerce()}

Importantly we still require equality instead of subtyping, if one branch returns a unnamed struct with fewer or different fields than another, then it will give an error at inference, stating which fields are missing / different. Aliases can be used of course to simplify the types, but don't make them different from other canonically identical types.

chtenb commented 1 month ago

Instead of subtyping, it might also make sense to use row polymorphism, similar to how this is used for effects.

I known that PureScript also adopts this for record types: https://github.com/purescript/documentation/blob/master/language/Types.md#rows https://github.com/purescript/documentation/blob/master/language/Records.md

TimWhiting commented 1 month ago

I believe Roc also uses some kind of Row polymorphism, but my understanding is that there is still a sort of subtyping between rows? I do like how purescript uses parentheses for their rows. It looks like a named tuple essentially, which is what it is other than the fact that it is not really ordered.

Anyways, I think there are good strategies for compilation and typing nowadays for row types including both open and closed row types. Nominal types could just be new type definitions for closed rows.

Anonymous structural unions are less common but notably also found in Roc, which was incidentally inspired by Koka's reference counting.

chtenb commented 1 month ago

but my understanding is that there is still a sort of subtyping between rows

I'm not well versed in type theory at all and also not familiar with the intricacies of Roc, but I suspected that it could possibly work in the same manner as koka uses a row for extensible effects like in https://koka-lang.github.io/koka/doc/book.html#sec-polymorphic-effects Instead of subtyping it involves a unification step, if myu understanding is correct. It would make sense to me if that same mechanism would also be used in other places, as a principle of least surprise. Once a Koka programmer is familiar with the concept and how the compiler behaves w.r.t. type unification (which is not necessarily intuitive for all programmers) this skill would carry over to all extensible types, not just for effects.

TimWhiting commented 1 month ago

There's a good paper about unifying different row theories. The main difference I would want from a structural type instead of effects would be no duplicate labels.

Abstracting Extensible Data Types Or, Rows by Any Other Name is the paper's title. I think there is also a more recent one by those authors.

chtenb commented 1 month ago

Yeah, good point. I hadn't thought about the fact that effects support duplicate labels, and that's indeed not what you would want from structural types.

TimWhiting commented 1 month ago

I think it is worth noting that when we lift closures to first class functions to compile to C we could desugar closures environments to structural types and benefit again from the canonicalization of structural types (as well as being able to participate in koka's reference counting and reuse analysis).