ponylang / ponyc

Pony is an open-source, actor-model, capabilities-secure, high performance programming language
http://www.ponylang.io
BSD 2-Clause "Simplified" License

Functions with constraints on type parameters of containing type. #683

Closed. jemc closed this issue 7 years ago.

jemc commented 8 years ago

We currently have type parameters on types (class, actor, etc), which allow passing in type parameters when referring to that type (for example, Array[String]). We also have type parameters on functions, which allow passing in type parameters when calling the function (for example, h.assert_equal[String]("foo", "foo")). Both of these involve the compiler generating separate LLVM functions for each type combination encountered at call sites.

Type parameters can also be constrained. For example, h.assert_equal constrains its type parameter so that the given type must be both Equatable for checking equality, and Stringable for printing. If the given type doesn't meet the constraints, it will generate a compiler error.
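For readers less familiar with Pony generics, here is a minimal compilable sketch of both forms (Box, Compare and Main are made-up names; the constraint mirrors the one on assert_equal):

// Type parameter on a type: the caller picks A when naming the type.
class Box[A: Any val]
  let value: A
  new create(value': A) => value = value'

// Type parameter on a function, with constraints: the call only compiles
// when the given type satisfies them, and the compiler generates a
// separate LLVM function for each instantiation it encounters.
primitive Compare
  fun eq[T: (Equatable[T] #read & Stringable #read)](a: T, b: T): Bool =>
    a == b

actor Main
  new create(env: Env) =>
    let b = Box[String]("foo")
    env.out.print(Compare.eq[String](b.value, "foo").string())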

When working on generics, I've often wished for another related feature: making the function conditional on type parameters of the containing type.

For example, we don't have an Array.string function because not everything you might want to put in an Array is Stringable - that is, we might be able to implement Array[U8].string() or Array[String].string(), but we can't implement Array[MyClass].string() if MyClass is not Stringable. Currently, to accomplish this we would need to do one of two things:

  1. Constrain class Array[A] to class Array[A: Stringable] (see the sketch just after this list). However, this would mean we could no longer even create an Array[MyClass] if MyClass is not Stringable. So this doesn't really work for us.
  2. Make the Array.string() method do a type conversion with as, and error if the type A is not Stringable. However, this is still a bit silly - our compiler can know at compile time whether A is Stringable, so why defer this to runtime? Why even allow us to call Array[MyClass].string() if it will never work, and why make us surround Array[String].string() with a try ... end block if it will always work?
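As a concrete sketch of option 1 (StringableArray and MyClass are made-up names, and this is a rough approximation rather than the real Array API), the constraint on the class rejects the whole instantiation rather than just the string call:

class StringableArray[A: Stringable #read]
  embed _data: Array[A] = _data.create()

  fun ref push(value: A) => _data.push(value)

  fun string(): String iso^ =>
    let s: String ref = String
    s.append("[")
    for v in _data.values() do
      s.append(v.string())
      s.append(" ")
    end
    s.append("]")
    s.clone()

class MyClass // provides no string() method

// StringableArray[String] is a legal type.
// StringableArray[MyClass] is rejected because MyClass is not Stringable,
// even if string() is never called.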

My proposal would be:

Allow functions that will compile only when additional constraints on the outer type are met. For example, when compiling Array[A], the string method would simply be ignored and left out if A were not Stringable.

I'm open to ideas about syntax, but my current thought is this - since we already can't shadow a type parameter name (for example, inside class Array[A] we cannot currently define fun string[A]()), we can do this by "overloading" the existing constraint syntax, referencing the outer existing type parameter name, and constraining it further (for example, inside class Array[A] we might define fun string[A: Stringable]()).

sylvanc commented 8 years ago

This is something I've been thinking about for a while. I think adding constraints is definitely interesting. I think it's also possible we could have a "type language" for computing a type for some instantiation, for example:

type Map[K, V, H: HashFunction[K] val = F] is HashMap[K, V, H] where
  F = match K
  | (Hashable #read & Equatable[K] #read) => HashEq[K]
  else
    HashIs[K]
  end
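For comparison, today's collections package handles this without such a type language by simply shipping two separate aliases and making the caller choose, rather than computing H from K (paraphrased from the standard library):

type Map[K: (Hashable #read & Equatable[K] #read), V] is HashMap[K, V, HashEq[K]]
type MapIs[K, V] is HashMap[K, V, HashIs[K]]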

I think in the specific example of turning an Array[A] into a String, there are three options (these are all not the actual code you'd want for a string representation of an array, just examples to give an idea of the approach):

fun string[A: Stringable](): String iso^ =>
  """
  Use a type constraint at compile time.
  """
  let s: String ref = String
  s.append("[")
  for v in values() do
    s.append(v.string())
  end
  s.append("]")
  s.clone()

fun string(): String iso^ =>
  """
  Use a type constraint at runtime.
  """
  let s: String ref = String
  s.append("[")
  if A <: Stringable then // requires new syntax here
    for v in values() do
      s.append(v.string())
    end
  else
    s.append("unknown")
  end
  s.append("]")
  s.clone()

fun string(f: {(this->A!): String} box): String iso^ =>
  """
  Use a lambda.
  """
  let s: String ref = String
  s.append("[")
  for v in values() do
    s.append(f(v))
  end
  s.append("]")
  s.clone()
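For the third option, a call site might look something like this in current lambda syntax (a is a hypothetical Array-like container of U32 elements; the caller supplies the element-to-String conversion):

let printed: String = a.string({(v: U32): String => v.string() })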
tmmcguire commented 8 years ago

Is this anything like the List.contains (https://github.com/ponylang/ponyc/blob/master/packages/collections/list.pony#L491-L499) type parameters (which I've been unable to duplicate for an array element comparison lambda)?

fun contains[B: (A & HasEq[A!] #read) = A](a: box->B): Bool =>
    """
    Returns true if the list contains the provided element, false otherwise.
    """
    try
      _contains[B](head(), a)
    else
      false
    end
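For reference, a small usage sketch of this existing trick (hypothetical Main; assumes use "collections"): because B defaults to A, the extra HasEq constraint is only checked when contains is actually reified at a call site:

use "collections"

actor Main
  new create(env: Env) =>
    let l = List[U32]
    l.push(1)
    l.push(2)
    // B defaults to A = U32, and U32 satisfies (A & HasEq[A!] #read),
    // so this call reifies and compiles.
    env.out.print(l.contains(2).string())
    // A List of elements without an eq method would still build fine;
    // only a call to contains on it would fail to compile, at the call site.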
jemc commented 8 years ago

As discussed in a call today, we need some syntax to use in conditionals for constraining that A is a subtype of B. When we have this, we can apply it to a function signature (similar to existing case methods) to conditionally compile a function, or in function bodies as an expression with a Bool result that is computed at compile time.

One obvious approach might be to use <: as a binary operator to denote that this relationship should be checked. So, for example:

if A <: B then
  // ...
else
  // ...
end

However, due to Pony's LL(1) parser, we have to know when we hit the A token that we mean it as a type and not as an expression (translating to A.create()). We don't actually know this until we see (or don't see) the <: token, which means it cannot be parsed as LL(1).

So, we need a syntax with a token that precedes A to let the parser know that one of these type constraint expressions is on the way. @sylvanc has proposed:

if |- A <: B then
  // ...
else
  // ...
end

but such an approach might be too cryptic.

Does anyone have any other syntax ideas that meet these needs?

tmmcguire commented 8 years ago

I, for one, would like to go on record as supporting the turnstile syntax option, but only if you require the unicode RIGHT TACK character (⊢). Every language should have at least one un-typable, impossible to find operator glyph. And it would put Pony ahead of Perl. (I think.)

jemc commented 8 years ago

Going with the Pony style of keywords that are words smushed together, we could try something like this:

if checktype A <: B then
  // ...
else
  // ...
end
ryanai3 commented 8 years ago

@jemc Why is TypeName expanded to TypeName.create()? I'd argue that A should always refer to the type A, and A() should expand to A.create() instead. Expanding it to the creation method on the type by default is ambiguous and inconsistent in my view. Take:

interface SomeContainer[A]
  new create() => //...

Should we then take any instances of the string SomeContainer as SomeContainer.create()? If we want to expand it, what do we fill in the typearg with?

I think that it's a lot less ambiguous to just have the typename represent the type, and it's the way most languages handle it. If that becomes the case, does that remove the need for a preceding token?

As much as I like lambdas, I think there definitely needs to be a simple syntax for the vast majority of use cases, checktype A <: B or a type language (+1 from me!) aside. I like the fun string[A: Stringable](): String syntax when A is already defined in the surrounding scope. Up until now, the problem seems to have been semi-solved with primitives (list.contains), and you could solve it with lambdas and value types when those come around, but it gets complex fast, and compared to the [A: Stringable] syntax it means a lot of pain making sure everything typechecks and the primitive default is sensible.

// the lambda w/ value type solution:
fun my_fun[conv: {(A): String}](): String // the functional notion of providing a 'proof' of convertibility
jemc commented 8 years ago

@jemc Why is TypeName expanded to TypeName.create()? I'd argue that A should always refer to the type A, and A() should expand to A.create() instead. Expanding it to the creation method on the type by default is ambiguous and inconsistent in my view. Take:

interface SomeContainer[A]
  new create() => //...

Should we then take any instances of the string SomeContainer as SomeContainer.create()? If we want to expand it, what do we fill in the typearg with?

It's all about context. The Pony parser is an LL(1) parser that consumes a stream of lexeme tokens from a lexer. So, expanding on your example with some more examples of type name interpretation, the tokens would look something like this:

interface SomeContainer[A]
  new create(): Bar =>
    let a: Bar = Bar
    a
[TK_INTERFACE, TK_ID("SomeContainer"), TK_LSQUARE, TK_ID("A"), TK_RSQUARE,
 TK_NEW, TK_ID("create"), TK_LPAREN, TK_RPAREN, TK_COLON, TK_ID("Bar"), TK_DBLARROW,
 TK_LET, TK_ID("a"), TK_COLON, TK_ID("Bar"), TK_EQ, TK_ID("Bar"),
 TK_ID("a"), ...]

The LL(1) part means that it only has 1 token of lookahead. So, when it sees the TK_ID("SomeContainer") it must make a decision about whether to treat this as a type or as an expression (with an implicit create() or apply(), or any explicit method call that follows). However, even an explicit method call is unknown to the parser at this point, because spotting it would require more lookahead, so the decision must be made purely from the tokens already seen.

In this case, the preceding TK_INTERFACE puts the parser into a context where it is expecting a type name, not an expression that evaluates to a value. Similarly, when the parser hits TK_ID("Bar") the first time, it knows from the preceding TK_COLON to expect a type (for the return value) instead of an expression with a value, so it is converted to a TK_NOMINAL AST item. In the method body (following the TK_DBLARROW), the TK_COLON after the TK_LET again lets the parser know that a type name is coming, so it is again converted to a TK_NOMINAL AST item. However, following the TK_EQ an expression is expected, meaning that TK_ID("Bar") must be evaluated as a TK_REFERENCE, not a TK_NOMINAL. Later, during the AST passes that further transform the AST, a bare TK_REFERENCE to a type name (capitalized) will get an implicit call to either create or apply added, if no such call yet exists. But even if an explicit call is attached, it can only be attached to a TK_REFERENCE, not a TK_NOMINAL, so these two must be distinguished ahead of time by the LL(1) parser.
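A tiny compilable illustration of that distinction (Bar and Main are made-up names): the same identifier is a type in type position and an expression, later sugared to a create() call, in expression position:

primitive Bar

actor Main
  new create(env: Env) =>
    // First Bar: after the colon the parser expects a type (TK_NOMINAL).
    // Second Bar: after `=` it expects an expression (TK_REFERENCE),
    // which later gets the implicit create() call.
    let a: Bar = Bar
    env.out.print((a is Bar).string())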

So your suggestion of requiring all create and apply calls to be explicit doesn't actually help distinguish between TK_NOMINAL and TK_REFERENCE, based on how the parser and compiler work. There are other ways that the parser and compiler could work, but this is the way that @sylvanc and the other creators of Pony want them to work (and for other good reasons), so we'll have to make the language work within these design constraints.

ryanai3 commented 8 years ago

Makes sense, thanks.

Can we get around the parsing issue by turning <: into an infix operator macro? That seems to be the way Julia and Scala handle it. Or do infix operators (like +) work because the parser has already determined it's a TK_REFERENCE?

Otherwise, I think checktype is better than a symbol for the more complex case, but I still think we should have a syntax for simple constraints on methods:

method_with_constraint[A: TypeConstraint]() => // where A defined in surrounding scope

How far do you think this should go in terms of constraining non-concrete types? What about conditional methods in interfaces? i.e.:

interface SomeContainer[A: Any val]
  fun box sort[A: Comparable[A]](): SomeContainer[A]
  fun box clone(): SomeContainer[A]

class Impl1[A]
  fun box clone(): Impl1[A] //... implementation

class Impl2[A]
  fun box clone(): Impl2[A] //... implementation
  fun box sort[A: Comparable[A]](): Impl2[A]
// I think I handled the contra/co-variance here such that the following makes sense:

Then:
  Impl1[UnComparable] <: SomeContainer[UnComparable]
  Impl2[UnComparable] <: SomeContainer[UnComparable]
  Impl2[Comparable] <: SomeContainer[Comparable]
but:
  Impl1[Comparable] !<: SomeContainer[Comparable]

I'd argue that we should be able to constrain types in interfaces as well. If my interface exposes certain methods when the typearg obeys certain constraints, classes trying to implement it should be fine without those methods when the typearg doesn't obey those constraints, but should need to expose those methods to properly be a subtype of the interface when the typearg does obey them. I think this could be very useful.

How about conditional compilation based on a value type? That is, what if your implementation of some method differs based on some value typearg to the class/interface?

fun box some_k[n: Stringable] // where n is a value typearg, but not constrained to Stringable in the surrounding scope

Does this start to become macros-lite if you've got a bunch of ifs conditionally inserting code all over the place? If so, does it make sense to treat the complex case (beyond simply constraining a single method) as part of a broader macro system?

jemc commented 8 years ago

I think there are some good questions here, but I'll leave them for @sylvanc to ponder because he has a clearer vision of where he wants the type system to go.

SeanTAllen commented 8 years ago

@sylvanc any further thoughts on this?

SeanTAllen commented 7 years ago

@jemc are you planning on turning this into an RFC? If not, I'll add a request for RFC. I'm trying to close out all the "turn into rfc" tagged items.

jemc commented 7 years ago

Closed in favor of https://github.com/ponylang/rfcs/issues/60