Function syntax for pattern matching

sylvanc commented 9 years ago

This is probably best illustrated with an example. Right now, we can write:

primitive Fibonacci
  fun apply(x: U64): U64 =>
    match x
    | 0 => 0
    | 1 => 1
    else Fibonacci(x - 1) + Fibonacci(x - 2)
    end

What we want to be able to write is:

primitive Fibonacci
  fun apply(0): U64 => 0
  fun apply(1): U64 => 1
  fun apply(x: U64): U64 => Fibonnaci(x - 1) + Fibonacci(x - 2)

As with all enhancement issues, comments are encouraged! @andymcn will be implementing this. So far it looks like we can do it with sugaring to a match expression, allowing us to keep a single entry in the vtable, to that at runtime, types don't have to be evaluated to do dispatch (which is important).

Praetonus commented 9 years ago

Looks great. Would it be possible to use captures and guards with this syntax ?

sylvanc commented 9 years ago

Hmm, good question. What would that look like?

Praetonus commented 9 years ago

Maybe something like this:

primitive Foo
  fun apply(x: U64 where x < 10): String => "Small"
  fun apply(x: U64): String => "Number"
  fun apply(s: String): String => s

sylvanc commented 9 years ago

That makes sense, and I think it works with multiple parameters as well. Then a parameter is a capture and a literal where a parameter should be is effectively a structural match.

Horrible example:

primitive Wombat
  fun apply(x: U64, y: String where x < 10): String => y + " (small)"
  fun apply(x: U64, _): String => x.string() + " (not small)"
  fun apply(s: String, _): String => s

@andymcn does that seem possible from an LL(1) parsing point of view?

andymcn commented 9 years ago

From a parsing point of view that's absolutely fine. The question is what will the variable names be?

We'll be implementing this as sugar, creating a single function with a match statement in it. The parameters of the generated function must have names, and those names must be different from the names given in the multi-function in the source, since those will be used as the captures. So what names do we use for the generated parameters?

We can easily generate arbitrary unique names. However, that means that auto generated docs, editor pop-up info, etc will have garbage names. (Obviously long term it would be good to change these things to know about multi-functions, but the point still stands for error messages, etc.)

Also, what if the names of different multi-function parameters are different? Example:

primitive Foo
  fun apply(x: U64, optional: Bool, hungry: Bool) => ...
  fun apply(x: U32, hungry: Bool, optional: Bool) => ...

That sure looks like a parameter order bug to me.

We could say that parameter names must match for all provided signatures for a given function and those names would be used for the captures. Then the parameter names of the generated function could be based on those, eg adding an extra prime to the end, or possibly a leading underscore (not allowed for names appearing in the source code, but OK in generated code).

The above horrible example then becomes:

primitive Wombat
  fun apply(x: U64, y: String where x < 10): String => y + " (small)"
  fun apply(x: U64, _): String => x.string() + " (not small)"
  fun apply(x: String, _): String => x

with generated function

  fun apply(x': (U64 | String), y': String): String ? => ...

andymcn commented 9 years ago

A few points have occured to me.

Would we require that all the cases for a given multifunction were consecutive or could they appear anywhere in the type?
```
primitive Wombat
fun apply(x: U64, y: String where x < 10): String => y + " (small)"
fun apply(x: U64, _): String => x.string() + " (not small)"
fun foo() => ...
fun apply(x: String, _): String => x
```
Is that last apply OK and part of the multifunction or an error?

From a technical point of view we can do either, it's just what is easiest to understand and keep bug free. I'm leaning towards all consecutive.
Determining the type of value parameters would be a problem. We don't know expression types until the expr pass and we need to know the function signature until way before that. We could delay creating the type at all until all the types are known, but that could easily lead to resolution loops that are impossible to resolve.

One easy option is say that we don't try to work out the types of values and they don't affect the multifunction type signature. The value types would then have to match the other cases, which I think is fine. Examples:
```
primitive Fibonacci
fun apply(0): U64 => 0
fun apply(1): U64 => 1
fun apply(x: U64): U64 => ...
```
The type of x would be U64 (given by the 3rd case) and the 0 and 1 would have to match this. Fine.
```
primitive Fibonacci
fun apply(0): U64 => 0
fun apply(1): U64 => 1
```
The type of the first argument would be Any and we'd have no idea what types the 0 and 1 would be, which would be an error. I think this is also fine.
```
primitive Fibonacci
fun apply(0): U64 => 0
fun apply(1): U64 => 1
fun apply(x: U64): U64 => ...
fun apply(x: U32): U32 => ...
```
The type of x would be (U64 | U32). We wouldn't be able to determine the types of 0 and 1, which would be an error. Again I think this is fine.
```
primitive Fibonacci
fun apply(U64(0)): U64 => 0
fun apply(U64(1)): U64 => 1
fun apply(x: U64): U64 => ...
fun apply(x: U32): U32 => ...
```
The type of x would again be (U64 | U32). Since the types of 0 and 1 are explicit there's no problem. Again fine.
Determining whether there are unhandled parameter type combinations, and hence whether the function has to be partial, would be a problem. Again we don't know the parameter types until way after we need to specify whether the function is partial.

Obviously if any case is partial then the whole function is, but when none are it's tricky.

One possibility is not to try to tell but just always make a multifunction partial. This would seem very annoying to me.

We could instead allow an else case, provided by the programmer. If not provided either we could provide a default that either does nothing (which is a problem if a return value is needed) or throws regardless of whether it's ever actually possible to reach it.

Possible syntax for this could be:
```
primitive Fibonacci
fun apply(0): U64 => 0
fun apply(1): U64 => 1
fun apply(x: U64): U64 => ...
fun else: U64 => 0
```
It's then up to this else handler whether is wants to raise an error, provide a default return value or do nothing.
The question of default bodies provided by traits and interfaces crops up. This could get very complicated. I think the sensible approach is to say that you cannot acquire function cases from an interface, only complete functions.

jemc commented 9 years ago

@andymcn - While reading your thoughts about naming the parameters of the multi-function, a thought occurred to me that I don't think you mentioned.

How would this look with passing named arguments when calling the multi-function? That is, Pony supports the traditional style of positional argument-passing, as well as passing arguments out-of-order by name using where inside the argument list. It seems like this provides some pitfalls for the multi-function requirements and implementation.

andymcn commented 9 years ago

Hmm, calling by name is a very good point. So we really want the parameter names to be as advertised for both calling the multi-function and for the captured variables with each function case.

The obvious way to do this is not to put everything into a single function. Instead each function case would be renamed to _name_$1 and a new name function would be added containing the match expression.

The standard example would then be converted to:

primitive Fibonacci
  fun apply(x: U64): U64 =>
    match x
    | 0 => _apply_$0(x)
    | 1 => _apply_$1(x)
    | var x':U64 => _apply_$2(x')
    end

  fun _apply_$0(x: U64): U64 => 0
  fun _apply_$1(x: U64): U64 => 1
  fun _apply_$2(x: U64): U64 => Fibonnaci(x - 1) + Fibonacci(x - 2)

andymcn commented 9 years ago

This is now done and pushed. However, it is currently a bit awkward to use as the resulting function always has None as a possible return type. This is a temporary hack until we get exhaustive pattern match checked in, which I'll hopefully be doing soon. I'll leave this issue open until that's done.

The whole question of naming is fixed by generating 2 methods:

A function or behaviour with name and parameter names that appear in the Pony source. This allows sensible passing args by name, etc. All this method does is call the second one.
A function with hygenic name and parameter names. This contains the match expression, which in turn contains the bodies of the methods from the Pony source.

This arrangement allows sensible names in the Pony source to just work in the generated code, without having to go through and rename things, including guards. It also prevents a case from accessing any parameters that it does not use, but that are named by other cases.

We had planned to have it so that if a combination of parameter types / values were passed in that weren't covered by any of the cases then we'd raise an error. This turned out to be problematic. Instead we just let that fall through the match to the default else clause, which just returns None. If you want an else then simply make the last case have all parameters are don't care.

darach commented 9 years ago

+1 Can't wait to give this a whirl!

darach commented 9 years ago

Close but no cigar:

primitive Fib
  fun fib(n: U64) : U64 =>
    match ( fib(n) , fib(n-1) )
    | ( let a : U64 , let b : U64 ) => a + b
    else None
    end
  fun fib(0) : U64 => 0
  fun fib(1) : U64 => 1

actor Main
  new create(env : Env) =>
    env.out.print("Fib(37) = " + Fib.fib(37).string())

This now compiles ok on master ( yay! ), but:

(lldb) target create "./fpfib1"
Current executable set to './fpfib1' (x86_64).
(lldb) run
Process 43518 launched: './fpfib1' (x86_64)
fpfib1 was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 43518 stopped
* thread #2: tid = 0x145bfc, 0x00000001000030bb fpfib1`Fib_$0$1(this=0x0000000100015bd8, $0$0=37) + 11 at main.pony:2, stop reason = EXC_BAD_ACCESS (code=2, address=0x700000106ff8)
    frame #0: 0x00000001000030bb fpfib1`Fib_$0$1(this=0x0000000100015bd8, $0$0=37) + 11 at main.pony:2 [opt]
   1    primitive Fib
-> 2      fun fib(n: U64) : U64 =>
   3        match ( Fib.fib(n) , Fib.fib(n-1) )
   4        | ( let a : U64 , let b : U64 ) => a + b
   5        else None
   6        end
   7      fun fib(0) : U64 => 0

I've tried a few other variants with the same effect

andymcn commented 9 years ago

You're running out of stack because you've written an infinite recursive loop. Well done.

Your program has 2 bugs.

The case functions are matched in the order they appear in the source, so the base cases must appear before the general case.
For fib(n) you're calling fib(n) and fib(n-1), rather than fib(n-1) and fib(n-2).

Try this:

primitive Fib
  fun fib(0): U64 => 0
  fun fib(1): U64 => 1
  fun fib(n: U64): U64 =>
    match(fib(n - 1), fib(n - 2))
    | (let a: U64, let b: U64) => a + b
    else None
    end

actor Main
  new create(env: Env) =>
    env.out.print("Fib(37) = " + Fib.fib(37).string())

darach commented 9 years ago

@andymcn Rofl! Apologies for the typo. One step closer!

darach commented 9 years ago

@andymcn Oh, and the interesting thing to me in getting that to compile was avoiding the add overload / None issues. This makes the FP syntax very useful now, and when exhaustive search is done, the code can be cleaned up to be more idiomatic...

darach commented 9 years ago

In the BNF, the where guard clause in an FP style fun is inside the closing parenthesis:

primitive Functional
  fun gt(x: U64, y: U64 where x > y) : Bool => true
  fun gt(_, _) : Bool => false

I think it would be more readable if the guard was outside of the parenthesis:

primitive Functional
  fun gt(x: U64, y: U64) where x > y : Bool => true // Bad syntax
  fun gt(_, _) : Bool => false

kamilchm commented 9 years ago

:+1: for readability with guards outside of the parenthesis

jemc commented 9 years ago

I actually disagree :stuck_out_tongue: because it would separate the return type : Bool further from the parens.

darach commented 9 years ago

@jemc How about after the return but before the =>?

primitive Functional
  fun gt(x: U64, y: U64) : Bool where x > y ; ... => true // Bad syntax
  fun gt(_, _) : Bool => false

But the guard is now further from the parameters... Ho hum!

andymcn commented 9 years ago

I agree that having the guard within the parameter parentheses is rather ugly.

Simply moving it to after the close paren makes the parsing ambiguous, due to the optional : type and ? clashing with optional types on local variable declarations and error indicators on FFI calls respectively. Of course we don't really want either of those in a case method guard, but that's the downside of "everything is an expression"; when you ask for an expression you can get anything :)

To fix this we'd have to indicate the end of the guard, eg with an end:

fun foo(x: U64, y: U64) where x > y end : Bool => ...

or by putting the guard in parens:

fun foo(x: U64, y: U64) where (x > y) : Bool => true

Moving the guard after the return and error indicators, just before the => would work fine:

fun foo(x: U64, y: U64): Bool where x > y => true

We could of course use a completely different syntax for case methods if we wanted to, they don't have to look like regular methods at all.

darach commented 9 years ago

Would be nice to avoid a different syntax for case methods I think as it allows for a more natural refactoring cadence moving between case and 'ordinary' methods...

Any of these options would work for me. I wonder what @jemc thinks though! ;)

jemc commented 9 years ago

Since you asked :wink:, I think I still prefer it as inside the parenthesis. In this syntax, the guard is closest to the parameters it is guarding, and is quite similar/consistent with the syntax for calling with named arguments using where.

That said, my feelings toward it vs the other options aren't that strong - though I do think that requiring an extra end to finish the guard is rather ugly and misleading - it seems like it might be ending the entire declaration.

darach commented 9 years ago

@jemc Good point. I'm thinking with an Erlang syntax on the brain so after feels more natural to me. Ho hum!

andymcn commented 9 years ago

Interesting. I'd say that the comparison with passing arguments by name is a reason to make case methods different, since this is a totally different use of the where keyword.

Instead I'd compare to matching on a tuple with a guard, which, in my mind at least, is very similar to what case methods with multiple parameters and a guard are doing.

The match guard syntax for a tuple is:

| (let x: U64, let y: U32) where x > 4 => ...

Of course that doesn't have a return value and error indicator in it.

jemc commented 9 years ago

In that case, I think your last example was best:

fun foo(x: U64, y: U64): Bool where x > y => true

darach commented 9 years ago

+1 :)

andymcn commented 9 years ago

This has got me thinking about potential inconsistencies with guards in general. I've opened a new issue (#406) for that. Opinions on that welcome, especially if they affect this issue.

sylvanc commented 9 years ago

fun foo(x: U64, y: U64): Bool where x > y => true

+1 :)

sylvanc commented 9 years ago

In fact, based on #406, it can be:

fun foo(x: U64, y: U64): Bool if x > y => true

Which I think is even nicer.

andymcn commented 9 years ago

The guard syntax is now changed to @sylvanc's example above.

sylvanc commented 8 years ago

While we still need exhaustive match, that's a separate issue. Closing this one.

ponylang / ponyc

Function syntax for pattern matching #351