links-lang / links

Links: Linking Theory to Practice for the Web
http://www.links-lang.org
Other
329 stars 42 forks source link

Proposal: SML-style function definition groups #869

Closed dhil closed 4 years ago

dhil commented 4 years ago

I'd like to propose augmenting Links with SML-style function definition groups. This proposal is entirely about improving the quality of life for Links programmers.

What: Function definition group

In functional languages such as SML and Haskell the definition of function can be given be a group of partial definitions, technically known as a function definition group. For example, in SML the following three definitions form a function definition group which defines the Ackermann function:

fun ack (0, n) = n + 1
  | ack (m, 0) = ack (m - 1, 1)
  | ack (m, n) = ack (m - 1, ack (m, n - 1))

Each line provides a partial definition of the function. The first definition of the group is prefixed by fun and subsequently definitions are prefixed by |. When calling ack at runtime the temporal order of definitions is used to determine which definition to execute.

Motivation

Function definition groups provide a form of declarative programming that can aid readability and writability of programs by liberating the programmer from performing control flow analysis. For example consider the ack function above in implemented in Links

fun ack(m, n) {
  if (m == 0) n + 1
  else if (n == 0) ack(m - 1, 1)
  else ack(m - 1, ack(m, n - 1))
}

The concrete shape of the control flow is an implementation detail, here is an equivalent implementation with a slightly more involved control flow

fun ack'(m, n) {
  if (m <> 0 && n == 0) ack'(m - 1, 1)
  else if (m <> 0 && n <> 0) ack'(m - 1, ack'(m, n - 1))
  else n + 1
}

The point is the concrete control flow is not illuminating. By using function definition groups a clever compiler can potentially pick the optimal implementation for the programmer.

In Links, one could make use of a switch-expression to implement the ack function

fun ack''(m, n) {
  switch ((m, n)) {
    case (0, n) -> n + 1
    case (m, 0) -> ack''(m - 1, 1)
    case (m, n) -> ack''(m - 1, ack''(m, n - 1))
  }
}

Arguably this implementation is more readable than the previous implementation ack', and it has a declarative flavour closer to that of function definition groups. Though, there is one caveat. This particular implementation is potentially less efficient as it requires the "arguments" to be boxed. A clever compiler may be able to figure out that this boxing is unnecessary.

Concrete syntax

I propose to adopt a syntax similar to the syntax used by SML, i.e. the first definition of a definition group is prefixed by fun and subsequent definitions are prefixed by |. Concretely, the ack function would be implemented as

fun ack(0, n)   { n + 1 }
     | (m, 0) { ack(m - 1, 1) }
     | (m, n) { ack(m - 1, ack(m, n - 1)) }

Semantics

In order to decide which definition to execute the runtime matches the formal parameter patterns of each definition against the list of arguments in the lexical order of definitions. If multiple matches are found, only the body of the first matched definition should be executed. If no match is found then a runtime error should be raised.

Implementation work

A full implementation would potentially touch many parts of the code base: parser, type checking, pattern matching compiler, and IR compiler. This is potentially a lot of work for a feature that has no research content associated with it. Thus I suggest, at least initially, to opt for a more lightweight implementation based on desugaring. We can implement a function definition group by desugaring it into a regular function definition whose body consists of a switch-expression, where each definition corresponds to a case in the switch-expression. Concretely, I have in mind that we can desugar them prior to type checking using a translation such as

  [| fun f(ps0*) { m0 } | (ps1*) { m1 } | ... | (psN) { mN } |] 
= 
  fun f(ps) { 
    switch (box(ps)) {
      case ps0 -> [|m0|]
      case ps1 -> [|m1|]
      ...
      case psN -> [|mN|]
      case _ -> error("No match")
    }
  }

The function box is intended to denote the standard Links tupling construct.

Other remarks

I think this project could make a good initial internship or student project.

jamescheney commented 4 years ago

I see no reason to consider a heavyweight alternative to desugaring, it would be easy to do as an untyped preprocessing step but might yield better messages if incorporated into type inference.

It might be worthwhile to consider a Scala-style alternative such as an infix version of "switch" - e.g. you could have

fun ack(n,m) { (n,m) match {
  case (0, n) ->  n + 1
  case (m, 0) -> ack(m - 1, 1)
  case (m,n) ->  ack(m - 1, ack(m, n - 1)) 
  }
}

or perhaps even

fun ack(_,_) match {
  | (0, n) ->  n + 1
  | (m, 0) -> ack(m - 1, 1)
  | (m,n) ->  ack(m - 1, ack(m, n - 1)) 
}

Note that any variables in the initial fun parameter list would be in scope until shadowed, but writing _ for them avoids this if not needed. Also note that this allows defining anonymous functions by pattern matching (like OCaml's function) which SML does not allow.

On the other hand, SML does support defining curried functions by pattern matching IIRC, which is not considered in the above proposal, something like this:

fun ack(_)(_) match {
  | 0 n ->  n + 1
  | m 0 -> ack(m - 1)(1)
  | m n ->  ack(m - 1)(ack(m)(n - 1)) 
}

but I would not go this far since it gets messy (and is a bit ugly if we follow Links's parentheses conventions).

But I'm sure there are even more bikesheddy alternatives.

I think @emanon42 and @SimonJF had had in mind to look at improving the pattern matching behavior of receive and perhaps this is close enough to also be looked into at the same time.

SimonJF commented 4 years ago

@Emanon42 is going to have a go at this.

dhil commented 4 years ago

@jamescheney

I see no reason to consider a heavyweight alternative to desugaring, it would be easy to do as an untyped preprocessing step but might yield better messages if incorporated into type inference.

I have a strong preference for eventual having deep support for most features, exactly for the reasons of better error messages, better pattern matching support, efficiency, etc (anecdotally I suspect SML/NJ desugars loads of their features as error messages are often arcane). Nevertheless, I think desugaring is fine as an initial implementation, and then later we can decide later it is worth to invest more time in a "proper" implementation.

It might be worthwhile to consider a Scala-style alternative such as an infix version of "switch" - e.g. you could have [...]

Agreed. To stay in line with the current syntax of Links, I think the "natural" choice would be something along the lines of

fun ack(_,_) switch {
  case (0, n) -> n + 1
  case (m, 0) -> ack(m - 1, 1)
  case (m, n) -> ack(m - 1, ack(m, n - 1))
}

However, I have a slight preference for the syntax I proposed in my initial post, as it is more succinct -- although I realise that is not necessarily in the spirit of the current syntax of Links.

On the other hand, SML does support defining curried functions by pattern matching IIRC, which is not considered in the above proposal [...]

I have actually thought about curried functions as well, but I forgot to mention them. Thanks! I think it would be rather straightforward to support curried functions as our internal representation of functions uses a list of list structure to represent arguments, each list denoting the parameters of a function. I think my proposed macro-expansion can be modified as follows to cover curried functions:

  [| fun f(ps0*)...(ps00*) { m0 } | ... | (psN*)...(psNN*) { mN } |] 
= 
  fun f(qs)...(qsN) { 
    switch (box(qs,...,qsN)) {
      case (ps0,...,ps00) -> [|m0|]
      ...
      case (psN,...,psNN) -> [|mN|]
      case _ -> error("No match")
    }
  }

@SimonJF

@Emanon42 is going to have a go at this.

Sounds good to me. I am happy to help however I can.

Emanon42 commented 4 years ago

After talking with @SimonJF , I have some rough idea about this proposal. Depends on different syntax style, implementation could happen in different phases: If we use a style like this:

fun ack(_,_) switch {
  case (0, n) -> n + 1
  case (m, 0) -> ack(m - 1, 1)
  case (m, n) -> ack(m - 1, ack(m, n - 1))
}

Then I think it would only involve the desugaring pass. It is not necessary to introduce the new AST node here (I might be wrong). But if we want to use guard style like this:

fun ack(_)(_) match {
  | 0 n ->  n + 1
  | m 0 -> ack(m - 1)(1)
  | m n ->  ack(m - 1)(ack(m)(n - 1)) 
}

Then we could probably need to introduce extra nodes in AST. By the way, where can I find some materials about the practice of implementing features of functional programming language (like guard or pattern matching here)? Should I parse it as three different function bodies with different arguments pattern or one function body with nested if? Also how can I proceed with exhaustive checking?

dhil commented 4 years ago

Regardless of the concrete syntax, I think you should add a new node to the (frontend) AST for function group definitions. This node may be obsolete after untyped desugaring, which is fine.

dhil commented 4 years ago

@Emanon42

Should I parse it as three different function bodies with different arguments pattern or one function body with nested if? Also how can I proceed with exhaustive checking?

I suggest first focusing on the abstract syntax, i.e. how to represent function group definitions internally in the compiler. Specifically you should add a node for function group definitions to sugartypes.ml. I suppose it is worth looking node Fun of function_definition in sugartypes.ml and understand how it represents functions.

jamescheney commented 4 years ago

I think adding a new AST node is a reasonable plan, but there will be a lot of overlap with Fun (and Funs, the mutually recursive version, and Funlit, the expression form for anonymous functions) so it seems like this strategy may lead to a lot of code duplication.

I wonder whether the right thing would actually be to:

... or maybe we should abstract the second component of funlit to funbody with cases PhraseBody and SwitchBody, since both cases of my proposed alteration of funlit have the first argument in common.

dhil commented 4 years ago

I think a refactoring along the lines you outline @jamescheney makes sense -- ultimately. Though, I would first focus on getting the function groups implemented which may very well require some code duplication (or at the very least similar code). I think it would be instructive to understand how the frontend pipeline functions before doing a larger refactoring, especially if this is the first time @Emanon42 is working on a functional compiler. It may be a little more work, but I think the experience will be better.

jamescheney commented 4 years ago

That makes sense (though happy for @emanOn and @SimonJF to work out a detailed plan).

Perhaps a minimal useful initial step would be to add the appropriate cases to the grammar, and directly desugar to existing AST cases first, before adding AST cases for fun switch.

Emanon42 commented 4 years ago

OK, I go through the frontline.ml and (hopefully) understand how does Links frontend pipeline work. I have added new switchfun node in sugartypes.ml and changed the definition of funlit to a wrapper for either switchfun or normalfun. I am trying to make it compiling and adding traversing code (my language server can't peek reference for all usage here, I have to make and check error message to fix) Another question: I thought parser.ml would emit the sugared AST which are described in sugartypes.ml. Because I am adding new nodes to it, should I change the parser.mly as well?

dhil commented 4 years ago

Yes, you will need to augment parser.mly eventually with new production rules for parsing the source language syntax for function group definitions.

Emanon42 commented 4 years ago

908