Closed dhil closed 4 years ago
I see no reason to consider a heavyweight alternative to desugaring, it would be easy to do as an untyped preprocessing step but might yield better messages if incorporated into type inference.
It might be worthwhile to consider a Scala-style alternative such as an infix version of "switch" - e.g. you could have
fun ack(n,m) { (n,m) match {
case (0, n) -> n + 1
case (m, 0) -> ack(m - 1, 1)
case (m,n) -> ack(m - 1, ack(m, n - 1))
}
}
or perhaps even
fun ack(_,_) match {
| (0, n) -> n + 1
| (m, 0) -> ack(m - 1, 1)
| (m,n) -> ack(m - 1, ack(m, n - 1))
}
Note that any variables in the initial fun parameter list would be in scope until shadowed, but writing _ for them avoids this if not needed. Also note that this allows defining anonymous functions by pattern matching (like OCaml's function
) which SML does not allow.
On the other hand, SML does support defining curried functions by pattern matching IIRC, which is not considered in the above proposal, something like this:
fun ack(_)(_) match {
| 0 n -> n + 1
| m 0 -> ack(m - 1)(1)
| m n -> ack(m - 1)(ack(m)(n - 1))
}
but I would not go this far since it gets messy (and is a bit ugly if we follow Links's parentheses conventions).
But I'm sure there are even more bikesheddy alternatives.
I think @emanon42 and @SimonJF had had in mind to look at improving the pattern matching behavior of receive and perhaps this is close enough to also be looked into at the same time.
@Emanon42 is going to have a go at this.
@jamescheney
I see no reason to consider a heavyweight alternative to desugaring, it would be easy to do as an untyped preprocessing step but might yield better messages if incorporated into type inference.
I have a strong preference for eventual having deep support for most features, exactly for the reasons of better error messages, better pattern matching support, efficiency, etc (anecdotally I suspect SML/NJ desugars loads of their features as error messages are often arcane). Nevertheless, I think desugaring is fine as an initial implementation, and then later we can decide later it is worth to invest more time in a "proper" implementation.
It might be worthwhile to consider a Scala-style alternative such as an infix version of "switch" - e.g. you could have [...]
Agreed. To stay in line with the current syntax of Links, I think the "natural" choice would be something along the lines of
fun ack(_,_) switch {
case (0, n) -> n + 1
case (m, 0) -> ack(m - 1, 1)
case (m, n) -> ack(m - 1, ack(m, n - 1))
}
However, I have a slight preference for the syntax I proposed in my initial post, as it is more succinct -- although I realise that is not necessarily in the spirit of the current syntax of Links.
On the other hand, SML does support defining curried functions by pattern matching IIRC, which is not considered in the above proposal [...]
I have actually thought about curried functions as well, but I forgot to mention them. Thanks! I think it would be rather straightforward to support curried functions as our internal representation of functions uses a list of list structure to represent arguments, each list denoting the parameters of a function. I think my proposed macro-expansion can be modified as follows to cover curried functions:
[| fun f(ps0*)...(ps00*) { m0 } | ... | (psN*)...(psNN*) { mN } |]
=
fun f(qs)...(qsN) {
switch (box(qs,...,qsN)) {
case (ps0,...,ps00) -> [|m0|]
...
case (psN,...,psNN) -> [|mN|]
case _ -> error("No match")
}
}
@SimonJF
@Emanon42 is going to have a go at this.
Sounds good to me. I am happy to help however I can.
After talking with @SimonJF , I have some rough idea about this proposal. Depends on different syntax style, implementation could happen in different phases: If we use a style like this:
fun ack(_,_) switch {
case (0, n) -> n + 1
case (m, 0) -> ack(m - 1, 1)
case (m, n) -> ack(m - 1, ack(m, n - 1))
}
Then I think it would only involve the desugaring pass. It is not necessary to introduce the new AST node here (I might be wrong). But if we want to use guard style like this:
fun ack(_)(_) match {
| 0 n -> n + 1
| m 0 -> ack(m - 1)(1)
| m n -> ack(m - 1)(ack(m)(n - 1))
}
Then we could probably need to introduce extra nodes in AST. By the way, where can I find some materials about the practice of implementing features of functional programming language (like guard or pattern matching here)? Should I parse it as three different function bodies with different arguments pattern or one function body with nested if? Also how can I proceed with exhaustive checking?
Regardless of the concrete syntax, I think you should add a new node to the (frontend) AST for function group definitions. This node may be obsolete after untyped desugaring, which is fine.
@Emanon42
Should I parse it as three different function bodies with different arguments pattern or one function body with nested if? Also how can I proceed with exhaustive checking?
I suggest first focusing on the abstract syntax, i.e. how to represent function group definitions internally in the compiler. Specifically you should add a node for function group definitions to sugartypes.ml
. I suppose it is worth looking node Fun of function_definition
in sugartypes.ml
and understand how it represents functions.
I think adding a new AST node is a reasonable plan, but there will be a lot of overlap with Fun
(and Funs
, the mutually recursive version, and Funlit
, the expression form for anonymous functions) so it seems like this strategy may lead to a lot of code duplication.
I wonder whether the right thing would actually be to:
switch_body = (Pattern.with_pos * phrase) list
Switch
nodes to phrase * switch_body * Types.datatype option
funlit
to a datatype with cases:
NormalFun of Pattern.with_pos list list * phrase
SwitchFun of Pattern.with_pos list list * switch_body
... or maybe we should abstract the second component of funlit
to funbody
with cases PhraseBody
and SwitchBody
, since both cases of my proposed alteration of funlit
have the first argument in common.
I think a refactoring along the lines you outline @jamescheney makes sense -- ultimately. Though, I would first focus on getting the function groups implemented which may very well require some code duplication (or at the very least similar code). I think it would be instructive to understand how the frontend pipeline functions before doing a larger refactoring, especially if this is the first time @Emanon42 is working on a functional compiler. It may be a little more work, but I think the experience will be better.
That makes sense (though happy for @emanOn and @SimonJF to work out a detailed plan).
Perhaps a minimal useful initial step would be to add the appropriate cases to the grammar, and directly desugar to existing AST cases first, before adding AST cases for fun switch
.
OK, I go through the frontline.ml
and (hopefully) understand how does Links frontend pipeline work. I have added new switchfun
node in sugartypes.ml
and changed the definition of funlit
to a wrapper for either switchfun
or normalfun
.
I am trying to make it compiling and adding traversing code (my language server can't peek reference for all usage here, I have to make
and check error message to fix)
Another question: I thought parser.ml
would emit the sugared AST which are described in sugartypes.ml
. Because I am adding new nodes to it, should I change the parser.mly
as well?
Yes, you will need to augment parser.mly
eventually with new production rules for parsing the source language syntax for function group definitions.
I'd like to propose augmenting Links with SML-style function definition groups. This proposal is entirely about improving the quality of life for Links programmers.
What: Function definition group
In functional languages such as SML and Haskell the definition of function can be given be a group of partial definitions, technically known as a function definition group. For example, in SML the following three definitions form a function definition group which defines the Ackermann function:
Each line provides a partial definition of the function. The first definition of the group is prefixed by
fun
and subsequently definitions are prefixed by|
. When callingack
at runtime the temporal order of definitions is used to determine which definition to execute.Motivation
Function definition groups provide a form of declarative programming that can aid readability and writability of programs by liberating the programmer from performing control flow analysis. For example consider the
ack
function above in implemented in LinksThe concrete shape of the control flow is an implementation detail, here is an equivalent implementation with a slightly more involved control flow
The point is the concrete control flow is not illuminating. By using function definition groups a clever compiler can potentially pick the optimal implementation for the programmer.
In Links, one could make use of a
switch
-expression to implement theack
functionArguably this implementation is more readable than the previous implementation
ack'
, and it has a declarative flavour closer to that of function definition groups. Though, there is one caveat. This particular implementation is potentially less efficient as it requires the "arguments" to be boxed. A clever compiler may be able to figure out that this boxing is unnecessary.Concrete syntax
I propose to adopt a syntax similar to the syntax used by SML, i.e. the first definition of a definition group is prefixed by
fun
and subsequent definitions are prefixed by|
. Concretely, theack
function would be implemented asSemantics
In order to decide which definition to execute the runtime matches the formal parameter patterns of each definition against the list of arguments in the lexical order of definitions. If multiple matches are found, only the body of the first matched definition should be executed. If no match is found then a runtime error should be raised.
Implementation work
A full implementation would potentially touch many parts of the code base: parser, type checking, pattern matching compiler, and IR compiler. This is potentially a lot of work for a feature that has no research content associated with it. Thus I suggest, at least initially, to opt for a more lightweight implementation based on desugaring. We can implement a function definition group by desugaring it into a regular function definition whose body consists of a
switch
-expression, where each definition corresponds to a case in theswitch
-expression. Concretely, I have in mind that we can desugar them prior to type checking using a translation such asThe function
box
is intended to denote the standard Links tupling construct.Other remarks
I think this project could make a good initial internship or student project.