miking-lang / miking

Miking - the meta viking: a meta-language system for creating embedded languages
Other
51 stars 31 forks source link

Multiple sequenced pattern-matches in a single match #230

Open elegios opened 3 years ago

elegios commented 3 years ago

During the meeting yesterday we discussed a likely extension of our current match expression: the ability to perform several pattern-matches in sequence, using values bound in earlier patterns, then only evaluate the then branch if all matches succeed. This generalizes guards (it's essentially pattern guards in Haskell) and is quite simple to compile into. Lower your guards uses an equivalent form for their exhaustiveness and redundancy analyses so it should also be reasonable to do analysis on this form, though this is less obvious.

Example (using and to separate each match):

match v with TmLet{ident = ident} and hasSymbol ident with true then
  print "we have a symbol"
else
  print "we do not have a symbol"

Without this feature guards are harder to encode since the failure case needs to be duplicated (or use some more fancy encoding):

match v with TmLet{ident = ident} then
  if hasSymbol ident then
    print "we have a symbol"
  else
    print "we do not have a symbol"
else
  print "we do not have a symbol"

However, we need to decide on a syntax for this feature. Suggestions thus far include:

For the alternatives with a full word it's potentially easier to split them over multiple lines:

match v with TmLet{ident = ident}
and hasSymbol ident with true
then
...

vs

match v with TmLet{ident = ident},
  hasSymbol ident with true
then
...

or

match v with TmLet{ident = ident}
  , hasSymbol ident with true
then
...

Allowing a trailing separator could be nice for the non-word alternatives:

match
  v with TmLet{ident = ident};
  hasSymbol ident with true;
then
...
elegios commented 3 years ago

We discussed some during the meeting yesterday and noted that an alternative could be to put guards inside patterns instead, and I said I'd consider how that might look. Here're my conclusions/suggestions based on that.

We could introduce a new Pat: <Exp> with <Pat>. This would be a pattern that matches any value, but additionally evaluates the expression and tries to match it against the given pattern, which could either fail or succeed. For example:

match tm with TmApp app & (isValue app.f with true) then
  ...

By the semantics of & we're first checking if tm matches TmApp app, then if it matches isValue app.f with true. The latter pattern never looks at tm, instead it evaluates isValue app.f (note that app is bound by an earlier pattern) and checks if the resulting value matches true.

Aside on the meaning of "earlier pattern" --- The meaning of "earlier pattern" should be fairly intuitive, it mostly means "to the left in the AST" (which ends up being equivalent with "to earlier in the file"), except for or-patterns, where names in an earlier branch are not available in later branches. For example, this is invalid: ``` match tm with (foo | (doStuff foo with bar)) then ... ``` Names in an or-branch are available _after_ the or-pattern though, as one might expect. ---

Given this change we could then also change match; we could give it only a single pattern, and always match it against some dummy value, e.g., unit. The syntax would then be match <Pat> then <Exp> else <Exp>, which feels some sort of clean.

Problems

First, mlang semantic functions would not support these patterns; they don't fit in the analysis at present (there might be a useful middle ground here, but I haven't thought about it yet). This feels fine to me.

Second, this does not work with our parser, this syntax is not LR(1). The issue is that LR(1) needs to know what it has just parsed when it sees the next token after it, but, e.g., Some (a, b) is syntactically valid as both an Exp and a Pat, and we don't know which it should be until we see with (it should be an Exp) or then (it should be a Pat).

Syntax Solution 1

We could switch the order of Pat and Exp, make it <Pat> with <Exp> (or <Pat> against <Exp>, or w/e) instead. I think this is bad because of execution order and scoping; conceptually the expression is evaluated before we check it against the pattern (but the expression would be after the pattern), and the pattern can bind names that are not in scope in the expression (since that would be a cyclic dependency), despite the expression being "after" the pattern. As such, I don't really like this option.

Syntax Solution 2

This is the conservative change (that breaks all our code, depending on whether we change match or not). We could add a keyword before a with pattern, e.g., val <Exp> with <Pat>. This fixes the LR(1) issue, and might also be a nice signal that this is a slightly unusual pattern.

Of course, if we make the change that match only takes a pattern, that means that we have to rewrite all our matches:

- match foo with bar then
+ match val foo with bar then

There is at least one tool that could do this rewrite automatically that seems to handle it well, so making the change should be relatively simple.

There are two other variations here:

Syntax solution 3

This is the radical change that, surprisingly, I believe is backwards compatible. We could remove the Pat syntactical sort, and instead put all its productions in Exp (including the new with).

This is slightly less weird than it may originally seem: patterns are designed to look like expressions, they're just limited to a subset of them. This way we'd parse an arbitrary expression in a pattern position, then check that it's just terms we can handle (mostly constants and composite literals) and/or transform to the normal Pat AST type. I believe GHC uses this trick in its parser, so we'd not be the first. It's also possible that we could do something nice with our syns, have Pat literally be an Exp that is limited to certain constructors, though I don't think our current thoughts can quite express this at present.

In this world match would look as follows: match <Exp> then <Exp> else <Exp>.

For pattern things (|, &, with) appearing in expression-position we could do the same (error on them/remove them). We could also be slightly less radical, and only have this merged syntactical sort in a match: match <MergedExpPat> then <Exp> else <Exp>, but I kinda feel like we could get nicer error messages if we just had one Exp; a message saying that "| is only allowed in patterns" might be clearer than "Unexpected token |".

Click here to dive down the rabbit hole of radical changes --- `match then else ` looks a lot like an `if`. We could imagine that `a with b` is an expression that evaluates to `true` iff the pattern `b` matches the result of evaluating `a`, and we'd have our `match` with just an `if` and a `with` (except for name binding). In this context `&` and `|` are mostly just short-circuiting boolean "and" and "or". If we could then attach binding information to boolean information (e.g., "if this value is `true` then these names with these types are in scope") we would have the entirety of our `match` back (provided `if` used this information of course). One way to make this information a bit more explicit would be to represent it as first-class modules or records, since both of those are collections of named values with associated types. For example: ``` let foo = Some (1, "blub") in let tmp: Option {a: Int, b: String} = foo with Some (a, b) -- If implicitly destructures the record and brings the values in scope if tmp then dprintLn a; dprintLn b; else () ``` `Option` would then have to be known to the compiler, and the type parameter would be constructed by the compiler. Notably, this might be an interesting direction to go to have extensible patterns. I believe that we could mostly type the boolean pattern operators properly: ``` (|) : forall m. Option m -> Option m -> Option m -- I'm using { m1, m2 } to mean the merging of the row-types m1 and m2 (&) : forall m1 m2. Option m1 -> Option m2 -> Option { m1, m2 } (!) : forall m. Option m -> Option {} ``` However, it might also be possible to build this in a layer on top of things, e.g., in Ragnar, it might not be a thing we necessarily want in a core language. ---