qwertie / ecsharp

Home of LoycCore, the LES language of Loyc trees, the Enhanced C# parser, the LeMP macro preprocessor, and the LLLPG parser generator.
http://ecsharp.net

Support C# 9 pattern matching #129

Closed (qwertie closed this issue 3 years ago)

qwertie commented 3 years ago

I'm already working on this, but I'm thinking I ought to document my plan somewhere. I'm writing notes in the parser, but here I can put up a formatted table of example patterns:

| Example Pattern | Loyc tree (EC#/LES2 notation) |
| --- | --- |
| _ | _ |
| Enum.Value | Enum.Value |
| 2 + 2 | 2 + 2 |
| >= 2 + 2 | @`'>=`(2 + 2) |
| var x | #var(@``, x) |
| var (x, y) | #var(@``, (x, y)) (shorthand for #var(@``, @'tuple(x, y))) |
| var (x, (y, z)) | #var(@``, (x, (y, z))) |
| { } obj | #var(@'deconstruct(@'tuple()), obj) |
| List<T> | List!T (shorthand for @'of(List, T) a.k.a. List `'of` T) |
| List<T> list | #var(List!T, list) |
| List<T>() list | #var(@'deconstruct(List!T()), list) |
| List<T> { Count: >0 } x | #var(@'deconstruct(List!T(), Count ::= @`'>`(0)), x) |
| List<T>() { Count:7 } x | #var(@'deconstruct(List!T(), Count ::= 7), x) |
| int? | @'of(@`'?`, #int32) |
| Foo<a, b>? | @'of(@`'?`, @'of(Foo, a, b)) |
| (Foo) { } x | #var(@'deconstruct(@'tuple(Foo)), x) |
| (Foo) x | @'cast(x, Foo) |
| (Foo + 0) x | #var(@'deconstruct(@'tuple(Foo + 0)), x) |
| (a, b) | @'deconstruct(@'tuple(a, b)) |
| (a, b) { Foo: x } | @'deconstruct(@'tuple(a, b), Foo ::= x) |
| (> 5, (_, _)) | @'deconstruct(@'tuple(@`'>`(5), @'deconstruct(@'tuple(_, _)))) |
| Point(X: > 5, Y: 0) | @'deconstruct(Point(X ::= @`'>`(5), Y ::= 0)) |
| Foo({ Length: 2 }) | @'deconstruct(Foo(@'deconstruct(@'tuple(), Length ::= 2))) |
| { X: 1, Y: >0 } | @'deconstruct(@'tuple(), X ::= 1, Y ::= @`'>`(0)) |
| Foo(X: int x) {Y: >4} f | #var(@'deconstruct(Foo(X ::= #var(#int32, x)), Y ::= @`'>`(4)), f) |
| not null | @'not(null) |
| not not null | @'not(@'not(null)) |
| >= 'a' and <= 'z' | @'and(@`'>=`('a'), @`'<=`('z')) |
| 0 or 1 | @'or(0, 1) |
| string or List<char>() | @'or(#string, @'deconstruct(List!#char())) |
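
For anyone who wants to try the left-hand column in plain C# 9, here is a small sketch that exercises a handful of the pattern forms from the table. The class name PatternExamples, the Point record and the method names are made up for illustration; only the pattern syntax itself comes from the table. It assumes a C# 9 compiler targeting .NET 5 or later (records need that).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record, used only to demonstrate the Point(X: ..., Y: ...) form.
public record Point(int X, int Y);

// Hypothetical helper class; each method wraps one pattern form from the table.
public static class PatternExamples
{
    // >= 'a' and <= 'z' : two relational patterns joined by `and`.
    public static bool IsLowerAscii(char c) => c is >= 'a' and <= 'z';

    // 0 or 1 : two constant patterns joined by `or`.
    public static bool IsBit(int n) => n is 0 or 1;

    // not null : a negated constant pattern.
    public static bool HasValue(object o) => o is not null;

    // List<T> { Count: >0 } x : type test, property subpattern, then a variable.
    public static int FirstOrMinusOne(object o) =>
        o is List<int> { Count: > 0 } list ? list[0] : -1;

    // (> 5, (_, _)) : positional pattern containing a nested positional pattern.
    public static bool FirstOverFive((int, (int, int)) t) => t is (> 5, (_, _));

    // Point(X: > 5, Y: 0) : positional pattern with named subpatterns.
    public static bool IsOnXAxisPastFive(object o) => o is Point(X: > 5, Y: 0);
}
```

Each of these parses as a pattern rather than as an ordinary expression, which is why the right-hand column needs tree shapes such as @'and, @'not and @'deconstruct.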

There is no obvious way to map patterns into Loyc trees, and on any given day I might choose a different way to do it. The arbitrary mapping I chose is as follows:

There is lots of overlap between patterns and ordinary expressions, but here are some examples showing how patterns differ from normal C# or EC# expressions:

qwertie commented 3 years ago

The Loyc tree for 'deconstruct is designed to mirror the Loyc tree for 'new:

| Expression | Loyc tree |
| --- | --- |
| new Foo(arg) { Prop = 5 } | @'new(Foo(arg), Prop = 5) |
| Foo(arg) { Prop: 5 } (pattern) | @'deconstruct(Foo(arg), Prop ::= 5) |

The existing structure of 'new turned out to be a boon for syntactic pattern-matching, as in this example:

define #var(__var, $name = new $T($(..args)) { $(..initializers) }) {
    $T $name = new $T($args) { $initializers };
}

// Input:
__var x = new List<SomeSortOfUnreasonablyLongTypeName>();
// Output:
List<SomeSortOfUnreasonablyLongTypeName> x = new List<SomeSortOfUnreasonablyLongTypeName>();

Although this example appears to require a list of initializers, it matches new-expressions with no initializers, because the underlying Loyc tree is the same for (e.g.) new Foo { } and new Foo(). In practice this is better than needing to handle "has initializer block" and "does not have initializer block" separately, so I expect it is best to have the same sort of structure in patterns.
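
As a rough illustration of the mirroring between 'new and 'deconstruct (and of why an empty initializer block disappears), here is a toy C# sketch that renders trees as prefix-notation strings. TreeSketch, Call, New and Deconstruct are hypothetical helpers invented for this example; they are not part of Loyc.Syntax, and the real parser builds LNode objects rather than strings.

```csharp
using System;
using System.Linq;

// Toy model only: trees are rendered directly as prefix-notation strings.
// Nothing here is the real Loyc API.
public static class TreeSketch
{
    // Renders a call node such as Foo(arg) or @'new(Foo(arg), Prop = 5).
    public static string Call(string target, params string[] args) =>
        target + "(" + string.Join(", ", args) + ")";

    // new T(ctorArgs) { inits }  becomes  @'new(T(ctorArgs), inits...)
    public static string New(string type, string[] ctorArgs, params string[] inits) =>
        Call("@'new", new[] { Call(type, ctorArgs) }.Concat(inits).ToArray());

    // T(subpatterns) { props }  becomes  @'deconstruct(T(subpatterns), props...)
    public static string Deconstruct(string type, string[] subpatterns, params string[] props) =>
        Call("@'deconstruct", new[] { Call(type, subpatterns) }.Concat(props).ToArray());
}
```

Under these assumptions, New("Foo", new[] { "arg" }, "Prop = 5") yields @'new(Foo(arg), Prop = 5) and Deconstruct("Foo", new[] { "arg" }, "Prop ::= 5") yields @'deconstruct(Foo(arg), Prop ::= 5). New("Foo", new string[0]) yields @'new(Foo()) whether or not the source had an empty { } block, since an empty block contributes no child nodes.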

Note that plain C# puts a number of restrictions on the kinds of expressions allowed in a pattern context:

Enhanced C# doesn't absolutely need to respect these restrictions, but traditionally, I've done my best to guarantee that (1) printer output always round-trips correctly and (2) if two Loyc trees are different from each other (ignoring trivia) the printer will produce different output for the two trees. It might not be easy to maintain these guarantees for patterns, though, and it's certainly questionable whether it's worth my time.

Enhanced C# already defined its own (dramatically simpler, but no less capable) pattern-matching syntax before C# 7 was published. I'll have to rip out the existing feature, which will be disruptive to existing Enhanced C# code, especially the matchCode macro. I have no plan yet for the transition. An obvious starting point is to support C# 9 patterns in the context of the is and switch operators, but not in the context of case. This reduces disruption because matchCode is based on a list of case statements without EC#-style patterns. match does rely on EC# patterns, but probably my 5 users (or however many there are) won't miss the feature given that it is designed to achieve precisely what C# 9 patterns do anyway.

qwertie commented 3 years ago

Whew. The parser is passing a battery of tests... now for the printer.

At this point I don't think round-tripping everything is feasible, because (just as in C# itself) the structure of an expression like a is (Boo, Coo) has changed. Previously (Boo, Coo) was a tuple type, but now it's a deconstruction: the tree is @'is(a, @'deconstruct((Boo, Coo))) rather than @'is(a, (Boo, Coo)). However, if somebody constructs the tree @'is(a, (Boo, Coo)) and gives it to the printer, I think it should still print a is (Boo, Coo): this doesn't round-trip, but it does what the user probably expects. Plus, I just don't want to spend as much time on the printer as I did on the parser. At this point I'm thinking, if somebody passes in a tree like @'is(x, Foo[77].etc), I'm just going to print x is Foo[77].etc even though the parser won't be able to read it back, because doing it properly will just take too long.

If you need proper round-tripping, LES can do that for you.

qwertie commented 3 years ago

I think it's ready...