Open viridia opened 4 years ago
Yeah, since traversing ASTs looking for specific patterns is such an obvious use case, a toy app that actually manipulates a simple AST with the usual arithmetic operators seems quite sensible as an example.
I don't know how long or short you're thinking you could make that. Perhaps just the example of pretty-printing an expression back from such an AST (inserting only needed parentheses, for example) would already make a decent example app.
I'd definitely use objects, not nested tuples.
Got a question: will the following work?
# Create named capturing groups for each token type
TOKENS = re.compile(
'|'.join([
r'(?P<ws>\s+)',
r'(?P<ident>[A-Za-z_][A-Za-z0-9_]*)',
r'(?P<number>[0-9]+\.[0-9]*)',
r'(?P<oper>\+|\-|\*\*|\*|/|\(|\))'
])
)
m = TOKENS.match(input)
if m:
match m.groupdict():
case {"ws": _}:
print(f"Got whitespace")
case {"ident": value}:
print(f"Got an identifier: {value}")
case {"number": value}:
print(f"Got a number: {value}")
case {"oper": "+"}:
print("Got a plus sign")
case {"oper": "-"}:
print("Got a minus sign")
In other words, what is happening here is that I'm using the match statement to test to see which capturing groups are present in the re.match result. This replaces an if/elif chain which tests each matching group in groupdict.
This may seem like an odd use of match, but "the street finds its own uses for things", i.e. people are going to evolve usage idioms which we may not anticipate.
Homework: come up with a version of this that works with m.groups() instead of m.groupdict(). You may have to create a custom pattern class to do this.
So does it work? Are named subpatterns that are not matched excluded from the group dict?
Gah, apparently they exist but are assigned to 'None'. That won't work...
What about something that crawls Python source files and finds likely candidates for pattern matching syntax?
@brandtbucher That's an interesting idea - you'd look for if/elif chains that use the same value on one side.
Part of what I was trying to avoid with my example was this ugly pattern:
if m.group('ident'):
result.append((TokenType.IDENT, m.group('ident')))
elif m.group('number'):
result.append((TokenType.NUMBER, m.group('number')))
elif m.group('oper'):
result.append((TokenType.OPER, m.group('oper')))
..where you are having to call m.group() twice - although I suppose you could avoid this with the walrus operator? I'm not sure, I'm not that familiar with it.
I could make the previous example work by adding a guard condition on each case, but that kind of ruins the brevity of it. What's the most succinct way to say "I want to match the dictionary value with this key, but only if the value is truthy"?
Similarly, I could imagine a custom MatchGroup(key=value) match pattern class, but I haven't quite figured out how it would work. Basically the __match__
method of MatchGroup would call m.group() to see if the named group exists - except that it can't, because __match__
doesn't know what named group we are looking for.
I wonder, perhaps, that in the interest of simplicity we may have made the protocol a bit too inflexible.
I personally think using regexes to show pattern matching is a bit confusing, since it utilizes two different forms of matching... I don't want to imply that this is somehow a replacement for regexes, or made specifically to work with them.
Sure, but I wasn't thinking in terms of examples when I wrote that code. I was more just exploring, thinking 'OK, where can I use a match statement?' and looking for if/elif chains that would be likely candidates.
I admit that using match in this way is somewhat forced - but consider that average Python users are going to go through the same thought process. They are going to look at their long if/elif chains and think about how they can get a match statement to work in that situation. And in many cases the solutions they come up with will be surprising (for good or ill).
There are some use cases where match is a natural fit, and others where it can be made to work but it's hammering a square peg into a round hole. There's no crisp delineation yet as to what's an 'blessed' use of match and what is an ugly hack. Nor could we write such a recommendation, since we don't have a good sampling of use cases yet.
@viridia's issue #44 mentions another use case for pattern matching: event handling. At least in Scala, events are often handled through a match statement, i.e. something of the form:
match event:
case KeyEvent(.KEY_PRESSED, char):
print("You pressed: " + char)
case MouseEvent(.MOUSE_CLICKED, x, y, BUTTON.LEFT):
print(f"You clicked at {x}, {y}")
case TextChanged(component, newText) if component == nameEdit:
print(f"You changed your name to {newText}")
The idea is that you can use one and the same mechanism to fire and propagate any kind of event through the framework. I am, however, not familiar enough with common Python frameworks to know if this is a valid use case here as well.
That looks pretty for a hypothetical GUI framework. But the only GUI framework with which I am at all familiar is tkinter, and there the event handling is typically done by binding different functions to different event patterns.
Also note that you need a .
in front of MOUSE_CLICKED
.
I have created PR #60 which is a sample program that makes use of match
in various ways. Probably the most interesting one is in simplify_expr
.
Note: I also wanted to write a differentiator, but that will take longer :)
Example use:
> python math.py
Enter an command followed by an arithmetic expression.
Commands are:
* print: print the expression
* tree: print the expression in tree form
* eval: evaluate the expression
* simplify: fold constants and remove identities
'quit' or 'q' to quit.
> simplify x + y * 0
x
By the way, part of my motivation in writing this was to find some example, other than writing a compiler, that would make use of match statements in a bunch of different styles. Part of what is being explored is the possible space of common idioms for matching - there are a lot of different ways I could have chosen to write the logic, even with match.
For example, in simplify_expr
I deconstruct the binary operator expression, recursively simplify the arguments (I don't mutate the original object), and then temporarily construct a tuple (which presumably is less costly than re-constructing the expression object with the updated arguments) so that I can then run another, inner match to search for various patterns of identities like x * 1
or 0 + x
.
Exercise for the reader: add a rule that transforms x * -1
to -x
.
@viridia Your example is really nice and I was strongly tempted to spend way too much time doing these "exercise for the reader" and add a bunch of other rules to simplify_expr
:wink:.
One of the challenges in coming up with examples is perhaps that pattern matching is a feature that will become quite pervasive without dominating the code. Hence, the examples of a compiler and expression evaluator might be as typical examples as we can get and come up with.
One question I have is: what is going to happen to the sample code? Are we going to publish it somewhere? Will this Github repo become public once the PEP is ready (that was my assumption given that we are being meticulous about maintaining the record of conversations). Will the sample code be referenced in the final PEP? (In which case, this issue should no longer have 'unpeppable' status.)
Do we need additional examples from other problem domains?
I assume at some point we'll open the repo (in fact I think we may do that now) and the PEP can link to the directory with sample code in the repo.
Do we need to include an example program in the repo that uses namedtuple and/or dataclasses, or is the existing sample program sufficient? I've kind of run out of ideas for small toy programs that are match-heavy.
I think a two-line example in the PEP should suffice for namedtuple. For dataclasses, I thought we had considered using @dataclass for UnaryOp and VarExpr but not for BinaryOp? That would show off both ways, and it makes sense given the code.
Here's a small matching example that may resonate more with people who don't write compilers, without being a switch statement:
for i in range(1, 101):
match (i%3, i%5):
case (0, 0): print("FizzBuzz")
case (0, _): print("Fizz")
case (_, 0): print("Buzz")
case (_, _): print(i)
I usually hate fizzbuzz as a programming example — but this is the shortest solution I’ve seen, so perhaps it is useful.
Another example might be merging of two nested dicts for which there are a ton of super ugly (and most probably subtly buggy solutions on stackoverflow)
In my own answer I cheated somewhat and used a third party library for multiple dispatch. With match
however I think my answer could be rewritten as:
def merge(f):
def merge(a, b):
keys = a.keys() | b.keys()
return {key: f(a.get(key), b.get(key)) for key in keys}
return merge
def f(a, b):
match (a, b):
case (dict(), dict()):
return merge(f)(a, b):
case (list(), list()):
return [merge(f)(*arg) for arg in itertools.zip_longest(a, b)]
case (_, None):
return a
case (_, _):
return b
or without the split in two distinct functions:
def deep_merge(a, b):
match (a, b):
case (dict(), dict()):
keys = a.keys() | b.keys()
return {key: deep_merge(a.get(key), b.get(key)) for key in keys}
case (list(), list()):
return [deep_merge(*args) for args in itertools.zip_longest(a, b)]
case (_, None):
return a
case (_, _):
return b
Question: can the parentheses in match
and case
be omitted?
def deep_merge(a, b):
match a, b:
case dict(), dict():
keys = a.keys() | b.keys()
return {key: deep_merge(a.get(key), b.get(key)) for key in keys}
case list(), list():
return [deep_merge(*args) for args in itertools.zip_longest(a, b)]
case _, None:
return a
case _, _:
return b
Once the reference implementation is at a point where the match statement is actually usable, I have been trying to think of an example program that I could write to use it. It would have to be short, make use of match in multiple places, and do something sensible.
I don't want to do something like a full compiler because it would be too big, nor a text adventure because it would need a lot of non-match code.
So far the best idea I have come up with is a program to do simple algebraic manipulations like simplification and differentiation. I'd have a simple operator-precedence parser that converts an input equation into an AST - whether the AST consists of actual objects, or just nested tuples is TBD. The rest is just pattern matching and transformation. Printing out the result would also be performed by match.
Any other ideas?