Open brandtbucher opened 4 years ago
I think it's a good idea, but we need to start with a way to make the pegen parser generator emit code that can handle conditional keywords.
This was a feature of early versions (including the Python prototypes I posted on Medium last year) but the current generator makes keywords hard reserved words, because that's what Python users currently expect.
My design was something like this: if the keyword in the grammar file is spelled with "double"
quotes instead of 'single'
quotes, then it's a conditional keyword, and the generator generates a different function call. Something like _PyPegen_expect_maybe_keyword(p, "match")
.
Have you looked at the generator in Tools/peg_generator at all?
Would it be sufficient to use a temporary hard keyword of __match
or something just to get this up and running, then change it later to be a conditional match
?
I find the parsing machinery a bit overwhelming... but if we agree that this is important enough, I could start poking at it and becoming more familiar.
I am also happy to help with this. Although I am certainly not a parser guru, I have some familiarity with it (I wrote a Python parser as part of my PhD :wink:), and could take on that part.
I think the best way to approach this is case-by-case. So get single bare name assignments working end-to-end, then wildcards, then constants, then guards, etc. until we have the full feature-set.
I'm happy to host this branch and an issue tracker, unless we'd like a more central location.
Okay, starting with a standard reserved word sounds fine. I would recommend using match_
though. (And I would definitely avoid anything starting with __
, because of the name mangling for private variables.)
I'll look into the changes to the generator to support my idea, it shouldn't be hard.
Oh, also: let's keep the file "patma.py" in sync with the semantics and terminology of the PEP. This can serve as a cheat sheet for the actual implementation.
Since I wrote it, it's probably also something I should keep up with.
So we're in agreement then that the PEP draft is our source of truth for the specification going forward? I think that's a good idea, but we'll have to be diligent about updating it to reflect any closed threads of discussion.
Yes, the PEP is supposed to be the source of truth for accepted aspects of the proposal, and the Rejected Ideas section in the PEP is supposed to briefly summarize the discussion. There may also be an Open Issues section in the PEP where we keep issues on which we haven't reached agreement yet (but that only becomes important once we send the PEP to a larger audience).
Note that there are two forms of issues that may occur under Rejected Ideas:
case
vs. as
, #27) there should be a section under Rejected Ideas explaining what other contenders were put forward and how the pros and cons for those compare to the selected option.Feel free to contribute implementation PRs to my branch, https://github.com/brandtbucher/cpython/tree/patma. If that proves too burdensome, I can just give push permissions to those who need it. I've already added basic AST/compiler support for constant patterns and guards, and more complicated patterns are on the way. You can view a continuously updating diff against master here:
https://github.com/python/cpython/compare/master...brandtbucher:patma
(Since we don't have parser support yet, the tests just feed hand-built ASTs directly into compile
/exec
. It actually works quite well!)
It probably makes the most sense to keep discussions here, and just add an implementation
tag to those issues. I'll also update the Reference Implementation section of the PEP to point to this branch.
A heroic effort! FWIW I'm looking into adding what I've termed "soft keywords" to the peg_generator. It seems quite easy -- I'm now stuck trying to debug why I can't add a print statement back like this:
print_stmt[stmt_ty]:
| a="print" b=','.expression+ {
_Py_Expr(
_Py_Call(a, b, NULL, EXTRA),
EXTRA) }
(It actually works, but only when I add the print_stmt
alternative before star_expressions
in small_stmt
-- if I put it after that, I always get the standard message "did you mean print(2+2)"...)
Got it working: https://github.com/python/cpython/pull/20370
I've got the reference implementation into a good state. It should be following the spec fairly closely now, with a few minor exceptions that I hope to get to soon (I know how to fix them, just need to find 30 minutes here or there to actually do each one):
__match_args__
only contains strings.__match_args__
doesn't allow a single positional sub-pattern.__match_args__
isn't consulted for error checking if a keyword attribute is missing on the proxy.__match_args_required__
isn't actually checked.**_
is not rejected from mapping patterns._
patterns.TypeError
s for now)._
loads are not rejected.compile
.Please let me know if you notice any issues, and I'll add them to my list (I'm a fairly heavy user of assertions, so debug builds will probably surface more helpful information). In particular, I've got the grammar working for all of my test cases, but I have no idea if it's actually good. :slightly_smiling_face:
I'm having trouble running the code. Here's what I did:
git clone etc...
cd cpython-patma
./configure --prefix $HOME/Projects/cpython-patma-install
make install
mkdir cpython-patma-example
cd cpython-patma-example
../cpython-patma-install/bin/python3 -m venv .
. ./bin/activate
python math.py
File "/Users/talin/Projects/cpython-patma-example/math.py", line 136
match a:
^
SyntaxError: invalid syntax
Did you check out the patma
branch? Sorry, I should have made that clearer!
Also, (OT) I apologize for the project name. I don’t know what it is but I have a terrible sense of naming lately — first plegen, now patma. :-(
I got it working, thanks! (I fail at git :( )
I'm getting a segfault with the following example:
class BinaryOp:
__match_args__ = ['op', 'left', 'right']
def __init__(self, op, left, right):
self.op = op
self.left = left
self.right = right
def format_expr(expr):
match expr:
case BinaryOp(op, left, right):
return 'x'
case float() | int():
return 'y'
case _:
return 'z'
print(format_expr(1))
How do you want me to report bugs?
Thanks for finding this. I’ll add it as a test case and look into it tomorrow.
Opening issues in this repo and tagging them with “reference implementation” should be fine. I’ll keep track of them (in fact, I’ll probably open issues for some of my tasks above too).
Fixed!
Yes, it's working for me now.
One suggestion: I would like it if:
case (1, 2, 3):
...produced an error message more specific than just 'syntax error'. I got confused a couple of times because I forgot that this was illegal. Can we have a 'do you mean?' hint?
Hm. Parsing tuples correctly is a bit complicated, so I'm hesitant to do it just for a nicer error message. Perhaps it was mostly surprising because the proposal has evolved so much over time, and we used to consider that syntax legal as recently as a week ago.
I'm open to doing this, though, if a reasonable user being introduced to the feature would expect it to work. The only problem is that this is just one of the many ways we differ from normal expression syntax, so this feels like a very slippery slope to me.
Maybe @Tobias-Kohn can weigh in?
I take back my point about this specifically being hard to parse. It's actually as simple as raising if we observe:
I've added more helpful error messages for this now, just since I've found a working grammar for it. But I still think we should evaluate whether they're worth keeping ~5 lines in the grammar file:
>>> match (1, 2, 3):
... case (1, 2, 3):
File "<stdin>", line 2
case (1, 2, 3):
^
SyntaxError: tuple displays cannot be used as patterns; did you mean '[...]'?
>>> match ():
... case ():
File "<stdin>", line 2
case ():
^
SyntaxError: tuple displays cannot be used as patterns; did you mean '[]'?
@brandtbucher I really enjoyed playing around with this future Python version and seeing pattern matching in action. Thank you very much for this first implementation!
To be perfectly honest: I feel that error messages is one of the few areas where Python really does not shine in general. Furthermore, that tuple syntax is not allows will probably surprise a lot of people who start working with it. A good error message can really help here a lot in keeping frustrations low. So, yes, I think the five extra lines in the grammar file are worth it---at least for a first version. If the community decides to get rid of it for Python 3.11, then that's probably ok, too.
@gvanrossum No need to apologise as far as I am concerned: I am perfectly fine with the name "patma" :smiley:. In fact, "patma" was originally my first choice, too, but since it sounds a bit like "Padmé" I was not entirely sure and switched last minute.
How much do you want me to try to kick the tires? Everything I've tried seems to work.
That's great, thanks! I was just wondering if there were any more glaring bugs like the one @viridia found. I'll take care of adding more exhaustive/pathological test cases to make sure we have good coverage.
If you have some time to statically check that my pattern grammar is reasonable, that would be appreciated too (though it will probably get quite a bit of reviewing later). I've never written a grammar before, but what I've got seems to work well.
Okay, reviewing the grammar carefully seems like a primary task for me. I will start with that this week.
By the way, the VSCode syntax extensions for our .gram
and .asdl
files are awesome. Love that Lysandros and Brett took the time to make them.
@gvanrossum did you mean to tweet out a link to my last comment? Haha.
Oh shit. :-)
Question: who is going to be implementing the reference implementation for typing.sealed
and the changes to dataclass
?
The reason I’ve left stdlib changes un-done is that those are fairly isolated tasks that others can work on without creating conflicts on my branch. So if anyone is interested in implementing them, they should open a PR in my repo (or ask for push permissions)!
Otherwise, I’m fine taking care of those too after the compiler/VM work is finished.
Although the PEP currently defers it, I think this idea is just mature enough to start work on designing and testing a CPython implementation. In addition to a head start (if we're really targeting 3.10, that's only a year to get this working), this will hopefully also inform some of the more technical aspects of the proposal.
I'm comfortable working on everything in the compilation pipeline from the AST through the VM. So if others agree that this is a good idea, we'd need at least one parser guru on-board to make any real headway. :wink: