Reference Implementation

brandtbucher commented 4 years ago

Although the PEP currently defers it, I think this idea is just mature enough to start work on designing and testing a CPython implementation. In addition to a head start (if we're really targeting 3.10, that's only a year to get this working), this will hopefully also inform some of the more technical aspects of the proposal.

I'm comfortable working on everything in the compilation pipeline from the AST through the VM. So if others agree that this is a good idea, we'd need at least one parser guru on-board to make any real headway. :wink:

gvanrossum commented 4 years ago

I think it's a good idea, but we need to start with a way to make the pegen parser generator emit code that can handle conditional keywords.

This was a feature of early versions (including the Python prototypes I posted on Medium last year) but the current generator makes keywords hard reserved words, because that's what Python users currently expect.

My design was something like this: if the keyword in the grammar file is spelled with "double" quotes instead of 'single' quotes, then it's a conditional keyword, and the generator generates a different function call. Something like _PyPegen_expect_maybe_keyword(p, "match").

Have you looked at the generator in Tools/peg_generator at all?

brandtbucher commented 4 years ago

Would it be sufficient to use a temporary hard keyword of __match or something just to get this up and running, then change it later to be a conditional match?

I find the parsing machinery a bit overwhelming... but if we agree that this is important enough, I could start poking at it and becoming more familiar.

Tobias-Kohn commented 4 years ago

I am also happy to help with this. Although I am certainly not a parser guru, I have some familiarity with it (I wrote a Python parser as part of my PhD :wink:), and could take on that part.

brandtbucher commented 4 years ago

I think the best way to approach this is case-by-case. So get single bare name assignments working end-to-end, then wildcards, then constants, then guards, etc. until we have the full feature-set.

I'm happy to host this branch and an issue tracker, unless we'd like a more central location.

gvanrossum commented 4 years ago

Okay, starting with a standard reserved word sounds fine. I would recommend using match_ though. (And I would definitely avoid anything starting with __, because of the name mangling for private variables.)

I'll look into the changes to the generator to support my idea, it shouldn't be hard.

gvanrossum commented 4 years ago

Oh, also: let's keep the file "patma.py" in sync with the semantics and terminology of the PEP. This can serve as a cheat sheet for the actual implementation.

Since I wrote it, it's probably also something I should keep up with.

brandtbucher commented 4 years ago

So we're in agreement then that the PEP draft is our source of truth for the specification going forward? I think that's a good idea, but we'll have to be diligent about updating it to reflect any closed threads of discussion.

gvanrossum commented 4 years ago

Yes, the PEP is supposed to be the source of truth for accepted aspects of the proposal, and the Rejected Ideas section in the PEP is supposed to briefly summarize the discussion. There may also be an Open Issues section in the PEP where we keep issues on which we haven't reached agreement yet (but that only becomes important once we send the PEP to a larger audience).

Note that there are two forms of issues that may occur under Rejected Ideas:

For issues where we have to choose one option (e.g. case vs. as, #27) there should be a section under Rejected Ideas explaining what other contenders were put forward and how the pros and cons for those compare to the selected option.
For issues where we could opt to add something but decided not to add it now (e.g. the one-off syntax, #21), the Rejected Ideas section should just explain the proposal and the reason we decided not to add it. (In truth, some of these are more likely to be added later than others; e.g. #9 is unlikely, while #21 may well happen eventually.)

brandtbucher commented 4 years ago

Feel free to contribute implementation PRs to my branch, https://github.com/brandtbucher/cpython/tree/patma. If that proves too burdensome, I can just give push permissions to those who need it. I've already added basic AST/compiler support for constant patterns and guards, and more complicated patterns are on the way. You can view a continuously updating diff against master here:

https://github.com/python/cpython/compare/master...brandtbucher:patma

(Since we don't have parser support yet, the tests just feed hand-built ASTs directly into compile/exec. It actually works quite well!)

It probably makes the most sense to keep discussions here, and just add an implementation tag to those issues. I'll also update the Reference Implementation section of the PEP to point to this branch.

gvanrossum commented 4 years ago

A heroic effort! FWIW I'm looking into adding what I've termed "soft keywords" to the peg_generator. It seems quite easy -- I'm now stuck trying to debug why I can't add a print statement back like this:

print_stmt[stmt_ty]:
    | a="print" b=','.expression+ {
        _Py_Expr(
            _Py_Call(a, b, NULL, EXTRA),
            EXTRA) }

(It actually works, but only when I add the print_stmt alternative before star_expressions in small_stmt -- if I put it after that, I always get the standard message "did you mean print(2+2)"...)

gvanrossum commented 4 years ago

Got it working: https://github.com/python/cpython/pull/20370

brandtbucher commented 4 years ago

I've got the reference implementation into a good state. It should be following the spec fairly closely now, with a few minor exceptions that I hope to get to soon (I know how to fix them, just need to find 30 minutes here or there to actually do each one):

[x] f-strings are not rejected from patterns, and crash the program.
[x] Numeric addition is not rejected from patterns, and crashes the program.
[x] It isn't checked that __match_args__ only contains strings.
[x] Missing __match_args__ doesn't allow a single positional sub-pattern.
[x] __match_args__ isn't consulted for error checking if a keyword attribute is missing on the proxy.
[x] __match_args_required__ isn't actually checked.
[x] Duplicate mapping keys are not rejected.
[x] The tuple form of sequence patterns is not accepted.
[x] Line numbers and column offsets are incorrect or missing in many errors.
[x] **_ is not rejected from mapping patterns.
[x] We're doing a lot of extra work for _ patterns.
[x] No new exception types have been added (just raising TypeErrors for now).
[x] There is some fairly low-hanging fruit for speed/memory improvements for "large" sequences and mappings).
[x] _ loads are not rejected.
[ ] ASTs are missing validation for some nodes when passed directly to compile.
[ ] "Tuple"-style sequence patterns don't have any unit tests yet, and may contain bugs.
[ ] "Call"-style patterns don't have any unit tests yet, and may contain bugs.
[ ] Most error-raising situations don't have unit tests yet, and may contain bugs.

Please let me know if you notice any issues, and I'll add them to my list (I'm a fairly heavy user of assertions, so debug builds will probably surface more helpful information). In particular, I've got the grammar working for all of my test cases, but I have no idea if it's actually good. :slightly_smiling_face:

viridia commented 4 years ago

I'm having trouble running the code. Here's what I did:

git clone etc...
cd cpython-patma
./configure --prefix $HOME/Projects/cpython-patma-install
make install
mkdir cpython-patma-example
cd cpython-patma-example
../cpython-patma-install/bin/python3 -m venv .
. ./bin/activate
python math.py

  File "/Users/talin/Projects/cpython-patma-example/math.py", line 136
    match a:
          ^
SyntaxError: invalid syntax

brandtbucher commented 4 years ago

Did you check out the patma branch? Sorry, I should have made that clearer!

gvanrossum commented 4 years ago

Also, (OT) I apologize for the project name. I don’t know what it is but I have a terrible sense of naming lately — first plegen, now patma. :-(

viridia commented 4 years ago

I got it working, thanks! (I fail at git :( )

viridia commented 4 years ago

I'm getting a segfault with the following example:

class BinaryOp:
  __match_args__ = ['op', 'left', 'right']

  def __init__(self, op, left, right):
    self.op = op
    self.left = left
    self.right = right

def format_expr(expr):
  match expr:
    case BinaryOp(op, left, right):
      return 'x'
    case float() | int():
      return 'y'
    case _:
      return 'z'

print(format_expr(1))

How do you want me to report bugs?

brandtbucher commented 4 years ago

Thanks for finding this. I’ll add it as a test case and look into it tomorrow.

Opening issues in this repo and tagging them with “reference implementation” should be fine. I’ll keep track of them (in fact, I’ll probably open issues for some of my tasks above too).

brandtbucher commented 4 years ago

Fixed!

viridia commented 4 years ago

Yes, it's working for me now.

One suggestion: I would like it if:

  case (1, 2, 3):

...produced an error message more specific than just 'syntax error'. I got confused a couple of times because I forgot that this was illegal. Can we have a 'do you mean?' hint?

brandtbucher commented 4 years ago

Hm. Parsing tuples correctly is a bit complicated, so I'm hesitant to do it just for a nicer error message. Perhaps it was mostly surprising because the proposal has evolved so much over time, and we used to consider that syntax legal as recently as a week ago.

I'm open to doing this, though, if a reasonable user being introduced to the feature would expect it to work. The only problem is that this is just one of the many ways we differ from normal expression syntax, so this feels like a very slippery slope to me.

Maybe @Tobias-Kohn can weigh in?

brandtbucher commented 4 years ago

I take back my point about this specifically being hard to parse. It's actually as simple as raising if we observe:

A comma anywhere other than immediately inside of a list/dict display or call.
Empty parentheses anywhere other than a call.

I've added more helpful error messages for this now, just since I've found a working grammar for it. But I still think we should evaluate whether they're worth keeping ~5 lines in the grammar file:

>>> match (1, 2, 3):
...     case (1, 2, 3):
  File "<stdin>", line 2
    case (1, 2, 3):
           ^
SyntaxError: tuple displays cannot be used as patterns; did you mean '[...]'?

>>> match ():
...     case ():
  File "<stdin>", line 2
    case ():
         ^
SyntaxError: tuple displays cannot be used as patterns; did you mean '[]'?

Tobias-Kohn commented 4 years ago

@brandtbucher I really enjoyed playing around with this future Python version and seeing pattern matching in action. Thank you very much for this first implementation!

To be perfectly honest: I feel that error messages is one of the few areas where Python really does not shine in general. Furthermore, that tuple syntax is not allows will probably surprise a lot of people who start working with it. A good error message can really help here a lot in keeping frustrations low. So, yes, I think the five extra lines in the grammar file are worth it---at least for a first version. If the community decides to get rid of it for Python 3.11, then that's probably ok, too.

Tobias-Kohn commented 4 years ago

@gvanrossum No need to apologise as far as I am concerned: I am perfectly fine with the name "patma" :smiley:. In fact, "patma" was originally my first choice, too, but since it sounds a bit like "Padmé" I was not entirely sure and switched last minute.

gvanrossum commented 4 years ago

How much do you want me to try to kick the tires? Everything I've tried seems to work.

brandtbucher commented 4 years ago

That's great, thanks! I was just wondering if there were any more glaring bugs like the one @viridia found. I'll take care of adding more exhaustive/pathological test cases to make sure we have good coverage.

If you have some time to statically check that my pattern grammar is reasonable, that would be appreciated too (though it will probably get quite a bit of reviewing later). I've never written a grammar before, but what I've got seems to work well.

gvanrossum commented 4 years ago

Okay, reviewing the grammar carefully seems like a primary task for me. I will start with that this week.

brandtbucher commented 4 years ago

By the way, the VSCode syntax extensions for our .gram and .asdl files are awesome. Love that Lysandros and Brett took the time to make them.

brandtbucher commented 4 years ago

@gvanrossum did you mean to tweet out a link to my last comment? Haha.

gvanrossum commented 4 years ago

Oh shit. :-)

viridia commented 4 years ago

Question: who is going to be implementing the reference implementation for typing.sealed and the changes to dataclass?

brandtbucher commented 4 years ago

The reason I’ve left stdlib changes un-done is that those are fairly isolated tasks that others can work on without creating conflicts on my branch. So if anyone is interested in implementing them, they should open a PR in my repo (or ask for push permissions)!

Otherwise, I’m fine taking care of those too after the compiler/VM work is finished.

gvanrossum / patma

Reference Implementation #39