tree-sitter / tree-sitter-python

Python grammar for tree-sitter
MIT License
360 stars 132 forks source link

Add detailed pattern matching support #234

Closed jasontatton closed 1 year ago

jasontatton commented 1 year ago

Greetings, at Meta we are making use of tree-sitter-python in the ERRPY project which is leveraged by the Pyre type checker in order to parse Python code at Meta.

We would like to upstream this enhancement to the grammar which adds detailed support for pattern match cases (over just expression).

Within the ERRPY project this has enabled us to be able to correctly produce an AST for the Python 3.10 syntax of pattern match cases and also reject invalid case patterns (which would otherwise be accepted when the rule mapped to expression).

We have updated and enhanced the test cases to be more comprehensive of the various pattern match cases. We have also run tree-sitter generate and included the results as part of this PR.

One thing which we were unable to determine was how to run the code formatter for grammar.js.

amaanq commented 1 year ago

Interesting changes, I'll start taking a look now

You can run the format check with npm run lint and then format it with eslint grammar.js --fix, though I guess maybe it's better to have a script to run eslint with fixes

jasontatton commented 1 year ago

I have run npm run lint now

amaanq commented 1 year ago

I do have some thoughts on this - I can write more details in a couple days since I'm a bit busy atm

There are a lot of unnecessary fields, the way fields should be used imo is, if you write a capture for a specific node, can it match multiple spots in the parent node? If so, then a field can be useful. Say we have the following:

version: $ => seq($.number, $.number, $.number)

Each number is the semver major minor and patch, so a field here makes sense like so:

version: $ => seq(
  field('major', $.number), 
  field('minor', $.number), 
  field('patch', $.number),
)

Anyways, I'd definitely remove some of them to potentially reduce state count, +500 isn't good imo

amaanq commented 1 year ago

I have some more thoughts on the naming/visibility of some of these rules, but I'll discuss that more maybe Sunday

jasontatton commented 1 year ago

I have removed the unneeded fields as requested

amaanq commented 1 year ago

I don't like the recursive structure of rules that's typically seen in other notations - tree-sitter doesn't need all of that because of its precedence system. I'll rework it then merge

amaanq commented 1 year ago

I'll note that the style I'd want to see here is similar to the one I created for types here

https://github.com/tree-sitter/tree-sitter-python/pull/239

In that case, types are used sparsely, and should always be a higher precedence than a potential expression that can conflict w/ it.

amaanq commented 1 year ago

@jasontatton I've updated it to the style I like, but it might not be perfect. Just double check it and let me know if it seems fine to you, and I'll merge this. Thanks for being patient :slightly_smiling_face:

Another note for the future, please don't open PRs from your master branch, it makes it annoying to check out locally (naming conflicts), and for pushing to (git requires me to manually specify HEAD for master), just create a new branch when making pull requests to avoid this issue in the future :)

jasontatton commented 1 year ago

Ok will make a new branch for a PR next time.

The changes look good to me.

amaanq commented 1 year ago

Awesome, thanks for the contribution!