psf / black

The uncompromising Python code formatter
https://black.readthedocs.io/en/stable/
MIT License

Switch to a new parser #2318

Open JelleZijlstra opened 3 years ago

JelleZijlstra commented 3 years ago

Currently, Black uses a vendored version of lib2to3 for parsing. This works well for parsing Python 2 and early Python 3, but Python has now moved on to a PEG-based parser (PEP 617), and lib2to3 is no longer being maintained.

So we need a new parser. There are a few existing options that we could leverage (Parso, LibCST), but it's going to be a lot of work to do the migration. We're doing some early brainstorming in a Google doc. This issue exists so that we have a public record that we know this is a problem.

Concrete pieces of syntax that are blocked by this new grammar include parenthesized context managers and the match statement in Python 3.10. (#2242 through #2586, #2667, #2758)
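For illustration, here is a small sketch of the two constructs named above. The file names, the `command` variable, and the `parses` helper are made up for the example. The source is kept in a string and handed to the standard-library `ast.parse`, so the snippet itself still loads on interpreters older than 3.10 and simply reports whether the running interpreter accepts the syntax.

```python
import ast
import sys

# Both constructs are kept in a string so that this file can still be
# imported on interpreters that predate them.
SOURCE = '''
with (open("a.txt") as a, open("b.txt") as b):
    pass

match command:
    case "go":
        pass
    case _:
        pass
'''


def parses(source: str) -> bool:
    """Return True if the running interpreter's parser accepts `source`."""
    try:
        ast.parse(source)  # parses only; nothing in `source` is executed
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print(sys.version_info[:2], parses(SOURCE))
```

This only checks CPython's own parser; it says nothing about what Black's vendored lib2to3 accepts.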

kamahen commented 3 years ago

The main bug is: https://bugs.python.org/issue40360 (and also https://bugs.python.org/issue36541).

I think that there's a fairly straightforward way of wrapping the new Python parser to give the necessary functionality that "Black" (and other source-level tools) need. However, it's a non-trivial amount of work, and I'm loath to do it unless I'm sure it'll be used and that nobody else is doing the work. (There appears to be one existing wrapper, namely leoAst.py; I've looked at it a bit but it seems much more complicated than necessary and therefore could be both difficult to use and a maintenance issue.)
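As a rough illustration of why a wrapper would be needed at all (this is not the wrapper described above, just a sketch using the standard library): the `ast` module uses the current CPython parser, but it discards comments, whereas `tokenize` still reports them. A source-level tool presumably needs both pieces of information combined.

```python
import ast
import io
import tokenize

SOURCE = "x = 1  # keep me\n"

# ast.parse uses the interpreter's own parser, but comments are dropped
# and do not appear anywhere in the resulting tree.
tree = ast.parse(SOURCE)
dumped = ast.dump(tree)

# tokenize works on the raw text and emits comments as COMMENT tokens.
comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if tok.type == tokenize.COMMENT
]

if __name__ == "__main__":
    print("comment in AST dump:", "keep me" in dumped)
    print("comments from tokenize:", comments)
```

A formatter has to write comments back out unchanged, so the AST alone is not enough; some layer has to reattach tokens like these to the tree.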

Some other discussion at https://github.com/kamahen/pykythe/issues/27, https://github.com/google/yapf/issues/825#issuecomment-868805396, https://github.com/google/yapf/issues/894#issuecomment-799867767 and elsewhere.

ianliu commented 2 years ago

Has tree-sitter been considered? It already implements a parser for Python here: https://github.com/tree-sitter/tree-sitter-python and I think it allows building formatters on top of it.

JelleZijlstra commented 2 years ago

@ianliu interesting, I hadn't heard of that!

Looking at the Python bindings (https://github.com/tree-sitter/py-tree-sitter), it might be hard to get it to work for us:

That sounds like it would lead to a lot of people with mildly exotic systems who'd be unable to install Black if it depended on this library.

jakkdl commented 1 year ago

LibCST now supports (according to its README) Python 3.0 through 3.11, though it does say:

It is more difficult to implement tools that focus almost exclusively on whitespace on top of LibCST instead of lib2to3. For example, Black would need to modify whitespace nodes instead of prefix strings, making its implementation much more complex.

Udayraj123 commented 1 year ago

Hi @JelleZijlstra, I wanted to know if a resolution will be provided for this any time soon. Are there any alternatives or workarounds for now?

JelleZijlstra commented 1 year ago

There are no concrete plans to switch to a new parser, but we have full support for the latest Python grammar changes through some hacks on our existing parser. What do you need a workaround for?

Udayraj123 commented 1 year ago

Oh I see, I was facing this issue: https://github.com/psf/black/issues/2242 with the match/case syntax. I guess it might be a configuration issue on my end then.

Edit: An error shown in this discussion does not seem to address match/case. Was it fixed later?