python / cpython

The Python programming language
https://www.python.org

Simplify the grammar of the `assignment` rule #122951

Open tomasr8 opened 1 month ago

tomasr8 commented 1 month ago

Feature or enhancement

Proposal:

I don't know all the nuances of the new grammar, but I think the `assignment` rule can be simplified in the following way:

assignment[stmt_ty]:
    | a=NAME ':' b=expression c=['=' d=annotated_rhs { d }] {
        CHECK_VERSION(
            stmt_ty,
            6,
            "Variable annotation syntax is",
            _PyAST_AnnAssign(CHECK(expr_ty, _PyPegen_set_expr_context(p, a, Store)), b, c, 1, EXTRA)
        ) }
    | a=('(' b=single_target ')' { b }
         | single_subscript_attribute_target) ':' b=expression c=['=' d=annotated_rhs { d }] {
        CHECK_VERSION(stmt_ty, 6, "Variable annotations syntax is", _PyAST_AnnAssign(a, b, c, 0, EXTRA)) }
-     | a[asdl_expr_seq*]=(z=star_targets '=' { z })+ b=(yield_expr | star_expressions) !'=' tc=[TYPE_COMMENT] {
+     | a[asdl_expr_seq*]=(z=star_targets '=' { z })+ b=annotated_rhs !'=' tc=[TYPE_COMMENT] {
         _PyAST_Assign(a, b, NEW_TYPE_COMMENT(p, tc), EXTRA) }
-     | a=single_target b=augassign ~ c=(yield_expr | star_expressions) {
+     | a=single_target b=augassign ~ c=annotated_rhs {
         _PyAST_AugAssign(a, b->kind, c, EXTRA) }
    | invalid_assignment

annotated_rhs[expr_ty]: yield_expr | star_expressions

That is, just reusing `annotated_rhs` instead of writing out `(yield_expr | star_expressions)`.

There was a similar issue before that fixed some cases, but this one remained. I can submit a PR if this is considered worthwhile.
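For readers unfamiliar with what the rule covers: its alternatives correspond to the `AnnAssign`, `Assign`, and `AugAssign` AST nodes, and the proposal only changes how the grammar is written, not what it parses. A quick way to see the mapping is to inspect the AST with the stdlib `ast` module (a minimal sketch; the sample statements are made up):

```python
import ast

# Each alternative of the `assignment` rule maps to a distinct AST node.
examples = {
    "x: int = 5": ast.AnnAssign,    # NAME ':' expression ['=' annotated_rhs]
    "(x): int = 5": ast.AnnAssign,  # '(' single_target ')' ':' expression ...
    "a = b = 1, 2": ast.Assign,     # (star_targets '=')+ star_expressions
    "x += 1": ast.AugAssign,        # single_target augassign ...
}

for src, node_type in examples.items():
    stmt = ast.parse(src).body[0]
    assert isinstance(stmt, node_type), (src, type(stmt).__name__)
    print(f"{src!r:16} -> {type(stmt).__name__}")
```

Since all four forms keep producing the same nodes, substituting `annotated_rhs` for the written-out `(yield_expr | star_expressions)` is a purely cosmetic change to the grammar file.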

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

https://github.com/python/cpython/issues/116988

dongrixinyu commented 1 week ago

Excuse me. I am reading the source code of parse.c and noticed the function assignment_rule. I am confused by the comment above it:

// assignment:
//     | NAME ':' expression ['=' annotated_rhs]
//     | ('(' single_target ')' | single_subscript_attribute_target) ':' expression ['=' annotated_rhs]
//     | ((star_targets '='))+ (yield_expr | star_expressions) !'=' TYPE_COMMENT?
//     | single_target augassign ~ (yield_expr | star_expressions)
//     | invalid_assignment

What is the meaning of these patterns, for example `((star_targets '='))+ (yield_expr | star_expressions) !'=' TYPE_COMMENT?`?

What are star_targets and star_expressions? Could you provide some help understanding them?

picnixz commented 1 week ago

See https://docs.python.org/3/reference/grammar.html#full-grammar-specification to understand the grammar. The star_targets and star_expressions rules are defined later in that document. CPython mixes EBNF and PEG syntax for its own convenience.
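To get an intuition for the two rules at runtime: `star_targets` covers assignment targets that may contain a starred element, and `star_expressions` covers bare comma-separated expression lists. A small sketch (the variable names are made up):

```python
# star_targets: the left-hand side may contain a starred name
# that absorbs the remaining items of the iterable.
first, *rest = [1, 2, 3, 4]
assert first == 1 and rest == [2, 3, 4]

# star_expressions: the right-hand side may be a bare
# comma-separated list (a tuple display without parentheses).
pair = 1, 2
assert pair == (1, 2)

# Both at once, matching (star_targets '=')+ ... star_expressions:
head, *tail = 0, *rest
assert head == 0 and tail == [2, 3, 4]
```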

As for the issue, I think you can replace it. On the other hand, I wonder whether this could cause a slight performance change if the parser needs to make another pass to substitute the rule.

EDIT: nvm, performance shouldn't really be a bottleneck here (in the relevant PR, the occurrences were simply replaced outright)

dongrixinyu commented 1 week ago

I reviewed several functions in parse.c, like atom_rule. I find that it recursively checks the type of atom tokens, such as NAME, True, False, None, Number, etc. It seems really time-consuming.

Is it really necessary to design it like this?

picnixz commented 1 week ago

Is it really necessary to design it like this?

If you want a more in-depth rationale, read PEP 617. If you can find a way to make an equivalent parser that is simpler, we'll gladly accept it. Long story short, we moved from an LL(1) parser to a PEG one because left recursion is not supported by LL(k) parsers (it can actually cause infinite recursion in PEG parsers too, but LL(k) parsers simply disallow it in their grammars). Another reason we use a PEG parser is that a CFG parser would have ambiguous cases, which a PEG by construction does not.
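To illustrate the left-recursion point: naively translating a left-recursive rule such as `expr: expr '+' NUM | NUM` into a recursive-descent parser loops forever, because the first alternative re-enters the same rule at the same position. The usual fix is to rewrite the rule as a repetition; CPython's PEG generator instead has special support for left recursion. A toy sketch (a hypothetical mini-parser, not CPython's):

```python
def parse_expr_left_recursive(tokens, pos=0):
    """Naive translation of `expr: expr '+' NUM | NUM`.
    Calling this would recurse forever: the first alternative
    starts by calling parse_expr again at the same position.
    Shown only for illustration -- do not call it."""
    left = parse_expr_left_recursive(tokens, pos)  # infinite recursion!
    ...

def parse_expr_iterative(tokens):
    """The classic rewrite, `expr: NUM ('+' NUM)*`, which
    consumes the repetition with a loop instead of recursion."""
    pos = 0
    value = int(tokens[pos])
    pos += 1
    while pos < len(tokens) and tokens[pos] == "+":
        value += int(tokens[pos + 1])
        pos += 2
    return value

assert parse_expr_iterative(["1", "+", "2", "+", "3"]) == 6
```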

dongrixinyu commented 1 week ago

Thanks for your explanation. I will take a deeper look at the parser.

I used to assume the parser runs while a Python program is executing. But when I checked the CPython source code, I realized that parsing only happens before the .pyc is generated. Based on this, the speed of parsing Python code is not a big deal. Right?

picnixz commented 1 week ago

Depends on the use I'd say, but here it's probably negligible, though I'm not very familiar with how the pieces are generated. Nonetheless, simplifying how the grammar looks for the user is a good thing IMO.
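On the caching point above: for imported modules, CPython stores the compiled bytecode in `__pycache__`, so the parser runs again only when the source changes; scripts executed directly and strings passed to `exec`/`eval` are still parsed every time. The compile-once step can be sketched with the stdlib `py_compile` module (temp-file paths are made up):

```python
import os
import py_compile
import tempfile

# Write a tiny module and compile it to a .pyc, the same step
# the import system performs (and caches) automatically.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    with open(src, "w") as f:
        f.write("x = 1 + 2\n")

    pyc = py_compile.compile(src, doraise=True)
    # Parsing/compilation happened once; the resulting .pyc lets
    # later imports skip the parser entirely.
    assert pyc is not None and os.path.exists(pyc)
    print("compiled to", pyc)
```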