python / cpython

The Python programming language
https://www.python.org

Grammar differences between Doc grammar and Pegen grammar for Python #100292

Open kaby76 opened 1 year ago

kaby76 commented 1 year ago

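This is a question about the differences between two published forms of the Python grammar: the Pegen grammar in the CPython repository and the grammar shown in the language reference documentation.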

I am mechanically scraping and refactoring the Pegen grammar for Python at https://github.com/python/cpython/blob/9d2dcbbccdf4207249665b58c8bd28430c4f7afd/Grammar/python.gram. After stripping the lookahead expressions, I diffed the refactored grammar against the grammar in the Doc at https://docs.python.org/3/reference/grammar.html. I expected the two to be identical, but they are not.
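The comparison step can be sketched with Python's difflib. This is only an illustration: the two inputs are short excerpts of a single rule, with and without its lookahead, and the variable names are assumptions rather than part of any CPython tooling.

```python
import difflib

# Excerpt as it would look after stripping every lookahead from the .gram file.
refactored = """\
del_stmt:
    | 'del' del_targets
"""

# Excerpt as it appears in the Doc grammar (note the trailing space in the source).
doc = """\
del_stmt:
    | 'del' del_targets &(';' | NEWLINE) 
"""

def normalized_lines(text):
    # Trailing whitespace is not significant, so drop it before comparing.
    return [line.rstrip() for line in text.splitlines()]

diff_lines = list(
    difflib.unified_diff(
        normalized_lines(refactored),
        normalized_lines(doc),
        fromfile="refactored",
        tofile="doc",
        lineterm="",
    )
)

print("\n".join(diff_lines))
```

Lines prefixed with `-` exist only in the refactored grammar, and lines prefixed with `+` exist only in the Doc grammar.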

I noticed that the Doc grammar drops the lookahead expressions from some rules but keeps them in others.

For example, in the Doc grammar, the simple_stmt rule does not have the &e lookahead expressions:

simple_stmt:
    | assignment
    | star_expressions 
    | return_stmt
    | import_stmt
    | raise_stmt
    | 'pass' 
    | del_stmt
    | yield_stmt
    | assert_stmt
    | 'break' 
    | 'continue' 
    | global_stmt
    | nonlocal_stmt

Similarly, for the compound_stmt rule, the &-lookahead expressions have been deleted from the Doc grammar:

compound_stmt:
    | function_def
    | if_stmt
    | class_def
    | with_stmt
    | for_stmt
    | try_stmt
    | while_stmt
    | match_stmt

Of course, the simple_stmt rule in the .gram file contains the & lookahead expressions:

simple_stmt[stmt_ty] (memo):
    | assignment
    | e=star_expressions { _PyAST_Expr(e, EXTRA) }
    | &'return' return_stmt
    | &('import' | 'from') import_stmt
    | &'raise' raise_stmt
    | 'pass' { _PyAST_Pass(EXTRA) }
    | &'del' del_stmt
    | &'yield' yield_stmt
    | &'assert' assert_stmt
    | 'break' { _PyAST_Break(EXTRA) }
    | 'continue' { _PyAST_Continue(EXTRA) }
    | &'global' global_stmt
    | &'nonlocal' nonlocal_stmt

So far, this all makes sense: the lookaheads are removed because they are not needed in the CFG.
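The transformation from the .gram form to the Doc form of simple_stmt can be approximated with a few regular expressions. This is a minimal sketch, not the script that actually generates the documentation grammar: the helper name `strip_pegen_annotations` is hypothetical, and the patterns only cover the constructs that appear in the rule quoted above (no nested braces or nested parentheses).

```python
import re

def strip_pegen_annotations(rule_text):
    """Hypothetical helper: reduce a pegen rule to a bare grammar rule."""
    out = []
    for line in rule_text.splitlines():
        # Rule header: drop the return type and flags, e.g. name[type] (memo):
        line = re.sub(r"^(\w+)(\[[^\]]*\])?\s*(\([^)]*\))?\s*:", r"\1:", line)
        # Drop semantic actions: { ... } (assumes no nested braces).
        line = re.sub(r"\s*\{[^{}]*\}", "", line)
        # Drop lookaheads on a quoted token or a parenthesized group.
        line = re.sub(r"[&!](?:'[^']*'|\([^()]*\))\s*", "", line)
        # Drop variable bindings such as e=star_expressions.
        line = re.sub(r"\b\w+=", "", line)
        out.append(line.rstrip())
    return "\n".join(out)

GRAM_RULE = """\
simple_stmt[stmt_ty] (memo):
    | assignment
    | e=star_expressions { _PyAST_Expr(e, EXTRA) }
    | &'return' return_stmt
    | &('import' | 'from') import_stmt
    | 'pass' { _PyAST_Pass(EXTRA) }
"""

stripped = strip_pegen_annotations(GRAM_RULE)
print(stripped)
```

Because this sketch removes every lookahead unconditionally, applying it to a rule such as del_stmt would also remove `&(';' | NEWLINE)`, which the Doc grammar keeps.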

But, if we now look further into the Doc grammar, we see that some rules still have the lookahead listed:

del_stmt:
    | 'del' del_targets &(';' | NEWLINE) 
...
slash_no_default:
    | param_no_default+ '/' ',' 
    | param_no_default+ '/' &')' 
slash_with_default:
    | param_no_default* param_with_default+ '/' ',' 
    | param_no_default* param_with_default+ '/' &')' 

Can someone clarify why some lookahead expressions are deleted and others are not in the Doc grammar?
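To list which rules still carry a lookahead, a small scan over the grammar text is enough. The sketch below is an assumption-laden illustration: `rules_with_lookahead` is a hypothetical name, the input is only the excerpt quoted above, and the pattern recognizes a lookahead only when `&` or `!` is directly followed by a quoted token or a non-nested parenthesized group. It does not distinguish the `&&` forced-token form and can misfire on a quoted `'!'` token.

```python
import re

# & or ! followed by a quoted token or a parenthesized group (no nesting).
LOOKAHEAD = re.compile(r"[&!](?:'[^']*'|\([^()]*\))")
# Rule header at column 0, with optional [type] and (flags) before the colon.
RULE_HEADER = re.compile(r"^(\w+)(?:\[[^\]]*\])?\s*(?:\([^)]*\))?\s*:")

def rules_with_lookahead(grammar_text):
    """Hypothetical helper: map rule name -> lookahead expressions found in it."""
    found = {}
    current = None
    for line in grammar_text.splitlines():
        header = RULE_HEADER.match(line)
        if header:
            current = header.group(1)
            # Alternatives may follow the colon on the same line.
            line = line[header.end():]
        if current is None:
            continue
        hits = LOOKAHEAD.findall(line)
        if hits:
            found.setdefault(current, []).extend(hits)
    return found

DOC_EXCERPT = """\
simple_stmt:
    | assignment
    | return_stmt
del_stmt:
    | 'del' del_targets &(';' | NEWLINE)
slash_no_default:
    | param_no_default+ '/' ','
    | param_no_default+ '/' &')'
slash_with_default:
    | param_no_default* param_with_default+ '/' ','
    | param_no_default* param_with_default+ '/' &')'
"""

remaining = rules_with_lookahead(DOC_EXCERPT)
for name, lookaheads in remaining.items():
    print(name, lookaheads)
```

Running the same scan over the full Doc grammar and over python.gram would show exactly which lookaheads were dropped and which were kept.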

puneeth072003 commented 1 year ago

I'd like to work on this issue