Open gvanrossum opened 4 years ago
(Oh, I tried, and that doesn't actually work even for the Python parser. But why? @lysnikolaou any idea?)
I tried this and it works... Now for filling in XXX...
args[expr_ty]:
| ','.(starred_expression | named_expression !'=')+ [',' kwargs] { XXX }
| a=kwargs { _Py_Call(_PyPegen_dummy_name(p),
CHECK_NULL_ALLOWED(_PyPegen_seq_extract_starred_exprs(p, a)),
CHECK_NULL_ALLOWED(_PyPegen_seq_delete_starred_exprs(p, a)),
EXTRA) }
Here is a draft implementation:
https://github.com/python/cpython/pull/22053
For this script:
import ast
code = "f(" + ",".join(['a' for _ in range(100000)]) + ")"
print("Ready!")
ast.parse(code)
I get these results (release mode):
This patch: 0.17s +- 0.04
Python3.8: 0.27s +- 0.08
Master:
[1] 78689 segmentation fault (core dumped) ./python lel.py
The test corpus contained a case where function is called with many (over 255) arguments, each just numbers. This used to be a syntax error in 3.6 and before -- but with more recent versions of the bytecode compiler it works for 10s of 1000s of arguments.
However, the generated parser in Python cannot handle it! It raises a RecursionError for
A(0, 1, ..., k)
if k is 221 or higher. The traceback seems to be an endless repetition of this fragment:I think this is actually about this rule:
The part
[',' c=args {c}]
corresponds to_tmp_108
. This is a straightforward recursion (not left-).I can't actually fathom that the C parser doesn't segfault on this for me; it just slows down. (In the C code, it's
_tmp_109
, but the structure is the same.) The stack must automatically grow (on Mac, at least). But even there it would be nice if we could replace this by a simple loop.E.g. maybe we could make this work?
I haven't figured out what should go into XXX yet, but it's probably going to be more efficient, since we don't build up the array of arguments by prepending one at a time.