we-like-parsers / pegen

PEG parser generator for Python
https://we-like-parsers.github.io/pegen/
MIT License
150 stars 32 forks source link

Delayed error inspection #60

Closed MatthieuDartiailh closed 2 years ago

MatthieuDartiailh commented 2 years ago

This attempts to mimic CPython behavior by calling invalid rule only in a second pass to speed up parsing on the happy path. It also includes changes https://github.com/python/cpython/commit/390459de6db1e68b79c0897cc88c0d562693ec5c to avoid calling invalid rules from within without invalid rules.

~~With those changes, I thought I would be able to get rid of having both immediate and deferred raising of exceptions (store vs raise) since looking at CPython it looks to me as if as soon as an exception is reported the parser exits. This fast exit is controlled by the error_indicator attribute on the parser state. However I hit one issue when enabling direct raising in invalid rules (so in CPython terms when looking for details about a syntax error): f(i for i in range(10)) fails to parse because in t_primary the alternative a=t_primary b=genexp &t_lookahead fails and then the alternative a=t_primary '(' b=[arguments] ')' &t_lookahead trips through the invalid argument rule. This should parse and would mask any actual error.~~

I would very much appreciate if somebody could shed some light on this issue.

This will need some extra documentation and tests once the above has been addressed.

MatthieuDartiailh commented 2 years ago

I actually figured the issue. I was not checking for the number of args in the invalid_arguments rule when we matched a generator expr. I will do my best to update the PR ASAP.

MatthieuDartiailh commented 2 years ago

I have one issue remaining on which I would appreciate some insight @pablogsal @lysnikolaou which relates to the rule real_number.

This rule is basically invoked when we have the following:

x = 1 - 1j
match x:
    case 1 - 1j:
        y = 0

In this case we do not have a simple real number which would be handled by signed_number, and we will use the complex_number rule which will call the real_number rule. (line 8573 of https://raw.githubusercontent.com/python/cpython/main/Parser/parser.c). My issue is that currently we check that the real part of the literal is a complex number (through _PyPegen_ensure_real which calls PyComplex_CheckExact) which to me should never be true since I expect the real part of the literal to be parsed as either an int or a float. However it does work in CPython so I must be missing something.

pablogsal commented 2 years ago

I have one issue remaining on which I would appreciate some insight @pablogsal @lysnikolaou which relates to the rule real_number.

Will try to investigate today or tomorrow if @lysnikolaou doesn't do it before. We are currently releasing 3.10.4 and 3.9.12 as we had some problems with the latest release and it may take a while.

lysnikolaou commented 2 years ago

I'm not sure what the intent of that part of the grammar is, and sadly, I wasn't around when this was introduced. What I can say though, is that NUMBER tokens and number expr_tys, as a result, contain the whole number, imaginary or real.

Even in the real_number rule, the NUMBER that gets parsed and will be passed to _PyPegen_ensure_real will be the complex number in its entirety, and that's why PyComplex_CheckExact returns true. You can check parsenumber_raw for how this is done.

Does that answer your question?

MatthieuDartiailh commented 2 years ago

Sorry for the noise.... I missed the absence of a leading ! in _PyPegen_ensure_real. I am not used enough to C. All good now.

MatthieuDartiailh commented 2 years ago

In case it was not clear from the status change. This is now ready for review. I would like to get it in before picking up again #41