we-like-parsers / pegen

PEG parser generator for Python
https://we-like-parsers.github.io/pegen/
MIT License
155 stars 33 forks source link

Add a special UNREACHABLE action for rule that never execute their action #18

Closed MatthieuDartiailh closed 3 years ago

MatthieuDartiailh commented 3 years ago

The main motivation for this is to get accurate coverage reports for rules used to generate custom syntax errors

pablogsal commented 3 years ago

@MatthieuDartiailh I think I am missing an example on how this is supposed to work to properly evaluate the proposed solution.

MatthieuDartiailh commented 3 years ago

Consider this rule from the python grammar:

del_stmt[ast.Delete]:
    | 'del' a=del_targets &(';' | NEWLINE) { ast.Delete(targets=a, LOCATIONS) }
    | invalid_del_stmt { UNREACHABLE }

The invalid_del_stmt will never return a non-None production meaning that the action will never be executed. This is fine except if one wants to get accurate coverage. Typically the solution is to have the action be None # type: ignore. However one cannot use comment in an action so we need some indirection which the UNREACHABLE special action provides.

pablogsal commented 3 years ago

The invalid_del_stmt will never return a non-None production meaning that the action will never be executed. This is fine except if one wants to get accurate coverage. Typically the solution is to have the action be None # type: ignore. However one cannot use comment in an action so we need some indirection which the UNREACHABLE special action provides.

Could we just inject the None # type: ignore in all invalid_* rules that don't return anything? Having to add UNREACHABLE can be a bit verbose and is another magical constant. What do you think?

MatthieuDartiailh commented 3 years ago

Actually the used of && in invalid rules is another case see:

invalid_with_stmt[None]:
    | ['async'] 'with' ','.(expression ['as' star_target])+ &&':' { UNREACHABLE }
    | ['async'] 'with' '(' ','.(expressions ['as' star_target])+ ','? ')' &&':' { UNREACHABLE }

I agree it is verbose but I do not mind being explicit about it and I am not sure what is the most easy to discover, a special action or an automatic action based on name. I do not have a strong opinion either way I just want a way to get the coverage right.

pablogsal commented 3 years ago

I agree it is verbose but I do not mind being explicit about it and I am not sure what is the most easy to discover,

Right, but my suggestion was done because this is not an operational behaviour, but just a development one: we want to properly check coverage. Adding things in the grammar that don't result in semantic behaviour raises a lot the noise IMHO

On the other hand, if doing it automatically is too hard, I am fine with this if @lysnikolaou is

MatthieuDartiailh commented 3 years ago

I do not think doing it automatically would be very hard but it would not allow to catch all cases as with the rules inside the invalid_with_stmt . However this is a corner case and those could be re-written as you mentioned at one point.

MatthieuDartiailh commented 3 years ago

Do you have a preference between merge and rebase for handling the conflicts ?

pablogsal commented 3 years ago

Do you have a preference between merge and rebase for handling the conflicts ?

Whatever is easier for you. I normally rebase, but we can squash at the end so is not a problem

MatthieuDartiailh commented 3 years ago

@pablogsal I picked a middle ground and made the addition of UNREACHABLE automatic for alts with no action of rules starting with invalid. That way it is still possible to mark custom rules but it does not clutter the grammar file in the most common case. WDYT ?

MatthieuDartiailh commented 3 years ago

ping @pablogsal @lysnikolaou

pablogsal commented 3 years ago

Thanks for the work and the wait, @MatthieuDartiailh