we-like-parsers / pegen

PEG parser generator for Python
https://we-like-parsers.github.io/pegen/
MIT License
148 stars 31 forks source link

Strange syntax error only with zero '0' converted to int #88

Open philip-h-dye opened 1 year ago

philip-h-dye commented 1 year ago

With this extremely simple grammar "1" succeeds but "0" fails. Can anyone shed light on this issue? Thanks

zero-fails.py

from prettyprinter import pprint as pp

from pegen.grammar_parser import GeneratedParser as GrammarParser
from pegen.utils import generate_parser, parse_string

grammar_spec = """
    start  : a=num

    num    : a=NUMBER { float(a.string) if '.' in a.string else int(a.string) }
"""

numbers = [ 1, 0 ]

for n in numbers :
    grammar = parse_string(grammar_spec, GrammarParser)
    # print() ; pp(grammar)
    grammar_parser_class = generate_parser(grammar)
    tree = parse_string(str(n), grammar_parser_class, verbose=True)
    verdict = 'PASSED' if tree is not None else 'FAILED'
    pp([ verdict, tree ]) ; print()

Output

start() ... (looking at 1.0: NUMBER:'1')
  num() ... (looking at 1.0: NUMBER:'1')
    number() ... (looking at 1.0: NUMBER:'1')
    ... number() -> TokenInfo(type=2 (NUMBER), string='1', start=(1, 0), end=(1, 1), line='1')
  ... num() -> 1
... start() -> 1
['PASSED', 1]

start() ... (looking at 1.0: NUMBER:'0')
  num() ... (looking at 1.0: NUMBER:'0')
    number() ... (looking at 1.0: NUMBER:'0')
    ... number() -> TokenInfo(type=2 (NUMBER), string='0', start=(1, 0), end=(1, 1), line='0')
  ... num() -> 0
... start() -> None
Traceback (most recent call last):
  File "/home/phdyex/src/python/grammar-tool/test-files/pegen/gunit/simple/issue/zero-syntax-error/./zero-fails.py", line 20, in <module>
    tree = parse_string(str(n), grammar_parser_class, verbose=True)
  File "/home/phdyex/.cache/pypoetry/virtualenvs/grammar-tool-srSGyud3-py3.10/lib/python3.10/site-packages/pegen/utils.py", line 66, in parse_string
    return run_parser(file, parser_class, verbose=verbose)  # type: ignore # typeshed issue #3515
  File "/home/phdyex/.cache/pypoetry/virtualenvs/grammar-tool-srSGyud3-py3.10/lib/python3.10/site-packages/pegen/utils.py", line 55, in run_parser
    raise parser.make_syntax_error("invalid syntax")
  File "<unknown>", line 1
    0
    ^
SyntaxError: invalid syntax
0dminnimda commented 1 year ago

I suspect it is the same problem of testing the boolean value of the result and not actually checking if the node was matched. If you look at the generated code you probably will see something like if self.num() which will fail if the result is 0 as bool(0) == False It was not taken care of before, because all ast.X are classes and always evaluate to True

The other issue caused by this behavior is #81

82 and #84 are trying to fix this issue, but the maintainers are busy right now and unable to review/merge

In the meantime you can use values that don't evaluate to False, probably the best choice is to wrap the number in your class, i.e.

@dataclass
class NumberNode:
    value: int | float

And then create it like NumberNode(...) and use it like number.value

0dminnimda commented 1 year ago

Also, personally I want to ask you to check if #82 fixes this issue and report about it if possible

0dminnimda commented 1 year ago

@pablogsal if you have time, it would be nice of you to take a look at the PRs

philip-h-dye commented 1 year ago

For the moment, I edited the loop accepting num: was:

    @memoize
    def _loop1_1(self) -> Optional[Any]:
        # _loop1_1: num
        mark = self._mark()
        children = []
        while (
            (num := self.num())       # <== 0 and None both fail when 0 should succeed
        ):
            children.append(num)
            mark = self._mark()
        self._reset(mark)
        return children;

Fix _loop1_1 by adding is not None :

    @memoize
    def _loop1_1(self) -> Optional[Any]:
        # _loop1_1: num
        mark = self._mark()
        children = []
        while (
            (num := self.num()) is not None  # Now only None will fail and 0 will succeed
        ):
            children.append(num)
            mark = self._mark()
        self._reset(mark)
        return children;
philip-h-dye commented 1 year ago

Making a global "fix" in src/pegen/python_generator.py sadly results in 94 newly failing tests.

Essentially, my action coercing the strings to integers or floats is the fundamental cause if PEGEN requires that actions always result in a value that is semantically True. My apologies if this is documented somewhere. If not, I would suggest adding it.

Thanks