neogeny / TatSu

竜 TatSu generates Python parsers from grammars in a variation of EBNF
https://tatsu.readthedocs.io/

Positive join can't ignore whitespace #311

Closed: coalooball closed this issue 10 months ago

coalooball commented 11 months ago

Hi, I found that a positive join does not skip whitespace when parsing a delimited list of values. I carefully reviewed the official documentation but did not find a solution.

The EBNF grammar:

@@grammar::TheftUids

start
    =
    expression $
    ;

expression
    =
    | ~ ( DELIMITER ).{ DIGIT }+
    | ~ { DIGIT }+
    ;

DIGIT = /\d+/ ;

DELIMITER = /[^\w\s]/ ;

The command used to generate the parser:

tatsu --outfile uids.py uids.ebnf

The Python test snippet:

from uids import TheftUidsParser

p = TheftUidsParser()

res = p.parse("""27321334942597129, 27345249320501258,
27368210215665668,
27795585241972917,
27897393146691585,
27897393146691586,
27977030430359552,
28068272705044481,
28068272705044482,
28068272705044483,
28068272705044484,
28068272705044485,
28068272705044486,
28068272705044487,
28068272705044488
"""
)

The output:

tatsu.exceptions.FailedCut: (1:19) Expecting <DIGIT> :
27321334942597129, 27345249320501258,
                  ^
DIGIT
expression
start

coalooball commented 11 months ago

Using DELIMITER = /[^\w\s]+/ ; gives the same error.

coalooball commented 11 months ago

I found a workaround: let DELIMITER also consume any whitespace that follows it:

DELIMITER = /[^\w\s]\s*/ ;
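
TatSu patterns are Python regular expressions, so the three DELIMITER patterns tried in this thread can be compared with Python's `re` module. This sketch only checks what each pattern consumes at the first comma of the failing input; it does not model TatSu's parser, and the names `text` and `patterns` are illustrative.

```python
import re

# The input as it looks right at the first delimiter of the failing example.
text = ", 27345249320501258,"

# The three DELIMITER patterns tried in this thread.
patterns = {
    "original": r"[^\w\s]",        # exactly one non-word, non-space character
    "plus": r"[^\w\s]+",           # a run of such characters; still stops at the space
    "workaround": r"[^\w\s]\s*",   # the delimiter plus any whitespace after it
}

for name, pattern in patterns.items():
    m = re.match(pattern, text)
    rest = text[m.end():]
    # DIGIT does not skip whitespace, so it can only succeed if the
    # remaining input starts with a digit.
    print(f"{name:<10} consumed {m.group()!r:<6} next char {rest[0]!r}")
```

Only the workaround leaves the position on a digit; the other two stop on the space, which is where the reported error points.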

dnicolodi commented 11 months ago

Rules whose names start with an uppercase letter do not skip whitespace before they begin to parse, so at least the latter is expected behavior. See https://tatsu.readthedocs.io/en/stable/syntax.html#rules
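
The naming rule can be illustrated with a toy model in plain Python. `ToyParser`, `ParseError` and `positive_join` are hypothetical helpers, not TatSu code: they only mimic the documented convention that uppercase-named rules do not skip leading whitespace, plus the assumption that an element must follow once a separator has matched (which reproduces the "Expecting <DIGIT>" failure in the report).

```python
import re


class ParseError(Exception):
    """Raised when the toy parser cannot match a rule."""


class ToyParser:
    """Hypothetical miniature of rule-entry whitespace handling.

    Not TatSu's implementation: it only mimics the convention that a rule
    whose name starts with an uppercase letter does not skip whitespace.
    """

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def call(self, name, pattern):
        # Lowercase-named rules skip leading whitespace; uppercase ones do not.
        if not name[0].isupper():
            while self.pos < len(self.text) and self.text[self.pos].isspace():
                self.pos += 1
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            raise ParseError(f"expecting <{name}> at offset {self.pos}")
        self.pos = m.end()
        return m.group()


def positive_join(text, sep_rule, elem_rule):
    """Parse `sep.{ elem }+`: one or more elements separated by sep."""
    p = ToyParser(text)
    items = [p.call(elem_rule, r"\d+")]
    while True:
        saved = p.pos
        try:
            p.call(sep_rule, r"[^\w\s]")
        except ParseError:
            p.pos = saved  # no further separator: the join ends here
            break
        # Toy assumption: once a separator matches, an element must follow.
        items.append(p.call(elem_rule, r"\d+"))
    return items
```

For example, with `text = "27321334942597129, 27345249320501258"`, `positive_join(text, "delimiter", "digit")` returns both numbers, while `positive_join(text, "DELIMITER", "DIGIT")` raises `ParseError` at offset 18: the space after the first comma, the same spot the caret marks in the report.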

coalooball commented 11 months ago

Rule with names starting with a capital letter do not advance over white space before beginning to parse, thus the latter at least is expected behavior. See https://tatsu.readthedocs.io/en/stable/syntax.html#rules

It works. I changed all the rule names to lowercase, and the input now parses. Thank you very much!