Cannot change default comment regexp

See also #249. Summary: either #249 was not actually fixed, or the documentation on how to specify comment regexps (docs/syntax.rst) is incorrect. Note also that the workaround described in #249 still avoids the problem.

Tested with tatsu 5.8.3 (installed via pip install tatsu) on Python 3.10.6.

Test grammar (comments.peg):

file::File = lines:{line}+ $ ;

line::Line = comment:comment | comment2:comment2 | blank:blank ;

comment::Comment = content:COMMENT ;
comment2::Comment2 = content:COMMENT2 ;

blank::Blank = content:NEWLINE ;

NEWLINE = '\n' ;
COMMENT = /#[^\n]*\n/ ;
COMMENT2 = /%[^\n]*\n/ ;

Main:

import tatsu
from tatsu.model import ModelBuilderSemantics
import json

def main():
    with open('comments.peg') as f:
        txt = f.read()
    parser = tatsu.compile(txt, semantics=ModelBuilderSemantics(), comments_re=None, eol_comments_re=None)

    with open('test.peg') as f:
        txt = f.read()
    model = parser.parse(txt, whitespace='', comments_re=None, eol_comments_re=None)

    print(json.dumps(model.asjson(), indent=4))

if __name__ == "__main__":
    main()

Test file (test.peg):

# comment here

% different comment

# another comment

Resulting output:

{
    "__class__": "File",
    "lines": [
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        },
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        },
        {
            "__class__": "Line",
            "comment2": {
                "__class__": "Comment2",
                "content": "% different comment\n"
            }
        },
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        },
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        }
    ]
}

Expected output:


{
    "__class__": "File",
    "lines": [
        {
            "__class__": "Line",
            "comment": {                           <<<<<------------
                "__class__": "Comment",
                "content": "# comment here\n"
            }
        },
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        },
        {
            "__class__": "Line",
            "comment2": {
                "__class__": "Comment2",
                "content": "% different comment\n"
            }
        },
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        },
        {
            "__class__": "Line",
            "comment": {                           <<<<<------------
                "__class__": "Comment",
                "content": "# another comment\n"
            }
        }
    ]
}
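
The difference between the two outputs is consistent with a built-in end-of-line comment pattern still being applied even though comments_re=None and eol_comments_re=None were passed. The sketch below illustrates that reading with plain `re`, without calling TatSu at all. The pattern `#[^\n]*$` is an assumption standing in for whatever default TatSu 5.8.3 actually uses, and `strip_assumed_comments` and `classify` are hypothetical helpers that only mimic the grammar's three line kinds.

```python
import re

# ASSUMPTION: stand-in for a default end-of-line comment pattern.
# The real default in TatSu 5.8.3 may differ.
ASSUMED_EOL_COMMENT = re.compile(r'#[^\n]*$', re.MULTILINE)

# Contents of test.peg from the report.
TEST_INPUT = "# comment here\n\n% different comment\n\n# another comment\n"


def strip_assumed_comments(text: str) -> str:
    """Drop everything the assumed pattern would skip as a comment."""
    return ASSUMED_EOL_COMMENT.sub('', text)


def classify(text: str) -> list[str]:
    """Label each line the way the COMMENT / COMMENT2 / NEWLINE rules would."""
    kinds = []
    for line in text.splitlines(keepends=True):
        if line.startswith('#'):
            kinds.append('comment')
        elif line.startswith('%'):
            kinds.append('comment2')
        elif line == '\n':
            kinds.append('blank')
        else:
            kinds.append('other')
    return kinds


# With the assumed pattern active, only the trailing newlines of the '#' lines remain.
print(classify(strip_assumed_comments(TEST_INPUT)))
# With no comment skipping at all, the '#' lines reach the grammar.
print(classify(TEST_INPUT))
```

Under this assumption the first call yields the same blank, blank, comment2, blank, blank sequence as the resulting output, and the second yields the expected sequence. This is only a simplification: TatSu skips comments while tokenizing rather than by a global substitution, so the sketch shows why the symptom looks the way it does, not where in the library the None values are lost.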
