neogeny / TatSu

竜 TatSu generates Python parsers from grammars in a variation of EBNF
https://tatsu.readthedocs.io/

Naming nested parses causes inside nodes to be duplicated in the AST #261

Closed pragdave closed 2 years ago

pragdave commented 2 years ago

Given the following:

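    import json
    import pprint

    import tatsu
    from tatsu.semantics import ModelBuilderSemantics
    from tatsu.util import asjson

    GRAMMAR = r"""
    grammar = all: ( one: "a" two: "b" );
    """

    parser = tatsu.compile(GRAMMAR)
    # Parse the input and build the AST with the model-building semantics.
    ast = parser.parse("a b", semantics=ModelBuilderSemantics(), trace=False, colorize=True)

    print('PPRINT')
    pprint.pprint(ast, indent=2, width=20)
    print()

    print('JSON')
    print(json.dumps(asjson(ast), indent=2))
    print()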

I get the output:

PPRINT
{'one': 'a', 'two': 'b', 'all': ['a', 'b']}

JSON
{
  "one": "a",
  "two": "b",
  "all": [
    "a",
    "b"
  ]
}

I was expecting:

PPRINT
{'all': [{'one': 'a'}, {'two': 'b'}]}

I'm probably misunderstanding something, but this doesn't seem right.

Dave

apalala commented 2 years ago

This is the expected and documented behavior.

You may want to rewrite your grammar as something like:

    grammar = all: (one two) ;

    one = one: "a" ;

    two = two: "b" ;

apalala commented 2 years ago

You may reopen with a pull request that provides the desired behavior without breaking existing unit tests.

I do think that the behavior you seek is reasonable, but I think it would break automatic model generation.

Part of the task is to define how the desired JSON output should map to Python @dataclass objects.
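
As a rough illustration of that mapping question, here is a minimal sketch of how the nested output requested above could correspond to dataclasses. The class names `One`, `Two`, and `Grammar`, the `NODE_TYPES` table, and the helper `build_model` are all hypothetical and are not part of TatSu; the sketch uses only the standard library and says nothing about how TatSu's model generation would actually implement it.

```python
from dataclasses import asdict, dataclass
from typing import List, Union


@dataclass
class One:
    one: str


@dataclass
class Two:
    two: str


@dataclass
class Grammar:
    # Each named inner element becomes its own node, not a flat sibling field.
    all: List[Union[One, Two]]


# Hypothetical lookup from the single key of each inner dict to a node class.
NODE_TYPES = {"one": One, "two": Two}


def build_model(data: dict) -> Grammar:
    """Turn the nested JSON shape into dataclass instances."""
    items = []
    for item in data["all"]:
        # Each inner dict is assumed to hold exactly one name/value pair.
        ((name, value),) = item.items()
        items.append(NODE_TYPES[name](value))
    return Grammar(all=items)


desired = {"all": [{"one": "a"}, {"two": "b"}]}
model = build_model(desired)
print(model)
print(asdict(model) == desired)
```

With this shape the mapping is reversible: `asdict(model)` gives back the nested dictionary. The open design question is how a generator would choose the inner class names and fields when the names are attached to elements inside a group and not to separate rules.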