lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Tree Matcher/reconstructor incorrectly (un-)flattens children #927

Open MegaIng opened 3 years ago

MegaIng commented 3 years ago

Describe the bug

Sometimes (i.e. not reliably; it looks like random chance), the TreeMatcher (and therefore the Reconstructor) treats trees like this:

expr
    literal "a"
    "5"
    "5"

as if they were

expr
    expr
        literal "a"
        "5"
    "5"

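To make the difference between the two shapes concrete, here is a minimal sketch that uses plain nested tuples instead of lark's own `Tree` class. The `leaves` helper and the tuple encoding are hypothetical, for illustration only. It shows that both trees carry the same sequence of leaves while differing in structure, which may be why a matcher can confuse them (that is an assumption about the cause, not a confirmed diagnosis).

```python
from typing import List


def leaves(node) -> List[str]:
    """Return the leaf strings of a (name, children) tuple tree, left to right."""
    if isinstance(node, str):
        return [node]
    _name, children = node
    out: List[str] = []
    for child in children:
        out.extend(leaves(child))
    return out


# The tree as it actually is: one expr with three children.
flat = ("expr", [("literal", ["a"]), "5", "5"])

# The tree as it is wrongly treated: an extra nested expr.
nested = ("expr", [("expr", [("literal", ["a"]), "5"]), "5"])

same_leaves = leaves(flat) == leaves(nested)  # identical leaf sequence
same_shape = flat == nested                   # but different structure
```

Any step that only looks at the flattened children cannot tell these two apart, so the structure has to be tracked explicitly.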
To Reproduce

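from lark import Lark
from lark.reconstruct import Reconstructor

# Load the grammar that describes lark's own grammar syntax.
parser = Lark.open_from_package("lark", "lark.lark", ("grammars",))

# Substitute fixed text for terminals that the reconstructor cannot infer.
lark_reconstructor = Reconstructor(parser, {
    "_NL": lambda _: "\n",
    "_VBAR": lambda _: "|",
})

# Two rules: a flat repetition range and a nested repetition.
pattern = parser.parse("""
start: "a"~5..5
start: ("b"~5)~5
""")
print(pattern.pretty())
print(lark_reconstructor.reconstruct(pattern))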

This script sometimes wrongly produces

start:("a"~5)~5
start:("b"~5)~5
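Because the wrong output only shows up some of the time, it can help to run the reproduction repeatedly and tally the distinct results. The sketch below is one way to do that. The `count_outputs` helper is hypothetical, and varying `PYTHONHASHSEED` is only a guess at where the randomness comes from (hash-dependent ordering); nothing in this report confirms that.

```python
import os
import subprocess
import sys
from collections import Counter


def count_outputs(script: str, seeds=range(5)) -> Counter:
    """Run `script` in a fresh interpreter per seed and tally distinct stdout values."""
    tally: Counter = Counter()
    for seed in seeds:
        # Assumption: the flakiness depends on string hash randomization.
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        result = subprocess.run(
            [sys.executable, "-c", script],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        tally[result.stdout] += 1
    return tally
```

Passing the reproduction script above as the `script` string would show how often each reconstruction appears. If only one distinct output ever comes back, the hash-seed assumption is probably wrong.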

(I came across this while working on ast_generator)


Maybe related: Sometimes, the tree_matcher outputs normally illegal constructs of the shape Tree('expansion', [Tree('expansion', [...])]) or Tree('<origin>', [Tree('<alias>', [...])]). This can be dealt with, but is kinda annoying. I don't have a simple example. Here is where I deal with it.

erezsh commented 3 years ago

Sounds like something that can be solved by changing priority / ordering of choices?

MegaIng commented 3 years ago

@erezsh I am not sure. I don't know exactly how tree_matcher is implemented.