lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Directly used literals not returned by transformer #1370

Closed. Daniel63656 closed this issue 10 months ago

Daniel63656 commented 10 months ago

Consider this example. When literals are used directly in a rule, as in grammar1, the transformer callback does not receive them among its children, even though the parser has the same TerminalDef('SLASH', '/') in both grammars:

from lark import Lark, Transformer

# grammar1: the slash is written inline as an anonymous literal.
grammar1 = """
    start: "bos" date "eos"
    date: DIGIT+ "/" DIGIT+ "/" DIGIT+
    DIGIT: /[0-9]/
"""
# grammar2: the slash is declared as a named terminal.
grammar2 = """
    start: "bos" date "eos"
    date: DIGIT+ SLASH DIGIT+ SLASH DIGIT+
    DIGIT: /[0-9]/
    SLASH: "/"
"""

class MyTransformer(Transformer):
    def date(self, children):
        # Show exactly which tokens reach the callback.
        print("Callback for date:", children)
        return children

# Swap grammar2 for grammar1 to compare the two outputs.
parser = Lark(grammar2, parser='lalr', transformer=MyTransformer())
tree = parser.parse("bos18/11/2023eos")

With grammar1, the slashes are missing:
Callback for date: [Token('DIGIT', '1'), Token('DIGIT', '8'), Token('DIGIT', '1'), Token('DIGIT', '1'), Token('DIGIT', '2'), Token('DIGIT', '0'), Token('DIGIT', '2'), Token('DIGIT', '3')]

With grammar2, where the slash is explicitly defined as a terminal, they appear:
Callback for date: [Token('DIGIT', '1'), Token('DIGIT', '8'), Token('SLASH', '/'), Token('DIGIT', '1'), Token('DIGIT', '1'), Token('SLASH', '/'), Token('DIGIT', '2'), Token('DIGIT', '0'), Token('DIGIT', '2'), Token('DIGIT', '3')]

Is this behavior intended?

MegaIng commented 10 months ago

Yes, this is expected behavior. Read the documentation before opening issues please.