lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Change tree by condition #1164

Closed x23n5902y closed 2 years ago

x23n5902y commented 2 years ago

What is your question?

Hello. I'm making a parser that works with an INI-like config file. I also want to change the value of a parameter in a specific section. I don't understand how to conditionally change the tree using the Visitor, Transformer or Interpreter classes. I would appreciate any help.

If you're having trouble with your code or grammar

from lark import Lark
from lark.reconstruct import Reconstructor
from lark.visitors import Transformer_InPlaceRecursive

text = """
SErvername              fs_tsm
COMMmethod              TCPip
TCPServeraddress        localhost
managedservices         schedule
tcpclientp              1501
nodename                LALALALALALALA
passwordaccess          generate
PASSWORDDIR             /etc/adsm/fs
errorlogname            /opt/tivoli/tsm/client/ba/bin/dsmerror.log
schedlogname            /opt/tivoli/tsm/client/ba/bin/dsmsched.log
errorlogretention       7,D
schedlogretention       7,D
* exclude.dir           /u0/oradata/test
* PRESchedulecmd        "/etc/pretsm.sh"
* POSTSchedulecmd       "/etc/posttsm.sh"

Servername              pg_tsm
TCPSERVERADDRESS        localhost
PASSWORDACCESS          generate
TCPCLIENTP              1506
NODENAME                FOOBARFOOBAR
PASSWORDDIR             /etc/adsm/pg
errorlogname            /opt/tivoli/tsm/client/ba/bin/dsmerror_pg.log
schedlogname            /opt/tivoli/tsm/client/ba/bin/dsmsched_pg.log
errorlogretention       7,D
schedlogretention       7,D
"""
grammar = r"""
        SECTION: /[sS][eE][rR][vV][eE][rR][nN][aA][mM][eE]/
        KEY: /(?![sS][eE][rR][vV][eE][rR][nN][aA][mM][eE])([a-zA-Z0-9\-\_\.])/+
        VALUE: /[\"a-zA-Z0-9\-\_\.,\/]/+
        COMMENT: /\*/+ WS_INLINE?
        SECTION_NAME: VALUE
        %import common.WS_INLINE
        %import common.NEWLINE

        config:            section
        section:           SECTION WS_INLINE section_name
        section_name:      SECTION_NAME NEWLINE statements*
        statements:        NEWLINE? COMMENT? KEY? WS_INLINE VALUE NEWLINE*
        empty_line:        NEWLINE

        start: ( config | empty_line )*
"""
parser = Lark(grammar, parser='lalr', maybe_placeholders=False, propagate_positions=False)
tree = parser.parse(text)
new_text = Reconstructor(parser).reconstruct(tree)
print(text == new_text)

If I add a Transformer class to my code, it changes every token of the matching terminal type, in every section, not only the one I want.

class Update(Transformer_InPlaceRecursive):
    def KEY(self, tok):
        return tok.update(value=tok + '_meow')

    def VALUE(self, tok):
        return tok.update(value=tok + '_caw')

    def SECTION(self, tok):
        return tok.update(value=tok + '_moo')

    def SECTION_NAME(self, tok):
        return tok.update(value=tok + '_quack')

Update().transform(tree)
new_text = Reconstructor(parser).reconstruct(tree)
print(new_text)

The result is

SErvername_moo              fs_tsm_quack
COMMmethod_meow              TCPip_caw
TCPServeraddress_meow        localhost_caw
managedservices_meow         schedule_caw
tcpclientp_meow              1501_caw
nodename_meow                LALALALALALALA_caw
passwordaccess_meow          generate_caw
PASSWORDDIR_meow             /etc/adsm/fs_caw
errorlogname_meow            /opt/tivoli/tsm/client/ba/bin/dsmerror.log_caw
schedlogname_meow            /opt/tivoli/tsm/client/ba/bin/dsmsched.log_caw
errorlogretention_meow       7,D_caw
schedlogretention_meow       7,D_caw
* exclude.dir_meow           /u0/oradata/test_caw
* PRESchedulecmd_meow        "/etc/pretsm.sh"_caw
* POSTSchedulecmd_meow       "/etc/posttsm.sh"_caw

Servername_moo              pg_tsm_quack
TCPSERVERADDRESS_meow        localhost_caw
PASSWORDACCESS_meow          generate_caw
TCPCLIENTP_meow              1506_caw
NODENAME_meow                FOOBARFOOBAR_caw
PASSWORDDIR_meow             /etc/adsm/pg_caw
errorlogname_meow            /opt/tivoli/tsm/client/ba/bin/dsmerror_pg.log_caw
schedlogname_meow            /opt/tivoli/tsm/client/ba/bin/dsmsched_pg.log_caw
errorlogretention_meow       7,D_caw
schedlogretention_meow       7,D_caw

I would like to understand how to replace text only when a condition is met, for example only inside one particular section.

erezsh commented 2 years ago

I think the best way is to create a transformer with v_args(tree=True), like this:

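from random import random
from lark import Transformer, v_args

@v_args(tree=True)
class T(Transformer):
    def my_rule(self, tree):
        # Keep the subtree or replace it, depending on any condition you like
        return tree if random() > 0.5 else "removed"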

(not tested but should work)

x23n5902y commented 2 years ago

Thanks a lot. I achieved my goal by converting the tree to a Python dict, then parsing it again with a JSON grammar and reconstructing the text from that.

erezsh commented 2 years ago

What did you think of the reconstructor?

The docs still describe it as an experimental feature, but I wonder whether it's stable enough by now.

x23n5902y commented 2 years ago

I had to change the grammar a bit because the reconstructor didn't accept regular expressions in terminals. Apart from that, I didn't run into any problems. The text produced by the reconstructor is identical to the original text.

erezsh commented 2 years ago

Nice to hear!

Yes, it could be improved. Maybe it could always resolve regexps to their shortest match. But that doesn't sound simple to implement.