lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Standalone visitor raises an exception #440

Closed: ForceBru closed this issue 5 years ago

ForceBru commented 5 years ago

Or, "Standalone Tree doesn't implement enough methods".

I'm trying to convert a LALR parser to standalone and then visit the tree using the Tree included in the standalone version. I'm using the latest Lark from master: https://github.com/lark-parser/lark/commit/b6b95c3ff01896a45b7835a7375203969a8040e3

I know the "standalonizer" script is supposed to be run from the command line. However, I don't really have access to a terminal since I'm running all this in Pythonista for iOS (yes, there's StaSh, but that's not really the point), so don't be surprised by the code below :D

import importlib

GRAMMAR_FILE = 'GRAMMAR.txt'
PARSER_MODULE = 'lark_parser'
START_SYMBOL = 'start'

def invalidate_module():
    # use that when the grammar changes to force regeneration of the standalone file
    import sys
    import os

    os.unlink(PARSER_MODULE + '.py')
    del sys.modules[PARSER_MODULE]

try:
    # attempt to load module
    parser_module = importlib.import_module(PARSER_MODULE)
except ImportError:
    # the module doesn't exist, so create it
    import io as io_tmp
    import unittest.mock as mock_tmp

    import lark.tools.standalone as standalone_tmp

    # Lark outputs to `sys.stdout` by `print`ing, and there doesn't seem to be any other way of redirecting output, so I'm using `unittest.mock.patch` 
    @mock_tmp.patch('sys.stdout', new_callable=io_tmp.StringIO)
    def serialize(parser_code):
        with open(GRAMMAR_FILE) as grammar:
            # generate the Python code
            standalone_tmp.main(grammar, START_SYMBOL)

        # return the Python code    
        return parser_code.getvalue()

    with open(PARSER_MODULE + '.py', 'w') as module:
        # write the serialized parser to the file
        module.write(serialize())

    # don't pollute the namespace  
    del io_tmp, mock_tmp, standalone_tmp, serialize

    # import the module
    parser_module = importlib.import_module(PARSER_MODULE)

# inherit from the standalone (!) `Visitor`  
class Analyzer(parser_module.Visitor):
    @parser_module.v_args('inline')
    def atom(self, *stuff):
        # random code
        print(f'Atom: {stuff}')

# instantiate the standalone parser    
parser = parser_module.Lark_StandAlone()

code = '5555555'

# this parses fine
root = parser.parse(code)
print(root.pretty())

# this RAISES AN EXCEPTION
analyzed = Analyzer().visit(root)
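
As a side note on the stdout capture above: the standard library's `contextlib.redirect_stdout` does the same job as patching `sys.stdout` with `unittest.mock`, and should work for any function that writes through `print()`. The following is only a minimal sketch; `emit_parser_source` and `capture_stdout` are hypothetical names, and the stand-in function just prints two lines instead of calling Lark's standalone tool.

```python
import contextlib
import io


def emit_parser_source():
    # Hypothetical stand-in for a generator that only print()s its result,
    # as the standalone tool is described as doing in the post above.
    print("# generated parser")
    print("class Lark_StandAlone: ...")


def capture_stdout(func, *args, **kwargs):
    """Run func and return everything it printed to sys.stdout as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


# The captured text could then be written to the module file.
source = capture_stdout(emit_parser_source)
```

With this approach the generator call would sit inside `capture_stdout` in place of the decorated `serialize` function; whether that is preferable to the mock-based version is a matter of taste.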

The grammar is simple:

# GRAMMAR.txt
start: atom*
atom: FIVE

FIVE: "5"

Output and traceback:

start
  atom  5
  atom  5
  atom  5
  atom  5
  atom  5
  atom  5
  atom  5

Traceback (most recent call last):
  File "/private/var/mobile/.../some_path/parse.py", line 49, in <module>
    analyzed = Analyzer().visit(root)
  File "/private/var/mobile/.../some_path/lark_parser.py", line 459, in visit
    for subtree in tree.iter_subtrees():
AttributeError: 'Tree' object has no attribute 'iter_subtrees'

Indeed, the standalone Tree class doesn't have this method. Moreover, it doesn't seem to have many of the methods of lark.Tree:

>>> help(parser_module.Tree)
Help on class Tree in module lark_parser:

class Tree(builtins.object)
 |  Methods defined here:
 |  
 |  __eq__(self, other)
 |      Return self==value.
 |  
 |  __hash__(self)
 |      Return hash(self).
 |  
 |  __init__(self, data, children, meta=None)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __ne__(self, other)
 |      Return self!=value.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  pretty(self, indent_str='  ')
 |  # where's `iter_subtrees`?
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  meta

Unfortunately, this doesn't allow me to visit a tree using the standalone version of the parser.

ForceBru commented 5 years ago

I can't edit the original post since I'm using a really old browser, but I wanted to add that subclassing parser_module.Interpreter instead solves this issue. So now I'm unsure whether it's a bug or whether Visitor isn't meant to be used like this, although the docs say it is...
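
A plausible reason the Interpreter route works is that a top-down walker only needs `data` and `children` on each node, so it never touches `iter_subtrees`. The sketch below illustrates that idea with stand-in classes; `Node`, `TopDownWalker` and `AtomCollector` are hypothetical names, and this is not Lark's actual Interpreter code.

```python
class Node:
    # Hypothetical minimal tree node: a rule name plus a list of children.
    def __init__(self, data, children):
        self.data = data
        self.children = children


class TopDownWalker:
    """Interpreter-style walker: dispatch on node.data, recurse via node.children."""

    def visit(self, node):
        # Use the method named after the rule, or fall back to visiting children.
        handler = getattr(self, node.data, self.visit_children)
        return handler(node)

    def visit_children(self, node):
        # Recurse into subtrees; leave tokens (plain strings here) untouched.
        return [self.visit(c) if isinstance(c, Node) else c for c in node.children]


class AtomCollector(TopDownWalker):
    def __init__(self):
        self.atoms = []

    def atom(self, node):
        self.atoms.append(node.children[0])


root = Node("start", [Node("atom", ["5"]), Node("atom", ["5"])])
collector = AtomCollector()
collector.visit(root)
print(collector.atoms)
```

Note that in this sketch a rule name that collides with an existing attribute (such as `visit`) would be dispatched incorrectly; a real implementation would need to guard against that.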

MegaIng commented 5 years ago

A temporary fix would be to move the weird ###} comment in tree.py (line 59) somewhere further down: outside any function and at the start of a line, maybe at line 124, directly before the # XXX Deprecated comment.

This tells the standalone program to include the iter_subtrees function.
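
To illustrate why moving the marker changes what ends up in the standalone file, here is a simplified sketch of marker-based section extraction. It assumes an opening marker of the form `###{name` paired with the closing `###}`; the function and the section name `standalone` are assumptions for illustration, not a copy of Lark's implementation.

```python
from collections import defaultdict


def extract_sections(lines):
    """Collect the text between '###{name' and '###}' marker lines, keyed by name."""
    sections = defaultdict(list)
    current = None
    for line in lines:
        if line.startswith("###{"):
            current = line[4:].strip()   # start collecting under this name
        elif line.startswith("###}"):
            current = None               # stop collecting
        elif current is not None:
            sections[current].append(line)
    return {name: "".join(body) for name, body in sections.items()}


source = [
    "###{standalone\n",
    "class Tree:\n",
    "    def pretty(self): ...\n",
    "###}\n",
    "    def iter_subtrees(self): ...\n",   # after the marker, so it is dropped
]
sections = extract_sections(source)
print(sections["standalone"])
```

Anything after the closing marker is left out, which is why shifting `###}` below a method would pull that method into the generated module.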

erezsh commented 5 years ago

> A temporary fix would be to move the weird ###}

It's not weird, it's ingenious! Learn to use the correct adjectives ;)

But yes, iter_subtrees isn't included, and moving the comment MegaIng pointed out would solve it.

But this isn't a bug, it was a conscious decision (agree with it or not). After all, there's no point in making every standalone bigger, for features that are used very rarely.

MegaIng commented 5 years ago

@erezsh Yes, but then why is the Visitor class included?

erezsh commented 5 years ago

That's a good point! Also, it's not a very long method.

erezsh commented 5 years ago

Decided to include iter_subtrees, along with find_data and find_predicate.
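
For readers stuck on an older standalone file that lacks these methods, the same behaviour can be approximated with free functions. This is a rough sketch under the assumption that each node exposes `data` and `children` and that the structure is a proper tree (no shared nodes); `Node` and the helper functions are hypothetical stand-ins, not the methods shipped with Lark.

```python
class Node:
    # Hypothetical minimal tree node: a rule name plus a list of children.
    def __init__(self, data, children):
        self.data = data
        self.children = children


def iter_subtrees(root):
    """Yield every subtree of a proper tree, children before parents (root last)."""
    stack = [root]
    order = []
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(c for c in node.children if isinstance(c, Node))
    # Reversing a pre-order walk puts every parent after all of its descendants.
    yield from reversed(order)


def find_predicate(root, pred):
    """Lazily yield the subtrees for which pred(subtree) is true."""
    return filter(pred, iter_subtrees(root))


def find_data(root, data):
    """Lazily yield the subtrees whose rule name equals data."""
    return find_predicate(root, lambda node: node.data == data)


root = Node("start", [Node("atom", ["5"]), Node("atom", ["5"])])
names = [node.data for node in iter_subtrees(root)]
atoms = list(find_data(root, "atom"))
print(names, len(atoms))
```

A bottom-up visitor can then be written as a loop over `iter_subtrees(root)` that calls a handler named after each node's `data`.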