lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License
4.77k stars 404 forks source link

Inspect parse tree as it's being parsed, without transforming. #378

Closed elektito closed 5 years ago

elektito commented 5 years ago

I'm trying to use the transformer argument of the Lark class to have a visitor inspect everything that has been parsed so far, so that later this can help with the lexer (something similar to "the lexer hack" some C compilers use). I tried using an InPlace trasnformer and not-transforming anything, but it seems that the InPlace transformer is not accepted, or rather it is accepted but used as a base transformer. This is a very simple example:

grammar = r"""
program: idx+
idx: "xyz"i
%ignore /[ \t\f]+/
"""
prog="xyz xyz xyz"

@v_args(tree=True)
class T(Transformer_InPlace):
    def idx(self, t):
        print('visited:', t)

parser = Lark(grammar, start='program', parser='lalr', transformer=T())
t = parser.parse(prog)
print(t.pretty())

This will output:

visited: []
visited: []
visited: []
program
  []
  []
  []

This means that neither v_args has worked as it should, nor has Transformer_InPlace behaved as expected.

Is this the expected behavior? Because if it is, I don't think it's documented anywhere. Also, is there any other way of achieving this?

erezsh commented 5 years ago

When no transformer is provided, Lark creates a tree instance and then applies the restructure rules provided by the grammar.

When a transformer is provided, it's always your job to create the tree instances (or whatever object you want to return). Modifiers don't apply to the internal transformer, you will always have to accept the list of children, and return a new object.

For example, this code works:

from lark import Lark, v_args, Transformer, Tree
grammar = r"""
program: idx+
idx: "xyz"i
%ignore /[ \t\f]+/
"""
prog="xyz xyz xyz"

class T(Transformer):
    def idx(self, t):
        print('visited:', t)
        return Tree('idx', t)

parser = Lark(grammar, start='program', parser='lalr', transformer=T())
t = parser.parse(prog)
print(t.pretty()) 

P.S. The argument could be made that your use case is in fact correct. It is possible to account for v_args=Tree on the internal transformer, and create the tree automatically. I would accept a pull request for that.

elektito commented 5 years ago

Thanks for the explanation. I would like to work on that, and I will report back when/if I got anything. Meanwhile, I would appreciate if you could provide any pointers as to where I should look for.

So far, it looks to me that the callbacks are called in the reduce function in the lalr_parser.Parser class. I can then check for the whole_tree attribute and if it is there, I can assume that there will not be a return value and the input is modified in place. Is that the correct approach? Are the any other places that need to be modified?

erezsh commented 5 years ago

Here is the code responsible for creating callbacks: https://github.com/lark-parser/lark/blob/master/lark/parse_tree_builder.py#L229

It's pretty much just about adding an elif for testing whole_tree (set here by v_args https://github.com/lark-parser/lark/blob/master/lark/visitors.py#L261)

With any luck, the solution shouldn't be more than a couple of lines of code

elektito commented 5 years ago

I don't think that's all that is needed. What you suggest fixes the wrapper, but the values passed to the callback are still the children, while the whole tree should be passed to the function when the 'whole_tree' attribute is set.

erezsh commented 5 years ago

The way I see it (and I might be missing something), you need a decorator that accepts the children, creates a tree, and then passes it to the user function.

elektito commented 5 years ago

You're right, of course. Working on it!

elektito commented 5 years ago

I created the pull request. Do tell me if there's anything wrong with it, but it seems to work fine, and all tests pass like before, including the new one I added.

erezsh commented 5 years ago

It's great, I merged it in.

But, I had to make some changes. All transformers have to return a tree, even Transformer_InPlace. For a non-returning in-place visitor there is.. Visitor :)

But no worries, that's more my fault than yours.

elektito commented 5 years ago

All transformers have to return a tree, even Transformer_InPlace. For a non-returning in-place visitor there is.. Visitor :)

Doesn't that defeat the purpose though? An in-place transformer is supposed to update the tree, well, in-place! But here we're returning something that replaces the original, so it's not in-place anymore.

And, actually, I was looking for something like visitor, except I wanted to be able to pass it as an argument to Lark.__init__ so that I get to inspect the values as soon as they are parsed, not after the whole input has been parsed.

erezsh commented 5 years ago

An in-place transformer is supposed to update the tree, well, in-place!

That's exactly what it does. It replaces the original, but in-place, instead of always returning a new tree. That means that you can always return the same tree instance you were given, and so avoid any allocations, but you can also replace the existing tree with a new one (or any python object, for that matter). That's what a Transformer means.

In the end of it, the tree has to be constructed sometime during the parse, and so the transformer is where it happens, because it's a faster design. But it's also a bit of a hack on top of transformer, which means that Transformer_InPlace doesn't actually act any different inside it. It's exactly like a regular Transformer.

I get that you're looking for some observation mechanism, but you can just emulate that by returning a tree and observing it along the way.

elektito commented 5 years ago

I guess you're right, although it's a bit unintuitive. Anyways, I already fixed my code with the reconstruction code you first suggested and moved on. I believe I will be back with other questions soon, so I apologize for that beforehand! :)

erezsh commented 5 years ago

although it's a bit unintuitive

I guess what you're trying to do is a bit unusual, so it's not that bad. If someone comes up with a better design I'll be happy to consider it. But as for the current one, as they say, I had my reasons :)

As for questions, no worries. I always have time for contributors, and I'll be happy to help.