lark-parser / lark_cython

Cython plugin for Lark, reimplementing the LALR parser & lexer for better performance
MIT License
44 stars 5 forks source link

Meta is always empty while using cython lark #9

Closed ernestoarbitrio closed 1 year ago

ernestoarbitrio commented 1 year ago

This is my parser:

    lark_kwargs = {
        "parser": "lalr",
        "lexer": "contextual",
        "propagate_positions": True,
        "_plugins": lark_cython.plugins,
    }
    lang_parser = Lark(lang_def, start="start", **lark_kwargs)

I'd like to propagate the line positions, without cython-lark it works fine, but once i enable the plugins the meta value is always empty.

Do you have any suggestion on what's wrong ?

erezsh commented 1 year ago

Bug confirmed.

This is happening because lark-cython still uses Lark's parse-tree-builder, which in turn uses isinstance(..., Token), which only matches Lark Tokens, instead of lark-cython Tokens.

See this line: https://github.com/lark-parser/lark/blob/master/lark/parse_tree_builder.py#L74

erezsh commented 1 year ago

The following change works:

            elif hasattr(c, 'line'): #isinstance(c, Token):
                return c

But that would break backwards compatibility.

And unfortunately lark-cython cannot inherit from lark.Token.

So I'm not sure what's the best way to solve this. Another option is to update lark-cython to use its own parse-tree-builder.

ernestoarbitrio commented 1 year ago

Thanks for letting me know! Sorry but i do not understand what do you mean for

Another option is to update lark-cython to use its own parse-tree-builder.

Could you explain me?

erezsh commented 1 year ago

Right now lark-cython uses Lark's regular ParseTreeBuilder class. If that class is copied and adapted into lark-cython, that won't be a problem anymore. (and if done correctly, should be a little faster too)

ernestoarbitrio commented 1 year ago

oh gotcha ... do you wanna me try to do this?

erezsh commented 1 year ago

It's not so simple, it also requires changes in the Lark codebase, to allow it as a "plugin" argument.

Probably the easiest solution would be to allow a new optional attribute to mark an object as a 'meta', like __lark_meta__ = True. Then it only requires minimal changes to Lark, and only to the PropagatePositions class.

You might need to add this attribute to both Token and Tree.

ernestoarbitrio commented 1 year ago

yes I do not have enough experience on lark. Even the last solution probably requires more knowledge. I would be happy to help but I think I need help as well :D

erezsh commented 1 year ago

Okay, so I just created a PR for Lark, have a look: https://github.com/lark-parser/lark/pull/1203

With this change, all you need is to add this to lark_cython.Token

    def __lark_meta__(self):
        return self

And for Tree return self.meta.

Then add some tests to lark-cython, and that's it. Makes sense?

ernestoarbitrio commented 1 year ago

Okay, so I just created a PR for Lark, have a look: lark-parser/lark#1203

With this change, all you need is to add this to lark_cython.Token

    def __lark_meta__(self):
        return self

And for Tree return self.meta.

Then add some tests to lark-cython, and that's it. Makes sense?

yes I'll try. thx