vlasovskikh / funcparserlib

Recursive descent parsing library for Python based on functional combinators
https://funcparserlib.pirx.ru
MIT License

Including source information as part of the parse tree #68

lsp-ableton closed this issue 2 years ago

lsp-ableton commented 3 years ago

Currently, the parser state is only available when an error occurs. It would also be useful to include source information in the parse tree. That is, when using >> (rshift) to generate a node in the tree, it would be helpful to also receive the source positions that were parsed to produce that node.

If I'm understanding the code correctly, it shouldn't be too difficult to include this information as part of parsing. We could use the state before running a parser as the source-start value and the state after running it as the source-end value. I'd be happy to contribute this change, but I wonder if it's welcome.

Basically, this is all I'm talking about:

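        @Parser
        def _shift(tokens, s):
            (v, s2) = self.run(tokens, s)
            try:
                # Pass the (start, end) positions to callbacks that accept them.
                return f(v, (s.pos, s2.pos)), s2
            except Exception:
                # Fall back to the existing one-argument call.
                return f(v), s2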

Maybe it should be implemented another way, but this at least produces the desired behaviour for me.
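
To make the idea concrete outside the library, here is a minimal standalone sketch of a combinator that hands the callback the positions before and after the wrapped parser runs. Everything in it (the toy State and Parser classes, the with_span method, the take helper) is hypothetical and written only for illustration; it is not funcparserlib's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical parser state: just an index into the token sequence."""
    pos: int = 0


class Parser:
    """Toy combinator for illustration; not funcparserlib's Parser class."""

    def __init__(self, run: Callable[[Sequence[Any], State], Tuple[Any, State]]):
        self.run = run

    def with_span(self, f: Callable[[Any, Tuple[int, int]], Any]) -> "Parser":
        # Like a mapping step, but f also receives (start, end) token indices.
        def _shift(tokens, s):
            v, s2 = self.run(tokens, s)
            return f(v, (s.pos, s2.pos)), s2
        return Parser(_shift)


def take(n: int) -> Parser:
    """Consume exactly n tokens and return them as a list."""
    def _run(tokens, s):
        if s.pos + n > len(tokens):
            raise ValueError("unexpected end of input")
        return list(tokens[s.pos:s.pos + n]), State(s.pos + n)
    return Parser(_run)


# Build a node that records which token indices it was parsed from.
pair = take(2).with_span(lambda v, span: {"value": v, "span": span})
node, end_state = pair.run(["a", "b", "c"], State(1))
```

A separate method is used here instead of the try/except fallback in the snippet above, so that an exception raised inside the callback is not mistaken for a signature mismatch. Whether that trade-off suits the library is an open design question.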

vlasovskikh commented 2 years ago

@lsp-ableton Actually, you can already track source information if you use funcparserlib.lexer.make_tokenizer(), which creates a tokenizer that yields Token objects. You can also define your own token types that carry this kind of information. This doesn't affect the parser itself, since the tracking happens at the lexer level.
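
As a rough illustration of lexer-level tracking, the sketch below uses only the standard library rather than funcparserlib. The Tok type, its start and end fields, the tokenize function and the binary_node helper are all assumptions made for this example. The point is only that once each token records where it came from, a tree node can derive its span from its first and last tokens without any change to the parser.

```python
import re
from typing import Iterator, NamedTuple, Tuple


class Tok(NamedTuple):
    """Hypothetical token carrying 1-based (line, column) positions."""
    type: str
    value: str
    start: Tuple[int, int]
    end: Tuple[int, int]  # exclusive: the position just past the token


SPEC = [("number", r"\d+"), ("op", r"[+*]"), ("space", r"[ \t\n]+")]
PATTERN = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in SPEC))


def tokenize(text: str) -> Iterator[Tok]:
    line, col, pos = 1, 1, 0
    while pos < len(text):
        m = PATTERN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at line {line}, column {col}")
        value = m.group()
        start = (line, col)
        newlines = value.count("\n")
        if newlines:
            # Column restarts after the last newline inside the match.
            line += newlines
            col = len(value) - value.rfind("\n")
        else:
            col += len(value)
        if m.lastgroup != "space":  # whitespace is skipped but still counted
            yield Tok(m.lastgroup, value, start, (line, col))
        pos = m.end()


def binary_node(left: Tok, op: Tok, right: Tok) -> dict:
    # The node's span runs from the first token's start to the last token's end.
    return {
        "op": op.value,
        "args": (int(left.value), int(right.value)),
        "span": (left.start, right.end),
    }


tokens = list(tokenize("12 +\n 3"))
node = binary_node(*tokens)
```

With this approach the functions passed to the parser keep their one-argument form; they simply read the positions off the tokens they are given.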