jamesrhester / Lerche.jl

A Julia port of the Lark parser
MIT License
45 stars 3 forks source link

union of parsers #23

Open guyvdbroeck opened 3 years ago

guyvdbroeck commented 3 years ago

Feature request: I have several Lerche parsers that read various file formats and output an object through a transformer for each parser. Now I'd like to have another parser that is the union of those parser, that just unions the grammars and essentially figures out from the file+grammar which format is given, and then runs the corresponding transformer to get the right parsed object.

jamesrhester commented 3 years ago

Sounds like a useful idea.

What do you think of the following notional interface:

u = LercheCombine((grammar1,grammar2,grammar3),(transformer1,transformer2,transformer3))
result = Lerche.parse(u,"my_random_file")

In terms of implementation it would be trying each grammar in order until it found one that didn't cause a parse error. It would only take a few lines of code to create the LercheCombine type and then a LercheCombine-specific dispatch for Lerche.parse that simply did a Lerche.parse for each of the grammars in the LercheCombine object inside a try/catch block. Feel free @guyvdbroeck to submit a pull request on those lines if you wish, as I might not get to this for a while.

Note I'm using Lerche as the prefix in LercheCombine to avoid polluting the Lark namespace in case something similar is added to Lark.

erezsh commented 3 years ago

I'd just like to point out that this is possible in Lark-Python using the %import statement.

It won't compile for every combination of grammars, but if it does, it will work as a single parser, which means better performance than trying them one at a time, and with better chances of successful parsing.

Here is an example showing how: https://github.com/lark-parser/lark/tree/master/examples/composition

jamesrhester commented 3 years ago

In which case there is a very good chance it will work like this in Lerche also. I'm assuming it will only fail if more than one grammar matches start and it can't be disambiguated. Is this sufficient for you @guyvdbroeck?

TODO: create a similar example for Lerche.

erezsh commented 3 years ago

It should be possible in Lerche.

Ambiguity is indeed the main worry here, but similar rule names are easy to solve: https://github.com/lark-parser/lark/blob/master/examples/composition/storage.lark

guyvdbroeck commented 3 years ago

Thanks all. I would also like to avoid having to copy the entire transformer and its methods (and avoid more precompilation overhead), which I suppose is a problem specific to Lerche, not Lark?

jamesrhester commented 3 years ago

I'm not sure why there would be any extra compiling in the LercheCombine outline I gave above. Each of the transformer methods is still a separate top-level function which would only be compiled the first time it is called with a particular set of argument types, and at the point it is called the argument types are exactly those it receives now. Or were you referring to the approach already available in Lark?

guyvdbroeck commented 3 years ago

Oh I like LercheCombine, my comment was more about the approach of importing the various grammars in the BNF itself.

erezsh commented 3 years ago

My impression is that most of the pre-compilation time is spent compiling the Julia code, rather than running it, and once that's accomplished, the size of the grammar doesn't matter that much. But I suppose one can never be sure until one benchmarks.