lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Lark alternatives on C++ #281

Closed evandrocoan closed 5 years ago

evandrocoan commented 5 years ago

When I need to parse something and I am programming in Python, I already know what to do: import lark and be happy. But when programming overseas, i.e., in C++, what is the closest parser match for a lark user?

If there aren't any good matches, is porting lark to C++ feasible? Or perhaps, instead of porting it, allow lark to generate a standalone parser in C++.

erezsh commented 5 years ago

I do want to use Lark to generate a statically-compiled parser, and also provide Python bindings automatically, so it can be used as a drop-in replacement for the Python implementation.

However, this is all in the planning stage so far.

I do think, however, that I'll probably use D, or Rust, or Julia, or anything but C++ :)

As far as C++ goes, I think Elkhound is the best parser there is, in purely technical terms. But when it comes to a nice interface... I'm sorry, I'm unaware of any good option.

evandrocoan commented 5 years ago

Thanks! I searched for Elkhound, and it seems its last update was in 2006: http://scottmcpeak.com/elkhound/

Maybe it is too outdated?

The Elkhound parser seems to be a GLR parser, but for me, an LALR(1) parser would be enough. However, I would like to write my grammar as nicely as lark allows. For example:

start_symbol: /cool regex/ | ["matchme"]

Then, either I use flex/bison or translate lark to C++. I think it should be possible to write an equivalent version of lark completely in C++. But, before translating thousands of lines of code, I would check if it would be possible to import a Python library in C++.

erezsh commented 5 years ago

Sorry, I'm not aware of any C++ parser worth recommending. I think D has a nice parser, and it binds to C++ pretty easily. But I never tried.

It is possible to translate Lark to C++, of course. If you're seriously considering this hazardous route, I suggest you only implement the LALR(1) parser and lexer (lark/parsers/lalr.py, lark/lexer.py). All the necessary tables can be calculated in Lark. Actually, I'm planning soon to improve the internal interface between the grammar analyzer and the parser/lexer, so that it will be easier to create standalone parsers in any language.
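To give a rough idea of how small the table-driven part is, here is a minimal sketch of an LALR(1)-style shift/reduce driver plus a regex lexer. It is not Lark's code: the toy grammar, the hand-built ACTION/GOTO tables, and the names lex and parse are all assumptions made for illustration. In the approach described above, the tables would be computed by Lark rather than written by hand.

```python
import re

# Hypothetical toy grammar (not taken from Lark):
#   1: expr -> expr PLUS term
#   2: expr -> term
#   3: term -> NUM
# Each rule maps to (head symbol, number of symbols on the right-hand side).
RULES = {1: ("expr", 3), 2: ("expr", 1), 3: ("term", 1)}

# Hand-built parse tables for the grammar above (assumed, for illustration).
ACTION = {
    (0, "NUM"): ("shift", 3),
    (1, "PLUS"): ("shift", 4),
    (1, "$END"): ("accept", None),
    (2, "PLUS"): ("reduce", 2),
    (2, "$END"): ("reduce", 2),
    (3, "PLUS"): ("reduce", 3),
    (3, "$END"): ("reduce", 3),
    (4, "NUM"): ("shift", 3),
    (5, "PLUS"): ("reduce", 1),
    (5, "$END"): ("reduce", 1),
}
GOTO = {(0, "expr"): 1, (0, "term"): 2, (4, "term"): 5}

TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<PLUS>\+))")


def lex(text):
    """Yield (kind, value) pairs, ending with an end-of-input marker."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        yield m.lastgroup, m.group(m.lastgroup)
        pos = m.end()
    yield "$END", ""


def parse(text):
    """Run the shift/reduce loop and return a nested (head, children) tree."""
    states = [0]   # stack of parser states
    values = []    # stack of tokens and subtrees, parallel to states[1:]
    tokens = lex(text)
    kind, value = next(tokens)
    while True:
        action = ACTION.get((states[-1], kind))
        if action is None:
            raise SyntaxError(f"unexpected {kind} in state {states[-1]}")
        op, arg = action
        if op == "shift":
            states.append(arg)
            values.append(value)
            kind, value = next(tokens)
        elif op == "reduce":
            head, size = RULES[arg]
            children = values[-size:]
            del values[-size:]
            del states[-size:]
            values.append((head, children))
            states.append(GOTO[(states[-1], head)])
        else:  # accept
            return values[0]
```

The driver loop itself contains no grammar knowledge, which is why porting only this part (plus the lexer) to another language is far less work than porting the grammar analysis.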

evandrocoan commented 5 years ago

As you said, there is probably no good parser for C++. So at least allowing lark to generate a parser in C++ would be a great improvement for C++.

finjulhich commented 5 years ago

You could look at Boost.Spirit. It uses C++ operators to let you define grammars naturally. At compile time it generates the machinery needed to parse input streams, and it supports semantic actions attached to matches. Very performant, very complete. Try Spirit X3, which requires C++14.

erezsh commented 5 years ago

"Actually, I'm planning soon to improve the internal interface between the grammar analyzer and the parser/lexer"

Now in master, Lark instances serialize to and from JSON.
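To illustrate why a JSON form helps with standalone parsers in other languages, here is a minimal sketch of round-tripping parse tables through JSON using only the standard library. The layout (lists of rows under "action" and "goto") and the function names dump_tables and load_tables are hypothetical; this is not Lark's actual serialization format or API.

```python
import json


def dump_tables(action, goto):
    """Flatten tuple-keyed tables into JSON-friendly rows (assumed layout)."""
    payload = {
        "action": [
            [state, token, op, arg]
            for (state, token), (op, arg) in action.items()
        ],
        "goto": [
            [state, symbol, target]
            for (state, symbol), target in goto.items()
        ],
    }
    return json.dumps(payload, sort_keys=True)


def load_tables(text):
    """Rebuild the tuple-keyed dictionaries from the JSON text."""
    payload = json.loads(text)
    action = {(s, t): (op, arg) for s, t, op, arg in payload["action"]}
    goto = {(s, sym): target for s, sym, target in payload["goto"]}
    return action, goto
```

JSON objects cannot use tuples as keys, so the sketch stores each table entry as a row. A runtime written in C++ or any other language could read the same rows with its own JSON library and drive a parser loop from them.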