Lerche (German for Lark) is a partial port of the Lark grammar processor from Python to Julia. Lark grammars should work unchanged in Lerche.
Installation: at the Julia REPL, using Pkg; Pkg.add("Lerche")
See also 'Notes for Lark users' below.
Lerche reads Lark EBNF grammars to produce a parser. This parser, when
provided with text conforming to the grammar, produces a parse
tree. This tree can be visited and transformed using "rules". A rule is
a function named after the production whose arguments it should be called on, and
the first argument of a rule is an object which is a subtype of
Visitor or Transformer.
Given an EBNF grammar, Lerche can be used to parse text into your own data structure as follows:

1. Define one or more subtypes of Transformer or Visitor, instances of which will be passed as the first argument to the appropriate rule. The instance can also be used to hold information during transformation if you wish, in which case it must have a concrete type.
2. Define visit_tokens(t::MyNewType) = false if you will not be processing token values. This is about 25% faster than leaving the default true.
3. Prefix a rule with the @rule macro if the second argument is an array containing all of the arguments to the grammar production.
4. Prefix a rule with the @inline_rule macro if the second and following arguments refer to each argument in the grammar production.
5. To process a token, define a method in the same way as for rules, but use the @terminal macro instead of @rule.

If your grammar is in String variable mygrammar, your text to be parsed and transformed is in String variable mytext, and your Transformer subtype is MyTransformer, the following commands will produce a data structure from the text:
    using Lerche
    p = Lark(mygrammar, parser="lalr", lexer="contextual")  # create parser
    t = Lerche.parse(p, mytext)                              # create parse tree
    x = Lerche.transform(MyTransformer(), t)                 # transform parse tree
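
As a concrete illustration of the steps above, here is a minimal sketch that sums a list of numbers. The grammar, the SumTransformer type and the rule bodies are hypothetical and chosen only for illustration. The sketch also assumes that a token can be passed to Base.parse like a string.

```julia
using Lerche

# Hypothetical grammar: one or more numbers separated by "+"
sum_grammar = raw"""
    start: term ("+" term)*
    term: NUMBER

    %import common.NUMBER
    %import common.WS
    %ignore WS
"""

# A Transformer subtype; it holds no state in this sketch
struct SumTransformer <: Transformer end

# @terminal: process the NUMBER token itself (assumes the token behaves like a string)
@terminal NUMBER(t::SumTransformer, tok) = Base.parse(Float64, tok)

# @inline_rule: each argument after the first is one item of the production
@inline_rule term(t::SumTransformer, n) = n

# @rule: the second argument is an array holding all items of the production
@rule start(t::SumTransformer, terms) = sum(terms)

p = Lark(sum_grammar, parser="lalr", lexer="contextual")  # create parser
tree = Lerche.parse(p, "1 + 2.5 + 4")                     # create parse tree
total = Lerche.transform(SumTransformer(), tree)          # transform parse tree
```

Because the NUMBER method relies on token processing, this sketch leaves visit_tokens at its default of true.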
For a real-world example of usage, see this file.
If you are publishing work where Lerche has been useful, please consider citing the Lerche paper.
Please raise any issues or problems with using Lerche in the GitHub issue tracker.
Contributions of all types are welcome.
The most straightforward way to make a contribution is to fork the repository, make your changes, and create a pull request.
Please read the Lark documentation. When converting from Lark programs written in Python to Lerche programs written in Julia, the changes outlined below are necessary.

1. The Transformer or Visitor instance is passed as the first argument of each rule (replacing Python's self).
2. Rules are defined with the @rule macro. Inline rules use the @inline_rule macro and token processing methods use @terminal.
3. An UnexpectedInput exception must become, for example, an UnexpectedCharacter exception if a message is included.
4. The PuppetParser invoked when there is a parse error is not yet functional.
5. No choice of regex engine, Tree structure or byte/string handling is available, as these make no sense for Julia.

Lerche is currently based on Lark 0.11.1. The priority has been on
maintaining fidelity with Lark. For example, global regex flags,
which are integers in Lark, are still integers in Lerche, which means
you will need to look up their values. This may be changed to a more
Julian approach in future.
The @rule and @inline_rule macros define methods of the Lerche function
transformer_func. Julia multiple dispatch is used to select the
appropriate method at runtime. @terminal similarly defines methods
of token_func.
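
One practical consequence of this dispatch-based design is that the same production name can be handled differently by different transformer types. The sketch below illustrates this using only the public macros. The grammar and the two transformer types (JoinWords, CountLetters) are hypothetical names introduced for the example, and it assumes tokens can be converted with String.

```julia
using Lerche

# Hypothetical grammar: exactly two words
pair_grammar = raw"""
    start: WORD WORD

    %import common.WORD
    %import common.WS
    %ignore WS
"""

struct JoinWords <: Transformer end
struct CountLetters <: Transformer end

# Same production name, different first-argument types:
# each definition becomes a separate method selected by dispatch
@inline_rule start(t::JoinWords, a, b) = String(a) * "-" * String(b)
@inline_rule start(t::CountLetters, a, b) = length(String(a)) + length(String(b))

p = Lark(pair_grammar, parser="lalr", lexer="contextual")
tree = Lerche.parse(p, "hello world")

joined = Lerche.transform(JoinWords(), tree)      # uses the JoinWords method
letters = Lerche.transform(CountLetters(), tree)  # uses the CountLetters method
```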
Parsing a large (500K) file suggests Lerche is about 3 times faster
than Lark with CPython for parsing. Parser generation is much slower, as no
optimisation techniques have been applied (yet). Calculating and
storing your grammar in a Julia const variable at the top level
of your package will allow it to be precompiled and thus avoid
grammar re-analysis each time your package is loaded.
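
A minimal sketch of this arrangement is shown below. The module name MyParsers, the grammar and the helper parse_list are hypothetical; in a real package the module would live in the package's src directory so that precompilation applies.

```julia
module MyParsers  # hypothetical package module

using Lerche

const LIST_GRAMMAR = raw"""
    start: NUMBER ("," NUMBER)*

    %import common.NUMBER
    %import common.WS
    %ignore WS
"""

# Built once, at the top level of the module, rather than on every call
const LIST_PARSER = Lark(LIST_GRAMMAR, parser="lalr", lexer="contextual")

# Reuse the stored parser for each piece of text
parse_list(text) = Lerche.parse(LIST_PARSER, text)

end # module

tree = MyParsers.parse_list("1, 2, 3")
```

Constructing the parser inside parse_list instead would repeat the grammar analysis on every call.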