Yet another tool for language processing
This tool is intended for easy lexer and parser building, code transformations and anlysis.
Current features:
Future:
The function for defining lexers is
(deflexer <grammar-name>
(<rule-name> <rule-form> <options>...)
...)
Rules are defined using regex-like s-expressions based syntax:
(<elements>...)
- sequence, e.g. (#\a #\b #\c)
(:+ <elements>...)
(:+? <elements>...)
(:* <elements>...)
(:*? <elements>...)
(:or <elements>...)
(:? <elemtents>...)
(:r <form-char> <to-char>)
- like [..-..]
in regex.:any
- like .
in regex.Options:
:skip <bool>
- if <bool>
is t
the result of the rule will be omitted:fragment
- the corresponding rule is not an independent rule. Such rules are intended
to be referenced by other rules.Main entry point into lexer is
(lexer <stream> <grammar-keyword>)
Result is a lazy-list where each element is token of the following form:
(<type> <sym-representation> <line-number> <column-number>)
the last element is
(:EOF NIL <line-number> <column-number>)
Lexer for identifiers:
(deflexer identifiers
(letter (:r #\a #\z) :fragment)
(digit (:r #\0 #\9) :fragment)
(identifier (letter (:* (:or digit letter))))
(whitespace (:+ (:or #\Newline #\Space #\Return #\Vt #\Page #\Tab)) :skip t))
The function for defining parsers is
(defparser <grammar-name>
(<rule-name> <alternative>... [:options <options>...])
...)
Each alternative can be one of the following:
-> <element>...
- (simple-form) each element can be either rule-name or string.-> :^ <rule-name>
- (mimic-form).-> :eps
- (eps-form).-> * <delimiter> <rule-name>
- (star-form) delimiter is either string, rule-name or :eps
.-> + <delimiter> <rule-name>
- (plus-form) delimiter is either string, rule-name or :eps
.-> :lex <lexer-rule>
- (lexer-rule-form).When star-form, plus-form or lexer-rule-form is specified it must be the only alternative in the rule.
Result of parsing is an AST, which has the following form:
(<term-type> <children>...)
Each alternative will have its own <term-type>
in the following form: <rule-name>-<some-number>
.
Child is always either a term (for a rule-name in grammar) or a symbol (for lexer-rule-form). None of the
literals mentioned in rules are preserved in the resulting terms.
When mimic-form is present as one of alternative then the result of the rule in mimic form is return as-is without wrapping. For example using the following grammar
(defparser some-grammar
(rule1 -> "abc"
-> :^ rule2))
(rule2 -> "123"))
the result of parsing "123"
will be (rule2)
not (rule1 (rule2))
which would be the result
of the similar grammar without mimic rule.