I've noticed some errors when using a terminal "production" rule of the form
T0: T1 | T2 | T3
where all of the given expressions are terminals. These errors only occur in the standalone parser generated by Lark.js; the same grammar correctly parses an identical string in the Python version of Lark. I've isolated two hopefully-minimal-enough example cases below.
This seems to be similar to #21 in that it's related to some JavaScript-specific regex foible that gets encountered when combining terminals via |, but as I'm not very familiar with the internals of the library I can't be sure. As in #21, replacing VALUE with value everywhere (i.e. replacing the terminal rule with a non-terminal one) causes both of the following examples to parse correctly.
Example 1
This grammar:
?start: thing
thing: thing W thing
     | expr
expr: label W? VALUE
    | VALUE
label: BARE_WORD W? ":"
W: /[ \t\n\v\f]/+
VALUE: NUMBER | BARE_WORD | STRING
BARE_WORD: /[^\s:\(\)]/+
STRING: "\"" /((?:\\"|[^\r\n"]))/* "\""
NUMBER: /[0-9]+/
fails with UnexpectedToken when attempting to parse the string "a:b", although running it in the Python version of Lark results in a correct parse.
Example 2
This grammar:
?start: thing
thing: label VALUE | VALUE
label: BARE_WORD W? ":"
W: /[ \t\n\v\f]/+
VALUE: NUMBER | BARE_WORD | STRING
BARE_WORD: /[^\s:\(\)]/+
STRING: "\"" /((?:\\"|[^\r\n"]))/* "\""
NUMBER: /[0-9]+/
fails with SyntaxError: Invalid flags supplied to RegExp constructor 'nully' during lexing of the same string "a:b"; the Python version also correctly parses it.
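The 'nully' in that message looks like what you'd get from concatenating a null flags value with the sticky flag "y". The sketch below only demonstrates that mechanism in plain JavaScript; it is a guess and has not been checked against the Lark.js source. The names missingFlags and tryCompile are made up for illustration and are not part of Lark.js.

```javascript
// Hypothetical reconstruction, NOT taken from the Lark.js source:
// a flags value that was never set (null) gets concatenated with the
// sticky flag "y", producing the string "nully".
const missingFlags = null;          // assumption: flags were left unset
const flags = missingFlags + "y";   // string concatenation gives "nully"

// Compile a pattern and report the outcome instead of throwing.
function tryCompile(source, flagString) {
  try {
    return { ok: true, regex: new RegExp(source, flagString) };
  } catch (err) {
    return { ok: false, name: err.name, message: err.message };
  }
}

// "nully" contains characters that are not valid RegExp flags,
// so the constructor throws a SyntaxError.
const bad = tryCompile("[0-9]+", flags);

// The same pattern with only the sticky flag compiles fine.
const good = tryCompile("[0-9]+", "y");

console.log(flags, bad.ok, bad.name, good.ok);
```

If this guess is right, the problem would be in how the flags for the combined VALUE terminal are assembled rather than in the pattern itself, but I can't confirm that without knowing the internals.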