Closed deltaidea closed 7 years ago
```js
moo.compile({
  id: /\w+/,
  ws: {match: /\s+/, lineBreaks: true},
  // … rules rules rules …
  ignore: /.+/, // skip to eol
})
```
If you meant the entire line gets ignored, not just the trailing lexically invalid part, that's a job for the parser, not the lexer, because there are almost always sequences of lexically valid tokens that are not syntactic. (For example, `+ -` is a sequence of JS tokens that is not syntactic.)
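The `+ -` example can be checked directly in plain JavaScript (the tiny punctuator regex is just for illustration):

```javascript
// "+ -" splits cleanly into valid JS punctuator tokens...
const tokens = '+ -'.match(/[+-]/g); // ['+', '-']

// ...but the same token sequence is not a syntactic JS program.
let syntaxOk = true;
try {
  new Function('+ -');
} catch (e) {
  syntaxOk = e.name !== 'SyntaxError';
}
// syntaxOk is false: the lexer is happy, the parser is not.
```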
I tried that approach:
```js
compile({
  ...
  lCurly: '{',
  rCurly: '}',
  invalid: /.+/
})
```
`rCurly` gets moved to the list of keywords matched by `invalid`. Then the whole line `} // comment` gets parsed as `invalid`, which doesn't match `rCurly`.
I could do `/[^{}]+/`, but it gets very messy with negative lookaheads for tokens like `#define`. It's easy to forget to add a new token to the `invalid` regexp, and hard to debug the consequences.
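To illustrate the messiness, here is a sketch of such a hand-maintained fallback rule (the token set is hypothetical): single-character tokens are excluded via a character class, multi-character tokens like `#define` each need their own negative lookahead, and anything forgotten silently falls into the fallback:

```javascript
// Hypothetical fallback: excludes {, }, whitespace, and #define.
// Every new token added to the grammar needs another exclusion here.
const invalid = /(?:(?!#define)[^{}\s])+/y;

// Consumes an invalid run, stopping before whitespace:
invalid.lastIndex = 0;
const run = invalid.exec('@#$ #define x')[0]; // "@#$"

// And refuses to start at an excluded multi-char token:
invalid.lastIndex = 4;
const stopsAtDefine = invalid.exec('@#$ #define x') === null; // true
```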
I'd obviously prefer a general solution upstream.
I'm willing to try and implement this in a PR if you guys think it's a good idea.
You made me think of a much simpler way to implement it:
```js
let errorRe = /(?:(?!<re>).)+/my // <re> is `lexer.re`, i.e. all the valid stuff.
```

When it can't parse a valid token, `errorRe.exec(input)` matches everything right up to the next one.
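That idea can be sketched in plain JavaScript — `tokenRe` stands in for `lexer.re` (the union of all valid token patterns), and `errorFor` is an illustrative helper, not moo's API:

```javascript
// Union of all valid token patterns (stand-in for lexer.re).
const tokenRe = /\w+|\s+|[{}]/;

// Build a sticky regex that consumes characters only while no
// valid token could start at the current position.
function errorFor(re) {
  return new RegExp('(?:(?!' + re.source + ').)+', 'y');
}

const errorRe = errorFor(tokenRe);

// When the lexer is stuck, exec() swallows input up to the next valid token.
const input = '@#$ foo';
errorRe.lastIndex = 0;
const bad = errorRe.exec(input)[0]; // "@#$" — stops before the whitespace
```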
I still don't think that error recovery at the level of lexical analysis is what you want. Could you provide some examples from your language?
> the whole line `} // comment` gets parsed as `invalid` which doesn't match `rCurly`.
Argh. I knew making keyword handling implicit would be a bad idea. Perhaps this is another reason to make keyword handling explicit (#53).
Here are some suggestions:
In my language, lines with invalid stuff are considered comments. I know, insane, but I'd like to support that if possible. Currently, `{ error: true }` is extremely greedy and considers everything from the first error to be a single token. I propose an optional error tolerance mode, enabled with `{ error: true, recover: true }`, where the error token stops at the next valid token and records the `line` and `col` of the recovery starting point.
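A rough sketch of what such a recovering error token might produce — the `recover` option and the `recoverToken` helper are hypothetical, illustrating the proposal rather than moo's actual API:

```javascript
// Hypothetical: consume an invalid run starting at `offset`, and report
// the line/col where valid lexing resumes. Assumes the run stays on one
// line; a real implementation would also count line breaks in `text`.
function recoverToken(input, offset, line, col, tokenRe) {
  const errorRe = new RegExp('(?:(?!' + tokenRe.source + ').)+', 'y');
  errorRe.lastIndex = offset;
  const text = errorRe.exec(input)[0];
  return { type: 'error', text, line, col: col + text.length };
}

// "@#$" starts at offset 4 (line 1, col 5); recovery resumes at col 8.
const tok = recoverToken('foo @#$ bar', 4, 1, 5, /\w+|\s+/);
```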