Closed JoshuaGrams closed 6 years ago
Ah, shoot. I was expecting that an error token would have no contents and let you continue parsing, instead of taking the rest of the input. I'm trying to do a thing with indentation and markdown-style lists. So I thought I could lex with newlines pushing a line-marker state which would recognize whitespace as indentation, and then `*` or `+` or `-` would give list-marker tokens which would pop the state, and an error would return an `unmarked` token and pop the state. Is there a better way to do this?
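The approach described above might be sketched with moo's stateful lexers roughly like this (state and token names are illustrative, not from the original report):

```javascript
const moo = require('moo')

// Sketch: a newline pushes a `lineStart` state; whitespace there lexes
// as indentation, and a list bullet pops back to the main state.
const lexer = moo.states({
  main: {
    nl:   { match: '\n', lineBreaks: true, push: 'lineStart' },
    text: /[^\n]+/,
  },
  lineStart: {
    indent: /[ \t]+/,
    marker: { match: /[*+-] /, pop: 1 },
    // The hope was that this would yield an empty `unmarked` token and
    // pop the state; in fact a moo.error token swallows the rest of the
    // input, which is the behaviour being reported here.
    unmarked: moo.error,
  },
})
```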
Yes, Nearley uses `Lexer.has` to work out whether a `%token` is exposed by Moo, or a custom token matcher. You're right, `has()` should return `true` for error tokens.
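To illustrate the interface involved, here is a hand-rolled stand-in (not moo's or nearley's actual code): a custom lexer for nearley supplies `reset`/`next`/`save`/`formatError`/`has`, and `has(type)` must answer for every token type the lexer can emit, error tokens included.

```javascript
// Minimal custom lexer satisfying nearley's lexer interface.
// Matches lowercase words; anything else becomes a `myError` token
// carrying the rest of the input, mimicking moo's error-token behaviour.
function makeWordLexer() {
  let buf = ''
  let pos = 0
  return {
    reset(data) { buf = data; pos = 0; return this },
    save() { return { pos } },
    formatError(tok) { return `unexpected input: ${tok.value}` },
    // nearley calls has() for every %token used in the grammar,
    // so error token types must be reported too.
    has(type) { return type === 'word' || type === 'myError' },
    next() {
      if (pos >= buf.length) return undefined
      const m = /^[a-z]+/.exec(buf.slice(pos))
      if (m) { pos += m[0].length; return { type: 'word', value: m[0] } }
      const rest = buf.slice(pos)
      pos = buf.length
      return { type: 'myError', value: rest }
    },
  }
}
```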
> I was expecting that an error token would have no contents and let you continue parsing, instead of taking the rest of the input
When none of your rules match, Moo doesn't know what to do. So you can either have it throw an error, or return an error token with the whole of the rest of the input. (I've updated the README to clarify this.)
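That choice is made by whether the rules include a `moo.error` token; a minimal sketch (token names illustrative):

```javascript
const moo = require('moo')

// With a moo.error rule present, unmatchable input is returned as a
// single `myError` token whose value is the entire rest of the input.
// Without such a rule, lexer.next() throws instead.
const lexer = moo.compile({
  word:    /[a-z]+/,
  ws:      / +/,
  myError: moo.error,
})
```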
I think error tokens are the wrong thing here. Generally, tokenizers work best when your tokens are small, atomic units: so I would separate your newline rule from your rule for leading whitespace, for example. You probably want something like Nathan's transformer to turn indentation into INDENT and DEDENT tokens.
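Such a transformer could look roughly like this (a minimal sketch, not Nathan's actual code; it assumes the lexer emits `nl` and leading-whitespace `ws` tokens, and ignores blank lines and tab-width questions):

```javascript
// Wrap a stream of moo-style { type, value } tokens, replacing leading
// whitespace with synthetic INDENT/DEDENT tokens, Python-style.
function* indenter(tokens) {
  const stack = [0]        // indentation widths currently open
  let atLineStart = true
  for (const tok of tokens) {
    if (tok.type === 'nl') {
      atLineStart = true
      yield tok
      continue
    }
    if (atLineStart) {
      atLineStart = false
      const width = tok.type === 'ws' ? tok.value.length : 0
      if (width > stack[stack.length - 1]) {
        stack.push(width)
        yield { type: 'INDENT', value: '' }
      }
      while (width < stack[stack.length - 1]) {
        stack.pop()
        yield { type: 'DEDENT', value: '' }
      }
      if (tok.type === 'ws') continue   // indentation consumed
    }
    yield tok
  }
  while (stack.length > 1) {            // close open indents at EOF
    stack.pop()
    yield { type: 'DEDENT', value: '' }
  }
}
```

The parser then matches `INDENT`/`DEDENT` like any other tokens, which keeps the grammar free of whitespace counting.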
EDIT: note that if you want this behaviour (error tokens having no contents), you can always implement it yourself on top of Moo's existing API. :-)
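For instance, one could wrap a moo-style lexer so the catch-all error token is trimmed to a single character and lexing resumes after it. This is a rough sketch under some loud assumptions: the error rule is named `error`, line/column tracking is ignored, and only the `reset`/`next`/`save` methods of the underlying lexer are used.

```javascript
// Wrap a moo-style lexer (reset/next/save) so that an error token is
// shrunk to one character, then re-feed the remainder so lexing
// continues instead of stopping at the first unmatchable character.
function* recoveringLexer(lexer, input) {
  lexer.reset(input)
  let tok
  while ((tok = lexer.next()) !== undefined) {
    if (tok.type === 'error') {
      const rest = tok.value           // error tokens carry the rest of the input
      yield { ...tok, value: rest[0], text: rest[0] }
      lexer.reset(rest.slice(1), lexer.save())
    } else {
      yield tok
    }
  }
}
```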
This means that you can't define an error token and use it with Nearley? AFAICT it calls `Lexer.has` on every token that you use. At any rate, if you're going to claim that you can define an error token instead of throwing an error, then it should behave just like any other token in all respects.