no-context / moo


Lexer.has does not find error token. #76

Closed JoshuaGrams closed 6 years ago

JoshuaGrams commented 6 years ago
var moo = require('moo');
var lexer = moo.compile({
    word: /\w+/,
    ws: { match: /\s+/, lineBreaks: true },
    somethingElse: moo.error
});
console.log('has word?', lexer.has('word'));
console.log('has ws?', lexer.has('ws'));
console.log('has somethingElse?', lexer.has('somethingElse')); // reported: false (expected true)

Does this mean that you can't define an error token and use it with Nearley? AFAICT Nearley calls Lexer.has on every token that you use. At any rate, if you're going to claim that you can define an error token instead of throwing an error, then it should behave just like any other token in all respects.

JoshuaGrams commented 6 years ago

Ah, shoot. I was expecting that an error token would have no contents and let you continue parsing, instead of taking the rest of the input. I'm trying to do a thing with indentation and markdown-style lists. So I thought I could lex with newlines pushing a line-marker state which would recognize whitespace as indentation, and then * or + or - would give list marker tokens which would pop the state, and an error would return an unmarked token and pop the state. Is there a better way to do this?

tjvr commented 6 years ago

Yes, Nearley uses Lexer.has to work out whether a %token is a token type exposed by Moo or a custom token matcher. You're right, has() should return true for error tokens.
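
A rough sketch of why has() matters here. This is not Nearley's actual source: resolveToken, customMatchers and stubLexer are hypothetical names made up for illustration. It only shows the general shape of the lookup: if the lexer claims the name, match by token type, otherwise fall back to a custom matcher.

```javascript
// Illustrative only; resolveToken and customMatchers are hypothetical.
function resolveToken(lexer, name, customMatchers) {
  if (lexer.has(name)) {
    // The lexer knows this token type, so match tokens by their type.
    return { type: name };
  }
  if (Object.prototype.hasOwnProperty.call(customMatchers, name)) {
    // Otherwise fall back to a hand-written matcher with that name.
    return customMatchers[name];
  }
  throw new Error('Unknown token: %' + name);
}

// Stub standing in for a lexer whose has() forgets the error token,
// which is the behaviour reported in this issue.
const stubLexer = {
  has: (name) => ['word', 'ws'].includes(name),
};
```

With this stub, resolveToken(stubLexer, 'word', {}) returns { type: 'word' }, while 'somethingElse' falls through to the custom matcher table and fails when nothing is registered there.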

> I was expecting that an error token would have no contents and let you continue parsing, instead of taking the rest of the input

When none of your rules match, Moo doesn't know what to do. So you can either have it throw an error, or return an error token with the whole of the rest of the input. (I've updated the README to clarify this.)

I think error tokens are the wrong thing here. Generally tokenizers work best when your tokens are small atomic units, so I would separate your newline rule from your rule for leading whitespace, for example. You probably want something like Nathan's transformer to turn indentation into INDENT and DEDENT tokens.
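
A minimal sketch of that kind of transformer, to show the idea. This is not Nathan's code: the function name indented and the token type names 'nl' and 'ws' are assumptions, and it expects the lexer to emit newlines and whitespace as separate tokens, as suggested above.

```javascript
// Hypothetical helper: turn leading whitespace into INDENT/DEDENT tokens.
// Assumes tokens look like { type, value } and that the lexer emits
// separate 'nl' (newline) and 'ws' (whitespace) tokens.
function* indented(tokens) {
  const stack = [0];       // indentation widths of the currently open blocks
  let atLineStart = true;

  for (const tok of tokens) {
    if (tok.type === 'nl') {
      atLineStart = true;
      yield tok;
      continue;
    }
    if (atLineStart) {
      atLineStart = false;
      // A line that does not start with whitespace has width 0.
      const width = tok.type === 'ws' ? tok.value.length : 0;
      if (width > stack[stack.length - 1]) {
        stack.push(width);
        yield { type: 'INDENT', value: '' };
      }
      while (width < stack[stack.length - 1]) {
        stack.pop();
        yield { type: 'DEDENT', value: '' };
      }
      if (tok.type === 'ws') continue; // leading whitespace is consumed
    }
    yield tok;
  }

  // Close any blocks still open at the end of the input.
  while (stack.length > 1) {
    stack.pop();
    yield { type: 'DEDENT', value: '' };
  }
}
```

Since a Moo lexer is iterable, something like indented(lexer) after lexer.reset(input) should work, though the result would still need wrapping before Nearley can use it as a lexer. The sketch skips blank lines, treats whitespace-only lines as indented lines, and silently accepts a dedent to a width that was never opened; a real implementation would probably want to handle those cases.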

EDIT: note that if you want this behaviour (error tokens having no contents), you can always implement it yourself on top of Moo's existing API. :-)