no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

How can I ignore tokens? #81

Closed Ghabriel closed 6 years ago

Ghabriel commented 6 years ago

I'm using moo + nearley and I want to support comments (//... and /* ... */) in arbitrary places without polluting my grammar. How can I achieve this?

tjvr commented 6 years ago

Moo doesn’t support this by design, but you can wrap the lexer yourself to do whatever you want.

Perhaps try replacing your Lexer’s next() method?

lexer.next = (next => {
  let tok
  while ((tok = next()) && tok.type === 'comment') {}
  return tok
})(lexer.next)

(Untested; I only just came up with this!)


Ghabriel commented 6 years ago

This works:

lexer.next = (next => () => {
    let tok;
    while ((tok = next.call(lexer)) && tok.type === "comment") {}
    return tok;
})(lexer.next);

However, things like cla/*comment*/ss become cla and ss instead of class. I don't think wrapping the lexer can solve this specific case without basically reimplementing the entire lexer, can it?

For most cases this version is enough though, so thanks for your help!
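The same wrapping idea can be generalised to skip several token types at once. The following is only a minimal sketch: `ignoreTokens` and `makeStubLexer` are hypothetical names, not part of moo or nearley, and the stub lexer stands in for a real one so the example runs on its own. It assumes only that the lexer exposes a `next()` method returning a token with a `type` field, or `undefined` at the end of input.

```javascript
// Hypothetical helper: patch lexer.next in place so that tokens whose
// type is listed in `ignored` are never returned to the caller.
function ignoreTokens(lexer, ignored) {
  const skip = new Set(ignored);
  const originalNext = lexer.next;
  lexer.next = function () {
    let tok;
    // Pull tokens until one is not ignored, or input is exhausted (undefined).
    while ((tok = originalNext.call(this)) && skip.has(tok.type)) {}
    return tok;
  };
  return lexer;
}

// Stand-in lexer for illustration only: replays a fixed list of tokens.
function makeStubLexer(tokens) {
  let i = 0;
  return {
    next() {
      return tokens[i++];
    },
  };
}

const lexer = ignoreTokens(
  makeStubLexer([
    { type: "keyword", value: "class" },
    { type: "ws", value: " " },
    { type: "comment", value: "/* note */" },
    { type: "identifier", value: "Foo" },
  ]),
  ["ws", "comment"]
);

const seen = [];
let tok;
while ((tok = lexer.next())) seen.push(tok.value);
console.log(seen); // [ 'class', 'Foo' ]
```

Because the helper patches `next` on the same object rather than building a new one, any other methods the lexer has stay untouched, which is presumably what a parser consuming the lexer would expect.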

bd82 commented 6 years ago

In most languages I am aware of, ignored tokens are not treated as if they did not exist. Instead, they are simply not passed to the parser, so they still act as "separators" between the tokens around them. Consider whitespace:

"public static class Foo extends Bar..."

Whitespace is ignored in Java, but this is not read as a single huge identifier:

"publicstaticclassFooextendsBar"
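To make the "separator" behaviour concrete, here is a small illustrative tokenizer. It is a sketch under stated assumptions, not moo's implementation: the rule table, `tokenize`, and `visible` are hypothetical names invented for this example, and the only thing borrowed from moo is the idea of matching with sticky (`/y`) regular expressions. Tokens are recognised first and comments and whitespace are dropped afterwards, so the text on either side of a comment is never glued back together.

```javascript
// Hypothetical rule table; the first rule that matches at the current position wins.
const rules = [
  ["comment", /\/\*[\s\S]*?\*\/|\/\/[^\n]*/y],
  ["ws", /\s+/y],
  ["keyword", /(?:class|public|static|extends)\b/y],
  ["identifier", /[A-Za-z_]\w*/y],
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, re] of rules) {
      re.lastIndex = pos; // sticky regexes match only at exactly this offset
      const m = re.exec(input);
      if (m) {
        tokens.push({ type, value: m[0] });
        pos = re.lastIndex;
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unexpected character at offset ${pos}`);
  }
  return tokens;
}

// What a parser would see once ignored tokens are filtered out.
const visible = (src) =>
  tokenize(src)
    .filter((t) => t.type !== "comment" && t.type !== "ws")
    .map((t) => `${t.type}:${t.value}`);

console.log(visible("cla/*comment*/ss"));    // [ 'identifier:cla', 'identifier:ss' ]
console.log(visible("class/*comment*/Foo")); // [ 'keyword:class', 'identifier:Foo' ]
```

The comment ends the token before it, so `cla/*comment*/ss` yields two identifiers rather than the keyword `class`, which matches the behaviour observed with the wrapped lexer earlier in the thread.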

Ghabriel commented 6 years ago

I initially thought that comments were in fact treated as if they didn't exist (I was aware of this for whitespace, though), but I ran some tests, and every language I tried did report a syntax error on things like cla/*comment*/ss. I'll close this issue, thanks everyone!

tjvr commented 6 years ago

> This works

Thanks for correcting my example; I was typing on a phone.

I’m glad this works. I can’t imagine how you’d implement a lexer which worked the other way; and it would probably not be efficient! :-)
