no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

Does moo support all regex syntax? #153

Open ryanking1809 opened 3 years ago

ryanking1809 commented 3 years ago

I'm just playing around with https://ablingeroscar.github.io/moo-playground/ and I don't understand why replacing WS: /[ \t]+/ with WS: /[\s]+/ doesn't work. Shouldn't that pick up any whitespace?

Similarly, I'm not sure why String: /[^]+/ (I'm just testing matching everything) doesn't work either.

Am I missing something?

gnbl commented 5 months ago

Short version: a rule that uses the RegEx character class \s for whitespace is rejected as written, because \s matches newlines, and Moo complains about rules that can match a newline without declaring lineBreaks.

I, too, had the issue with \s not working online at https://omrelli.ug/nearley-playground/ (which supports moo).

So I tried moo.compile({whitespace: /\s/}); locally with Node.js, which throws: Uncaught Error: Rule should declare lineBreaks: /(?:(?:\s))/.

This comes from https://github.com/no-context/moo/blob/main/moo.js#L274.
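
To illustrate the kind of check that could produce this error, here is a minimal sketch in plain JavaScript. It is not Moo's actual code: checkRule and the shape of its rule argument are hypothetical, and the only assumption taken from the error message is that a pattern able to match a newline must come with a lineBreaks declaration.

```javascript
// Hypothetical helper (not taken from moo.js): reject a rule whose pattern
// can match a newline unless the rule declares lineBreaks.
function checkRule(name, rule) {
  const pattern = rule.match;
  // Rebuild the pattern without flags and probe it with a lone newline.
  const canMatchNewline = new RegExp(pattern.source).test('\n');
  if (canMatchNewline && !rule.lineBreaks) {
    throw new Error('Rule should declare lineBreaks: ' + pattern);
  }
  return true;
}

// /[ \t]+/ cannot match a newline, so it passes without lineBreaks.
checkRule('WS', { match: /[ \t]+/ });

// /\s+/ can match a newline, so it passes only with lineBreaks declared.
checkRule('WS', { match: /\s+/, lineBreaks: true });
```

In Moo itself, the documented way to declare this is the lineBreaks rule option, for example WS: {match: /\s+/, lineBreaks: true}.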

Finally, https://github.com/no-context/moo#on-regular-expressions states:

Moo uses multiline RegExps. This has a few quirks: for example, the dot /./ doesn't include newlines. Use [^] instead if you want to match newlines too.

Since an excluding character range like /[^ ]/ (which matches anything but a space) will include newlines, you have to be careful not to include them by accident! In particular, the whitespace metacharacter \s includes newlines.
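
The quirks quoted above can be checked with plain JavaScript regular expressions, without Moo. The sketch below tests each pattern against a single newline; the variable names are only for illustration.

```javascript
// Which of the patterns discussed above match a lone newline?
const newline = '\n';

const matchesNewline = {
  dot: /./m.test(newline),                  // false: the dot skips newlines
  anyChar: /[^]/.test(newline),             // true: [^] matches any character
  notSpace: /[^ ]/.test(newline),           // true: a newline is "not a space"
  notSpaceOrNewline: /[^ \n]/.test(newline), // false: newline excluded explicitly
  whitespace: /\s/.test(newline),           // true: \s includes newlines
  spaceOrTab: /[ \t]/.test(newline),        // false: only space and tab
};

console.log(matchesNewline);
```

This is why WS: /[ \t]+/ is accepted while /[\s]+/ and /[^]+/ are not: the latter two can match a newline.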