no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

Can moo be used to identify RegExp tokens? #168

Closed gajus closed 2 years ago

gajus commented 2 years ago

This is the issue that I am trying to solve:

https://github.com/kach/nearley/issues/595

I want to match all regular expression literals.

So far I have this:

regex ->
  regex_body regex_flags {% d => d.join('') %}

regex_body ->
    "/" regex_body_char:* "/" {% d => '/' + d[1].join('') + '/' %}

regex_body_char ->
    [^\\] {% id %}
  | "\\" strescape {% d => JSON.parse("\""+d.join("")+"\"") %}

regex_flags ->
    null
  | [gmiyusd]:+ {% d => d[0].join('') %}

strescape ->
    ["\\/bfnrt] {% id %}
  | "u" [a-fA-F0-9] [a-fA-F0-9] [a-fA-F0-9] [a-fA-F0-9] {% d => d.join("") %}

However, it fails to parse /\s/. I wonder if moo can help here?
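
A likely reason for the failure, judging only from the grammar as posted: strescape accepts just the JSON-style escape characters (", \, /, b, f, n, r, t, and the u form), so a backslash followed by s has no matching alternative. The sketch below is plain JavaScript, not nearley, and the names STRESCAPE_CHARS and unsupportedEscapes are hypothetical. It lists the escapes in a regex body that such a rule would reject, treating any \u as accepted for simplicity.

```javascript
// Escape characters accepted by the strescape rule as posted.
// Assumption: the \uXXXX form is treated as always valid here.
const STRESCAPE_CHARS = new Set(['"', '\\', '/', 'b', 'f', 'n', 'r', 't', 'u']);

// Hypothetical helper: collect the escape sequences in a regex body
// (the text between the slashes) that strescape would not accept.
function unsupportedEscapes(body) {
  const rejected = [];
  for (let i = 0; i < body.length; i++) {
    if (body[i] !== '\\') continue;      // ordinary character
    const next = body[i + 1];
    if (next === undefined) {            // dangling backslash at the end
      rejected.push('\\');
      break;
    }
    if (!STRESCAPE_CHARS.has(next)) rejected.push('\\' + next);
    i++;                                 // skip the escaped character
  }
  return rejected;
}
```

Called with the body of /\s/, that is unsupportedEscapes('\\s'), this returns a one-element array containing \s, while a body such as a\n\/b yields an empty array.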

gajus commented 2 years ago

Looks like this works just fine:

regex_body_char ->
    [^\\] {% id %}
  | "\\" [^\\] {% d => '\\' + d[1] %}
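
On the original question of recognising regex literals at the tokenizer level, here is a minimal sketch in plain JavaScript that uses the sticky (/y) flag directly. It does not use moo's API, and the names REGEX_LITERAL and matchRegexLiteral are hypothetical. It makes several simplifying assumptions: a backslash may be followed by any character (including another backslash), the body must be non-empty so that // is not matched, and an unescaped / inside a character class such as [/] is not handled.

```javascript
// "/" + body (plain characters or backslash escapes) + "/" + optional flags.
// The flag set mirrors regex_flags in the grammar above.
const REGEX_LITERAL = /\/(?:[^\\\/\n]|\\.)+\/[dgimsuy]*/y;

// Hypothetical helper: return the regex literal that starts exactly at
// `offset` in `input`, or null if there is none at that position.
function matchRegexLiteral(input, offset = 0) {
  REGEX_LITERAL.lastIndex = offset;   // sticky match begins exactly here
  const m = REGEX_LITERAL.exec(input);
  return m ? m[0] : null;
}
```

With these assumptions, matchRegexLiteral('/\\s/') returns the text /\s/, and a call at offset 4 on the input x = /a\/b/gi; returns /a\/b/gi. Because the match is anchored at the offset, deciding whether a / at a given position starts a regex or is a division operator is still left to the caller.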