kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License

Does anyone have an example of matching regex? #595

Open · gajus opened this issue 2 years ago

gajus commented 2 years ago

I need to match regex-like strings, such as `/foo/i`, and distinguish them from regular unquoted strings.

KillyMXI commented 2 years ago

Unquoted strings are tricky.

If you want `/foo/i` to be matched by Nearley as a regex-like string and not as an unquoted string, then your unquoted-string grammar must contain something that always fails on it. For example, require that `/` always be escaped. If that's acceptable, your unquoted string becomes a sequence of characters where each character is either `\/` (or whatever escape-sequence flavor you prefer) or anything other than `/` (and any other characters you want to escape). This requirement can be relaxed a bit: if something starts with `/`, it is regex-like; otherwise it is an unquoted string. But an unquoted string still can't begin with a bare `/`, so the escape provision is still required for that first character, and the grammar becomes more complicated.
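The relaxed rule can be sketched outside nearley as a small JavaScript classifier, which makes it easy to see which inputs each branch accepts. This is a plain-JS illustration, not a nearley grammar; `classify` and `REGEX_LIKE` are hypothetical names, and treating only a leading `\/` as an escape is an assumption (any escape flavor would do).

```javascript
// Hypothetical classifier for the relaxed rule (plain JavaScript, not nearley):
//   - a token starting with "/" must be a complete regex-like literal /body/flags;
//   - any other token is an unquoted string; one that needs to begin with "/"
//     writes it as "\/".
// Body characters: an escape (\ + any char) or any char except "\" and "/".
const REGEX_LIKE = /^\/((?:\\.|[^\\/])*)\/([gmiyusd]*)$/;

function classify(token) {
  if (token.startsWith('/')) {
    const m = REGEX_LIKE.exec(token);
    // Starts with "/" but is not /body/flags: neither kind accepts it.
    if (!m) return { kind: 'error', token };
    return { kind: 'regex', body: m[1], flags: m[2] };
  }
  // Only the leading escape is special here: "\/foo" means the string "/foo".
  const value = token.startsWith('\\/') ? token.slice(1) : token;
  return { kind: 'unquoted', value };
}

console.log(classify('/foo/i')); // { kind: 'regex', body: 'foo', flags: 'i' }
console.log(classify('\\/foo')); // { kind: 'unquoted', value: '/foo' }
```

A token like `/foo` (opening slash, no closing one) is rejected by both branches, which is the "always fails" property the unquoted-string rule needs.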

Even after you distinguish regex-like strings from unquoted strings, unquoted strings might bite you somewhere else.

Nearley has no efficient means of discarding alternatives when multiple interpretations are possible. There is some discussion of this in #591.
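One practical consequence: nearley's `Parser` keeps every complete parse in `parser.results`, so an ambiguous input shows up as more than one entry there. A minimal sketch of a guard, assuming you want to reject ambiguity rather than pick a parse arbitrarily; `pickSingleParse` is a hypothetical helper, not part of nearley.

```javascript
// Hypothetical helper: nearley collects every complete parse in
// parser.results, so the caller decides what to do when there is more than one.
function pickSingleParse(results) {
  if (results.length === 0) {
    throw new Error('no complete parse');
  }
  if (results.length > 1) {
    throw new Error(`ambiguous grammar: ${results.length} parses`);
  }
  return results[0];
}

// Usage with nearley (assumes `grammar` is a module compiled by nearleyc):
//   const nearley = require('nearley');
//   const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
//   parser.feed(input);
//   const tree = pickSingleParse(parser.results);
```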

For one use case where I had to deal with unquoted strings, and where "take the first successful match" logic was acceptable for choosing between alternative parses, I simply dropped Nearley and went with my own solution.
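A minimal sketch of "take the first successful match" as a hand-written ordered-choice (PEG-style) combinator in JavaScript. This is not KillyMXI's actual solution, which isn't shown in the thread; `regexLike`, `unquoted`, `firstOf`, and the whitespace-delimited notion of an unquoted string are all assumptions for illustration.

```javascript
// Hypothetical hand-written parsers: each takes (input, pos) and returns
// { value, pos } on success or null on failure.
function regexLike(input, pos) {
  const re = /\/(?:\\.|[^\\/])*\/[gmiyusd]*/y; // sticky: match exactly at pos
  re.lastIndex = pos;
  const m = re.exec(input);
  return m ? { value: { kind: 'regex', text: m[0] }, pos: re.lastIndex } : null;
}

function unquoted(input, pos) {
  const re = /\S+/y; // assumption: an unquoted string is a run of non-whitespace
  re.lastIndex = pos;
  const m = re.exec(input);
  return m ? { value: { kind: 'unquoted', text: m[0] }, pos: re.lastIndex } : null;
}

// Ordered choice: try alternatives in order and commit to the first success,
// so later alternatives are never explored and no ambiguity remains.
function firstOf(...parsers) {
  return (input, pos) => {
    for (const parse of parsers) {
      const result = parse(input, pos);
      if (result) return result;
    }
    return null;
  };
}

const value = firstOf(regexLike, unquoted);
console.log(value('/foo/i', 0)); // regex wins because it is tried first
console.log(value('/foo', 0));   // no closing "/", so it falls back to unquoted
```

The design trade-off is that ordering replaces disambiguation: `/foo/i` would also be a valid unquoted string, but it is never considered as one because the regex alternative succeeds first.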

gajus commented 2 years ago

This appears to work fine:

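```ne
# Regex-like literal: /body/flags, e.g. /foo/i
regex ->
    regex_body regex_flags {% d => d.join('') %}

regex_body ->
    "/" regex_body_char:* "/" {% d => '/' + d[1].join('') + '/' %}

# Any character except "\" and "/" (an unescaped "/" ends the body),
# or a backslash followed by any character (\/, \\, \d, ...)
regex_body_char ->
    [^\\/] {% id %}
  | "\\" . {% d => '\\' + d[1] %}

# Flags are optional; return '' rather than [] when there are none
regex_flags ->
    null {% () => '' %}
  | [gmiyusd]:+ {% d => d[0].join('') %}
```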
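The grammar above only checks the delimiters and escapes, so it accepts text like `/(/` that is not a valid pattern. One option, sketched here under that assumption, is to hand the matched text to the `RegExp` constructor and let it reject malformed bodies and repeated flags. `toRegExp` is a hypothetical helper; in a nearley grammar it could live in a `@{% ... %}` block and be called from the `regex` postprocessor, e.g. `{% d => toRegExp(d.join('')) %}`.

```javascript
// Hypothetical helper (not part of nearley): turn "/body/flags" text into a
// real RegExp. The RegExp constructor throws SyntaxError for malformed
// patterns such as "/(/" and for repeated flags such as "/a/gg".
function toRegExp(literal) {
  // Flags never contain "/", so the last "/" is the closing delimiter.
  const close = literal.lastIndexOf('/');
  if (!literal.startsWith('/') || close === 0) {
    throw new SyntaxError(`not a regex literal: ${literal}`);
  }
  const body = literal.slice(1, close);
  const flags = literal.slice(close + 1);
  return new RegExp(body, flags); // throws SyntaxError if invalid
}

const re = toRegExp('/foo/i');
console.log(re.test('FOO')); // true
```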