kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License
3.57k stars 231 forks source link

Understanding ambiguity: possible bug with "-" character? #624

Open springcomp opened 1 year ago

springcomp commented 1 year ago

I’m having a hard time figuring out what is and is not ambiguous in a grammar.

For instance, in the grammar below, the "-" (U+002D HYPHEN MINUS) character alone in the unsescaped_char rule makes the grammar ambiguous and the parser return three results.

identifier -> quote unescaped_char:+ quote
unescaped_char -> digit | letter |  "-"
digit -> [0-9]
letter -> [A-Z] |  [a-z] |  "_"
quote -> "\""

Remove the "-" and the grammar becomes unambiguous. 🤔

This can be seen in this BNF Playground example:

<identifier> ::= <quote> <unescaped_char>+ <quote>
<unescaped_char> ::= <digit> | <letter>
<digit> ::= [0-9]
<letter> ::= [A-Z] | [a-z] | "_"
<quote> ::= "\""

This parses as a unambiguous.

Now, lets add a simple "-" (U+002D HYPHEN MINUS) character to the unescaped_char rule.

…
<unescaped_char> ::= <digit> | <letter> | "-"
…

Now the grammar is ambiguous and the parser returns three results! Am I missing something ?