kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License
3.57k stars 231 forks source link

Removing ambiquity by enforcing precedence order #627

Closed dvargas92495 closed 1 year ago

dvargas92495 commented 1 year ago

Hey @kach, somewhat straight forward question

I have a grammar where a nonterminal has many options:

token -> A
token -> B
...etc

Frequently, it is the case that an input text causes ambiguity between these options. Ideally a precendence order would be enforced where if i's ambiguous between A or B, it chooses A.

My workaround currently is in the post processor for B to detect when it also could satisfy A and then return a reject. This is fine for a one off, but there could be 10-20 options for this nonterminal and I'm finding myself basically reimplementing the parser in order to catch all of the reject cases.

Is there a way to enforce precedence order?

kach commented 1 year ago

Nearly doesn't have a built-in way to enforce precedence. But you can often rewrite your grammar to be unambiguous (see the calculator grammar for an example).

dvargas92495 commented 1 year ago

I'm using Nearley to parse different, app-specific flavors of Markdown, which is a notoriously ambiguous language, mostly because all inputs must be valid.

Classic example is accepting **a** as bolding, *a* as italics, and all other uses of * as plain text. I feel with a precedence order I'd be able to say that **a** is a bold, instead of italics within italics or 5 instances of free text.

Given this, is the best workaround still returning reject for unambiguity, or do you see a way to specify these cases unambiguously?

PoolloverNathan commented 1 year ago

From my experience, the first choice in an alternatives list will be returned first, so you could "fake" it by only using the first parse returned by nearley.

dvargas92495 commented 1 year ago

As in, set some sort of global variable during the first parse so that we could reject on subsequent parses? How does subsequent parses in an alternative list they are part of the same "decision"?

PoolloverNathan commented 1 year ago

Say you have the following:

main -> "class" | [clas]:+

and input this:

class

For me, nearley returns the following results, in order:

["class"]
["c", "l", "a", "s", "s"]

Therefore, I could choose to select just the first result, which parsed "class" as a token instead of characters. I assume that something similar can be done with Markdown.

dvargas92495 commented 1 year ago

Ah I see - so just always return the end result set. I think that should solve it for me, I'll close the issue! Thanks @PoolloverNathan!