sliekens / Txt

A text parsing framework for .NET.
MIT License
2 stars 4 forks source link

Order of alternatives #6

Open sliekens opened 9 years ago

sliekens commented 9 years ago

It was recently pointed out to me that ABNF alternatives as described in section 3.2 of RFC 5234 are unordered. https://tools.ietf.org/html/rfc5234#section-3.2

What this means is that you can't use a first-match-wins algorithm, because all alternatives have the same priority. But this is how the AlternativeLexer class is currently implemented.

What are some better ways to parse alternatives, other than first-match-wins?

sliekens commented 9 years ago

I added a class GreedyAlternativeLexer that implements the following strategy:

  1. Try to read each alternative in no particular order
  2. Back out of successful matches, but remember the best match (Greedy matching: the longest match is the best match)
  3. Return the best match, or a null reference if none of the alternatives match (When two matches are the same length, the first match wins)

There are now 2 strategies that you can use: first-match-wins, or greedy matching. Will that be enough? Or is there a real need for other matching algorithms? Lazy matching?