ollef / Earley

Parsing all context-free grammars using Earley's algorithm in Haskell.
BSD 3-Clause "New" or "Revised" License
361 stars 24 forks source link

Suggestion: Make it so that `<$` rules return at most one result. #45

Closed ChristopherKing42 closed 3 years ago

ChristopherKing42 commented 5 years ago

In my grammars, sometimes I'll having something along the lines of many (whitespace) *> optional (stuff) <* many (whitespace). The issue then is that if I am parsing, say, 4 spaces, many identical Nothing results are returned, based on the different ways the whitespace on the left and right could be matched. One solution is to rewrite it as many (whitespace) *> optional (stuff <* many (whitespace)), but when you have complicated grammar its hard to coordinate this.

I had an idea for how to modify the library to accommodate this problem. Basically, a new kind of production rule that returns at most one result. Which result? Well, the idea is that it would only work on rules such that all their results are identical. That is, they would only work on rules of the form a <$ r. That way, it does not matter what result is returned. Here are some examples of equations they would follow:

Then, in our many (whitespace) *> optional (stuff) <* many (whitespace) example, many (whitespace) *> pure Nothing <* many (whitespace) would reduce to Nothing <$ (many (whitespace) *> pure () <* many (whitespace)). Since this is what the 4 spaces are matching against, at most one Nothing result would be returned.

I was thinking about digging into the code and creating a pull request, but before that I wanted to see if it was seen as worthwhile, or if I was missing something obvious.

ollef commented 5 years ago

An often used way to handle whitespace is to have all tokens be responsible for consuming the whitespace after, but not before them. Perhaps that could work here?

I don't feel good about changing the behaviour to of <$ to be anything other than fmap . const, but it might be possible to add another combinator to ignore the different ways something can be parsed, and I'd be open for that if it can be implemented sensibly and correctly.

ChristopherKing42 commented 5 years ago

Yeah, that would work!

Yeah, a different combinator would work too. I was just thinking that, if you treat the results as sets, it would still be equivalent to fmap . const. I guess it could be unexpected if they don't do exactly the same thing though.

ChristopherKing42 commented 4 years ago

I realized that trying to simplify before the parsing could lead to an exponential explosion, and trying to do so after is really complicated. Therefore, I do not think I will be making a pull request.