Closed johnchandlerburnham closed 4 years ago
Here is an example how to do it:
p :: Parser (String, String)
p = (,)
<$> sepBy (char 'a') (try (char 's' <* notFollowedBy (char 'o')))
<*> string "something"
I believe this is called "non-greedy", not "non-eager".
I thought "greedy" and "eager" were synonyms, but I may be mistaken.
Using notFollowedBy
is fine when you know the parser that follows, but I'm not sure it's general if you have:
p :: Parser a -> Parser (String, a)
p q = (,)
<$> sepBy (char 'a') (try (char 's' <* notFollowedBy undefined))
<*> q
My specific use case for non-greedy sepBy'
was similar to this. I'm using Megaparsec to write the Haskell implementation of the Formality proof language, and ran into an issue on the case
expression parser: https://github.com/moonad/Formality-Haskell/blob/353383df1202c7467f3842663fc6eacf3ee5c79f/src/Parser/Lang.hs#L311
case x
| true => 1
| false => 2
: Number
Essentially, I had to have my space consumer parser as the separator between different case variants (e.g. | true => 1
), but not consume any space after the last variant if the type annotation is omitted. Would be infeasible to use notFollowedBy
, since the next parser could be many many different things depending on context.
Anyway, I already solved my issue using the non-greedy sepBy1'
, but I thought I'd offer to upstream a PR here since it seemed like something other people have also been confused by. No problem whatsoever if that's not of interest, big fan of your library!
It is not general, you need to tune it according to the specific grammar you want to parse.
The problem is that non-greedy versions require try
and all the combinators like this come from the parser-combinators
package which works with any Applicative
/ Alternative
/ Monad
instances and doesn't know anything about try
(which is often a good thing because the package can be used with other parsing libraries and it is easier to reason about combinators that cannot use try
).
Your idea is not unreasonable, it's just there is an endless amount of combinators that we can have and I need to decide what to include and what not to include and where to draw the line.
Okay, not wanting to have the combinators depend on try
makes sense to me. Thanks!
Here is an example how to do it:
p :: Parser (String, String) p = (,) <$> sepBy (char 'a') (try (char 's' <* notFollowedBy (char 'o'))) <*> string "something"
Here notFollowedBy (char 'o')
is needed to not match on the second letter of "something"
. In case you want to parse something else than "something"
, perhaps a choice <|>
of things, than your notFollowedBy
parser also becomes more complicated. Is there anything at all that can be done not having to figure out the notFollowedBy
part? From a high-level view it doesn't seem impossible because regex can also figure it out itself how to do a non-greedy match.
In this stack overflow issue: https://stackoverflow.com/questions/60222934/avoid-parsing-last-separator-with-sepby
Suppose we have
For
foo (string "a") (string " ") (string "b")
, we might expect"a a b"
to parse, but it does not, sincesepBy
eagerly descends into anothersep *> p
iteration when it sees the space character.This can be solved by adding a
try
tosepBy1
to make the noneager variants:in
Text.Megaparsec.Combinator
. Ideally with inlining and docs explaining the eager behavior of ordinarysepBy
. Happy to send a PR for this.