A partially-matching operator breaks makeExprParser

BlueNebulaDev commented 1 year ago

I'm trying to use makeExprParser with Megaparsec, and I found its behavior surprising when an operator only partially matches the input.

Take this example:

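{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Combinators.Expr (Operator (..), makeExprParser)
import Data.String (fromString)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import Text.Megaparsec.Debug (dbg)

type Parser = Parsec Void String
parseTerm :: Parser String
parseTerm = dbg "term" $ some alphaNumChar

op1, op3 :: Parser ()
op1 = space *> "+" *> space
op2 = "+" :: Parser String
op3 = "." *> "+" *> space

parseExpr :: Parser String
parseExpr = dbg "expr" $ makeExprParser parseTerm [[InfixL ((++) <$ dbg "plus" op1)]]

parseMain :: String -> Either (ParseErrorBundle String Void) String
parseMain str = parse (space *> parseExpr <* space <* (optional ".") <* eof) "" $ fromString str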

When you use op1, the string "o + a" parses correctly, but the string "o + a " doesn't.
When you use op2, both the strings "o+a" and "o+a " work correctly.
When you use op3, "o.+ a ." works, but "o.+ a." doesn't.

I believe that in the failing cases the operator's first sub-combinator matches, and that somehow breaks everything when the operator as a whole doesn't match.
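The failure mode can be isolated with a much smaller parser. In Megaparsec, p <|> q only tries q when p fails without consuming input, and try p makes p rewind the input when it fails. The sketch below assumes a recent megaparsec with OverloadedStrings; withoutTry and withTry are illustrative names, not part of any library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- The operator from the report: leading whitespace, "+", trailing whitespace.
op1 :: Parser ()
op1 = space *> "+" *> space

-- On " x", op1 consumes the space and then fails on 'x'. Because input was
-- already consumed, (<|>) reports the error instead of trying the fallback.
withoutTry :: Either (ParseErrorBundle String Void) ()
withoutTry = parse (op1 <|> pure ()) "" " x"

-- try rewinds the input when op1 fails, so the fallback (pure ()) is tried.
withTry :: Either (ParseErrorBundle String Void) ()
withTry = parse (try op1 <|> pure ()) "" " x"
```

Here withoutTry is a Left (op1 consumed the leading space before failing on x), while withTry is Right ().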

mrkkrp commented 1 year ago

An operator needs to be able to backtrack. This is described more completely in the tutorial, but long story short, both operators and terms need to be able to backtrack. If x in x *> space backtracks correctly (that is, either x completely matches, or it fails and doesn't consume anything at all), then x *> space will backtrack correctly, too. This is because space essentially always succeeds, since it can also match zero whitespace characters. By contrast, in space *> x the leading space may consume input before x fails, so the whole parser fails after consuming input. So, the approach that you will find in the tutorial and in the docs of the lexer module is to parse whitespace after every token (not before!):

import qualified Text.Megaparsec.Char.Lexer

lexeme = Text.Megaparsec.Char.Lexer.lexeme space
symbol = Text.Megaparsec.Char.Lexer.symbol space

parseTerm = dbg "term" $ lexeme (some alphaNumChar)

op = symbol "+"

parseExpr = dbg "expr" $ makeExprParser parseTerm [
        [ InfixL ((++) <$ op) ]
        ]

parseMain str = parse (space *> parseExpr <* (optional ".") <* eof) "" $ fromString str

Bear in mind that I have not tested this; it is just a hint to help you find the right approach. This way, inputs like these:

"o + a"
"o + a "
"o+a"
"o+a "

should all match. I do not know what you want to achieve with ".". The way it is written, it is separate from your expression and optionally follows it, but your examples suggest that you want it to be part of the term.

In either case, I really recommend going through the tutorial; it would probably save you a lot of time.
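A self-contained version of the lexeme-based snippet, with the imports and type signatures it needs and the dbg calls removed. It is a sketch assuming a recent megaparsec and OverloadedStrings; the Parser alias is an assumption, not something from the thread.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Combinators.Expr (Operator (..), makeExprParser)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Every token consumes the whitespace that follows it, never the whitespace before.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme space

symbol :: String -> Parser String
symbol = L.symbol space

parseTerm :: Parser String
parseTerm = lexeme (some alphaNumChar)

-- symbol "+" either matches "+" (plus trailing space) or fails without consuming,
-- so makeExprParser stops cleanly when no further operator follows.
parseExpr :: Parser String
parseExpr = makeExprParser parseTerm [[InfixL ((++) <$ symbol "+")]]

parseMain :: String -> Either (ParseErrorBundle String Void) String
parseMain = parse (space *> parseExpr <* optional "." <* eof) ""
```

With this sketch, parseMain applied to each of "o + a", "o + a ", "o+a" and "o+a " yields Right "oa".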

BlueNebulaDev commented 1 year ago

I see. Thanks for addressing this question so quickly and for clarifying. I'm sorry I didn't pay more attention to the tutorial: I assumed backtracking was automatic.

Consuming all spaces after my terms would have worked, but I couldn't do that because I want to treat newlines as separators (e.g. within lists or records), so I can't blindly consume them after every term. Wrapping my operators in try did the trick. I hope this won't bite me later; I'll read the whole tutorial carefully to better understand how things work.
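The try-based fix can be sketched as follows: the original operators stay unchanged, and only the InfixL entry wraps them in try, so a partial match rewinds instead of committing. parseWith is a hypothetical helper that mirrors parseMain from the report (minus dbg) but takes the operator as a parameter; the sketch assumes a recent megaparsec with OverloadedStrings.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Combinators.Expr (Operator (..), makeExprParser)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- The two operators from the report that failed without try.
op1, op3 :: Parser ()
op1 = space *> "+" *> space
op3 = "." *> "+" *> space

-- Same harness as parseMain in the report, parameterised over the operator.
-- try makes the operator atomic: a partial match rewinds the input.
parseWith :: Parser () -> String -> Either (ParseErrorBundle String Void) String
parseWith op = parse (space *> expr <* space <* optional "." <* eof) ""
  where
    term = some alphaNumChar
    expr = makeExprParser term [[InfixL ((++) <$ try op)]]
```

Under this sketch, both parseWith op1 "o + a " and parseWith op3 "o.+ a." evaluate to Right "oa", whereas the same harness without try fails on both inputs. The lexeme approach from the earlier reply reaches the same effect without try, by never letting an operator start with whitespace.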

Thanks for developing this package and for the great support!

mrkkrp commented 1 year ago

I wish to treat newlines as separators

In that case you might want to use hspace instead of space.
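A sketch of the hspace suggestion, assuming a megaparsec version that provides hspace: tokens skip only horizontal whitespace (not line breaks), so newlines remain available as separators. separator, program and parseProgram are hypothetical names; a real grammar for lists or records would differ. Because symbol "+" either matches fully or fails without consuming, a run of blank lines between expressions is handled by separator instead of derailing the operator parser.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Combinators.Expr (Operator (..), makeExprParser)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Tokens skip trailing horizontal whitespace only; line breaks are left alone.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme hspace

symbol :: String -> Parser String
symbol = L.symbol hspace

expr :: Parser String
expr = makeExprParser (lexeme (some alphaNumChar)) [[InfixL ((++) <$ symbol "+")]]

-- Hypothetical separator: one or more line breaks, blank lines included.
separator :: Parser ()
separator = () <$ some (eol *> hspace)

-- Hypothetical top level: newline-separated expressions.
program :: Parser [String]
program = space *> sepEndBy expr separator <* eof

parseProgram :: String -> Either (ParseErrorBundle String Void) [String]
parseProgram = parse program ""
```

For example, parseProgram "o + a\n\n  b+c\n" gives Right ["oa","bc"] in this sketch.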

BlueNebulaDev commented 1 year ago

That's what I'm doing. However, using hspace to consume the spaces after every term is not enough if the operators don't backtrack: parsing would still fail with multiple empty lines or similar situations.

Issue #47, mrkkrp/parser-combinators