Open bristermitten opened 1 year ago
I wrote my own combinator
sepEndBy' :: Parser a -> Parser sep -> Parser [a]
sepEndBy' p sep = do
x <- try p
xs <- many (try (sep >> p))
pure (x : xs)
and it seems to work, but not sure if this is an optimal way
Can you show declaration
? I think the issue here is that when declaration
fails it completely backtracks (I don't know why, do you wrap the whole thing with a try
by chance?) What you want is to "commit" early enough to parsing a declaration, that is, you need your declaration
parser to consume input so that it cannot backtrack anymore. This way sepEndBy
won't be able to succeed in the way you are observing and you should get a more helpful and localized error message. Your custom combinator sepEndBy'
forces at least one occurrence of p
(in that it is similar to sepEndBy1
) and so it appears to solve the problem for the first declaration in your input, however you may find that the second and later declarations still suffer from the same problem.
That explanation makes sense, thanks for your response. How could I make it commit?
declaration
currently looks like this:
declaration :: ModuleName -> Parser (Declaration Frontend)
declaration = liftA2 (<|>) defDec letDec
defDec :: ModuleName -> Parser (Declaration Frontend)
defDec modName = do
symbol "def"
name <- NVarName <$> lexeme varName
symbol ":"
ty <- type'
let annotation = TypeAnnotation name ty
let declBody = ValueTypeDef (Just annotation)
pure (Declaration modName name declBody)
letDec :: ModuleName -> Parser (Declaration Frontend)
letDec modName = do
(name, patterns, e) <- L.nonIndented sc letRaw
let value = Value e patterns Nothing
pure (Declaration modName (NVarName name) value)
letRaw :: Parser (MaybeQualified VarName, [Frontend.Pattern], Frontend.Expr)
letRaw = do
((name, patterns), e) <- optionallyIndented letPreamble element
pure (name, patterns, e)
where
letPreamble = do
symbol "let"
name <- lexeme varName
patterns <- sepBy (lexeme pattern') sc
symbol "="
pure (name, patterns)
To allow the syntax def <name> : <type>
and let <name> = <body>
with some messy indentation handling ;)
Let me know if you need to see any more code. Thanks!
@knightzmc Check out the headed-megaparsec
package for some inspiration. The technique used there, which I quite like and often copy around to various projects, is to have a parser either return a value as normal (with the usual alternative semantics) or return another parser (which effectively "commits" to the current branch, and returns the parser to run on the remainder of the input).
However, I do wonder whether another package is even necessary. Could megaparsec
itself just expose a commit
/cut
combinator which throws away the failure continuations of a parser? (Or perhaps there's a different way of committing that's already present?)
@mitchellwrosen I had a look at that package, seems to solve my problem! But I agree that there should be an easier way of doing this which doesn't require external libraries (and a lot of boilerplate converting to and from Parsec
and HeadedParsec
everywhere)
From what I see in your code it should work as it is, but it is still incomplete, so I cannot be sure. @knightzmc Can you provide a repository with complete source code and an example input? I could then give it a try and it would perhaps be clearer what is going on there.
I'm having a similar issue.
I'd like to parse either a list of key-value pairs (separated by :
), or a single value. For instance:
a:x,b:y
should be parsed as list of two key-value pairs [("a","x"), ("b","y")]
.x
should be parsed as the single value "x"
.Here is a very minimal, simplified grammar. It doesn't support white spaces and both keys and values can only be identifiers. But it's enough to show the issue.
type KeyValue = (String, String)
data Val = KeyValList [KeyValue] | Val String deriving Show
ident :: Parser String
ident = some letterChar
keyVal :: Parser KeyValue
keyVal = do
k <- ident
":"
v <- ident
pure (k, v)
prog :: Parser Val
prog = try (KeyValList <$> keyVal `sepEndBy1` ",") <|> (Val <$> ident)
This thing handles correctly the two examples I showed above, and reports the correct error for invalid inputs like a:x,b:err or
or err or
.
However the error it reports for an input like a:x,b:y,c
or a:x,b:y,err or,c:z
is not what I would like to see:
Input: "a:x,b:y"
Output: KeyValList [("a","x"),("b","y")]
✅
Input: "x"
Output: Val "x"
✅
Input: "a:x,b:err or"
1:10:
|
1 | a:x,b:err or
| ^
unexpected space
expecting ',', end of input, or letter
✅
Input: "err or"
1:4:
|
1 | err or
| ^
unexpected space
expecting end of input or letter
✅
Input: "a:x,b:y,c"
1:2:
|
1 | a:x,b:y,c
| ^
unexpected ':'
expecting end of input or letter
❌
Input: "a:x,b:y,err or,c:z"
1:2:
|
1 | a:x,b:y,err or,c:z
| ^
unexpected ':'
expecting end of input or letter
❌
When the input is a:x,b:y,c
or a:x,b:y,err or,c:z
, I would like to see an error after the last ,
that parsed correctly.
Is there any way to get Megaparsec to report the error I wish to see?
@BlueNebulaDev with your definition of keyVal
it cannot backtrack, since once it consumes at least one letter of ident
it is already committed. Therefore somewhere after that final ,
it indeed fails, like you expect, but because you also have try
around (KeyValList <$> keyVal `sepEndBy1` ",")
that whole part backtracks to the very beginning of input. Next, ident
gets a chance to run and fails with the error that you observe. You need to remove try
from your definition of prog
and instead think when it makes sense for keyVal
to commit. It looks like as soon as there is :
we can be sure that it should be a key-value pair so:
keyVal :: Parser KeyValue
keyVal = do
k <- try (ident <* ":")
v <- ident
pure (k, v)
Try this and it should behave as you want in all four cases.
Still waiting a complete example from @knightzmc which should be fixable without resorting to any third-party libraries, but I'm happy to be proven wrong.
Thanks! After following your suggestion, the errors my parser is reporting are much better. I'm not sure I understand why a:x,b:err or
gave the expected error even without this change though. I'll try to play with dbg
to understand exactly what happens.
"a:x,b:err or"
resulted in a better error message because the sepEndBy1
combinator did not match the separator ","
after fully consuming the second key-value pair. This way sepEndBy1
as a whole succeeded and then you probably have eof
somewhere, which is actually what produces the error that you see.
Hey apologies for the delay. This code is pretty outdated now but if you're still happy to take a look I'd appreciate it!
This may be an XY problem, but I'm having an issue with
sepEndBy
not being greedy enough, and is hiding errors.For context, I'm parsing a haskell-like language with the following syntax
The code to do this is pretty simple:
Using
module' <* eof
as the "entrypoint" parser.The problem is, if
declaration
fails for some reason, the error thrown doesn't propagate up. Instead,sepEndBy
just returns an empty list, and the parser succeeds. This results in some very opaque error messages. For example, if I omitted the body for thelet
declaration:let main =
I'd expect to see an error saying something akin toUnexpected end of input, expecting expression
. Instead, themodule'
succeeds with an empty list of declarations, and theeof
causes an unintuitive and vague error message:(I'm not actually sure why it says this and not something like
Unexpected let ..., expecting eof
, but that's not the main problem here)I have
dbg
'd anddeclaration
throws the expected error, the problem is withsepEndBy
.My question is, is there an alternative to
sepEndBy
that will "greedily" parse the declarations? I still want to allow empty modules sosepEndBy1
won't do, but if alet
is seen it should keep parsing and fail if necessary, rather than backtracking and returning an empty list.Thanks for any help!