Open gelisam opened 4 years ago
Personally I would prefer if the error message was expecting a, 'b', or 'd'
, because the user has three ways to try to fix the problem: they can fix the input so that a well-formed a
begins at that character (which would require the user to understand why "aaa"
is not a valid a
even though it begins with an 'a'
), so that a valid sym "b"
begins at that character (which would require that character to be a 'b'
), or so that a valid sym "d"
begins at that character (which would require that character to be a 'd'
).
But since we wouldn't want e.g. a = try (sym "a" >> sym "z")
to cause the error message to incorrectly suggest replacing the first 'a'
with a 'z'
, I understand that it's probably easier not to include the suggestions which involve a try
which has consumed characters. Maybe sym "a"
sets a flag indicating that it has consumed a character, causing try
to clear the suggestions when it rolls back, but try
doesn't clear that flag, thus causing sym "b"
's suggestion to also get cleared?
The problem is caused by hint/error distinction and how error merge works.
(try (sym "a" >> empty) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> error ["b"] at 0) <|> pure ()
===> (error [] at 1) <|> pure () -- the errors are merged (the second is ignored because of its offset)
===> hint [] -- since @pure ()@ succeeds, it tries to promote the error to a hint, and ignores it because of its offset.
try (sym "a" >> empty) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (error ["b"] at 0 <|> pure ())
===> (error [] at 1) <|> hint ["b"] -- since @pure ()@ succeeds, the second error is promoted to a hint
===> hint ["b"]
This is a tricky one. Would be nice to have it fixed, but I'm not sure how.
How about keeping track of both the error message at the latest-consumed position and at the rightmost-observed position? Here is a long explanation of why I think it's a good idea, a prototype, and a proof that my prototype's implementation of (<|>)
is associative: https://gist.github.com/gelisam/bca49348986f119457837cdbfa746cd4
@gelisam Thank you! I'll get to reading this soon.
Sorry for the delay. I did not forget. Will try to look at this today.
I'm not in a hurry, take your time!
OK, I looked at this today. @gelisam I appreciate your write up and the propotype, but what we want here is one minimal change to the existing code base, a change that would not affect performance or anything else. I added your test to the test suite in the choice-associativity
branch.
Looking at the two cases it seems to me that the "right associative" one works as it should. "Left associative" should do better. More precisely, the problem seems to be that it discards the error at offset 0 completely while it potentially still can be useful later (when the error is converted to hints). I think it could be preserved somehow "just in case" and then restored. But well, it is just an idea. A ParseError
can have only one offset to begin with, so with current code base this path seems impossible. Let's face it too, it is nice to have simpler representation for parse errors. There doesn't seem to be another way to save that error at offset 0.
In the end, Parsec-like parser are stateful monsters. The current situation is the result of the compromises that we've made over time and in most cases the library works nicely. I'd welcome a PR which attempts to solve this in an non-intrusive way, but I myself will not allocate time for working on this issue because it is not clear how we could do better without a major re-write.
More precisely, the problem seems to be that it discards the error at offset 0 completely while it potentially still can be useful later (when the error is converted to hints).
Maybe we should convert errors with less offsets into hints (and store hints not only for ok
branches, but also for error
)? If I understood the problem correctly, it should be fixed:
(try (sym "a" >> empty) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> sym "b") <|> pure ()
===> ((error [] at 1) <|> error ["b"] at 0) <|> pure ()
===> (error [] at 1, hint ["b"]) <|> pure () -- the errors are merged (the second is converted to hint because of its offset)
===> hint ["b"]
try (sym "a" >> empty) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (sym "b" <|> pure ())
===> (error [] at 1) <|> (error ["b"] at 0 <|> pure ())
===> (error [] at 1) <|> hint ["b"] -- since @pure ()@ succeeds, the second error is promoted to a hint
===> hint ["b"]
Here is an example demonstrating that we get different error messages depending on how we parethesize
a <|> b <|> c
.