Open cblp opened 8 years ago
Indeed this is true but I'm afraid I don't see any way to fix this which doesn't either compromise performance or significantly complicate the implementation (and perhaps, as a result, also compromise performance).
Fixing this would in effect require that we add a variety of "weak failure" result, which <|> could then drop in favor of the desired error. To do this efficiently we'd need to carry another closure through the parser state for signaling this mode of failure, which I believe would rather significantly increase allocations. I would probably say that if you care enough about errors for the current Alternative behavior to be problematic, you should probably be using another parsing library.
I want to use Alternative freely with Aeson.Parser, and I have no option in aeson's underlying parsing library.
One way to fix this is to have an Ord instance for your error type and then always return the "largest" error. Doing so can give you much better error messages for backtracking parsers.
(Basically what you do is order your errors from less specific to more specific, so that you always keep the more specific one.)
See https://github.com/vimus/vimus/blob/master/src/Vimus/Command/Parser.us for an implementation.
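Independently of the linked implementation, the idea can be sketched with a toy parser. Everything below (the ParseError type, its constructors, and the helpers vague, specific and errorOf) is hypothetical and written only for illustration; it is not attoparsec's API and makes no attempt at attoparsec's performance.

```haskell
import Control.Applicative (Alternative (..))

-- Hypothetical error type. Constructors are ordered from least to most
-- specific, so the derived Ord instance ranks specific errors higher.
data ParseError
  = NoParse          -- what 'empty' reports
  | Vague String     -- a generic message
  | Specific String  -- a message worth showing to the user
  deriving (Eq, Ord, Show)

-- Toy backtracking parser: takes a String, returns a result or an error.
newtype Parser a = Parser
  { runParser :: String -> Either ParseError (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Left e        -> Left e
    Right (a, s') -> Right (f a, s')

instance Applicative Parser where
  pure a = Parser $ \s -> Right (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Left e        -> Left e
    Right (f, s') -> case pa s' of
      Left e         -> Left e
      Right (a, s'') -> Right (f a, s'')

instance Alternative Parser where
  empty = Parser $ \_ -> Left NoParse
  Parser p <|> Parser q = Parser $ \s -> case p s of
    Right r -> Right r
    Left e1 -> case q s of
      Right r -> Right r
      Left e2 -> Left (max e1 e2)  -- keep the "largest" error

-- Parsers that always fail with an error of the given specificity.
vague, specific :: String -> Parser a
vague msg    = Parser $ \_ -> Left (Vague msg)
specific msg = Parser $ \_ -> Left (Specific msg)

-- Run a parser and keep only the error, if any.
errorOf :: Parser a -> String -> Maybe ParseError
errorOf p s = either Just (const Nothing) (runParser p s)

main :: IO ()
main = do
  -- 'empty' no longer hides the error of the branch before it.
  print (errorOf (specific "bad key" <|> empty :: Parser ()) "")
  -- The more specific error wins regardless of branch order.
  print (errorOf (specific "bad key" <|> vague "no match" :: Parser ()) "")
  print (errorOf (vague "no match" <|> specific "bad key" :: Parser ()) "")
```

Because NoParse is the smallest value and max is associative, empty behaves as an identity for <|> in this sketch as far as errors are concerned. Two errors built with the same constructor are compared by their message text, which is arbitrary; a real error type would probably want a more deliberate tie-break.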
That said, attoparsec parsers are still a monoid under the assumption that we consider all errors equal (similar to what GHC does for exceptions).
On 1 Jul 2016, at 5:47 AM, Yuriy Syrovetskiy wrote:
> I want to use Alternative freely with Aeson.Parser, and I have no option in aeson's underlying parsing library.
@sol, the problem is that you end up carrying around significantly more information through the parser state. attoparsec is first and foremost designed with performance in mind; this is manifested in the fact that the Parser type itself is a CPS'd state monad, which allows us to produce straight-line, non-allocating code for nearly all parsers. Unfortunately I can't see a simple way to allow us to offer the "correct" Alternative behavior while preserving this property.
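To make the trade-off concrete, here is a heavily simplified sketch of a parser in continuation-passing style. It is not attoparsec's actual Parser type (which threads more state); the names Failure, Success, orBoth, failWith and parse are assumptions made for this example. It shows the kind of <|> that drops the left error, next to a hypothetical variant that keeps it.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Applicative (Alternative (..))

-- Continuations: what to do on failure (error message) and on success
-- (result plus remaining input). All names here are hypothetical.
type Failure r   = String -> r
type Success a r = a -> String -> r

newtype Parser a = Parser
  { runParser :: forall r. String -> Failure r -> Success a r -> r }

instance Functor Parser where
  fmap f (Parser p) = Parser (\s kf ks -> p s kf (\a s' -> ks (f a) s'))

instance Applicative Parser where
  pure a = Parser (\s _ ks -> ks a s)
  Parser pf <*> Parser pa = Parser (\s kf ks ->
    pf s kf (\f s' -> pa s' kf (\a s'' -> ks (f a) s'')))

instance Alternative Parser where
  empty = Parser (\_ kf _ -> kf "empty")
  -- The left error is ignored: if p fails, q runs with the caller's kf.
  Parser p <|> Parser q = Parser (\s kf ks ->
    p s (\_ -> q s kf ks) ks)

-- Hypothetical variant that keeps both messages. The failure continuation
-- handed to q has to capture e1, the error of the left branch.
orBoth :: Parser a -> Parser a -> Parser a
orBoth (Parser p) (Parser q) = Parser (\s kf ks ->
  p s (\e1 -> q s (\e2 -> kf (e1 ++ "; " ++ e2)) ks) ks)

-- A parser that always fails with the given message.
failWith :: String -> Parser a
failWith msg = Parser (\_ kf _ -> kf msg)

-- Run a parser with the final continuations plugged in.
parse :: Parser a -> String -> Either String a
parse (Parser p) s = p s Left (\a _ -> Right a)

main :: IO ()
main = do
  print (parse (failWith "bad key" <|> empty :: Parser ()) "")
  print (parse (failWith "bad key" `orBoth` empty :: Parser ()) "")
```

In orBoth, every branch point wraps the caller's failure continuation in a new one that remembers the earlier error. Whether that costs as much in a real implementation as it is feared to here would need measuring; the sketch only shows where the extra bookkeeping comes from.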
The simplest approach I could come up with (implemented here) appears to significantly degrade performance in uses of many and some, which I suspect is due to the book-keeping in tracking the additional errors (namely the allocation of a new failure closure at every branch point in the parse).
I don't have time to investigate this further, but I suspect this will be quite a difficult issue to fix, if it is possible at all. That, of course, shouldn't stop others from taking a stab at it themselves. I'd be happy to see what others come up with.
Alternative is expected to be a monoid. But for Parser we have:

Implications:
- empty erases useful error message.
- One cannot replace p1 <|> p2 <|> p3 with asum [p1, p2, p3] (asum always appends empty at the end).
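The effect can be reproduced in a toy model. The type P below is not attoparsec's Parser; it is a hypothetical stand-in that assumes only the behavior discussed in this thread: empty fails with its own message, and <|> keeps the right-hand outcome whenever the left-hand side fails.

```haskell
import Control.Applicative (Alternative (..))
import Data.Foldable (asum)

-- Toy parser that succeeds or fails without reading any input;
-- just enough to observe which error message survives.
newtype P a = P { runP :: Either String a }

instance Functor P where
  fmap f (P r) = P (fmap f r)

instance Applicative P where
  pure = P . Right
  P f <*> P a = P (f <*> a)

instance Alternative P where
  empty = P (Left "empty")
  -- When the left branch fails, only the right branch's outcome is kept.
  P (Left _) <|> q = q
  p          <|> _ = p

-- A parser that always fails with the given message.
failWith :: String -> P a
failWith = P . Left

main :: IO ()
main = do
  let p1 = failWith "e1" :: P ()
      p2 = failWith "e2"
      p3 = failWith "e3"
  print (runP p1)                   -- the error of p1 on its own
  print (runP (p1 <|> empty))       -- a right identity would give the same result
  print (runP (p1 <|> p2 <|> p3))   -- error of the last alternative
  print (runP (asum [p1, p2, p3]))  -- asum folds with empty at the end
```

In this model the successful results still satisfy the monoid laws; only the error messages differ, which matches the later remark that the laws hold if all errors are considered equal.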