haskell / attoparsec

A fast Haskell library for parsing ByteStrings
http://hackage.haskell.org/package/attoparsec
Other
513 stars 93 forks source link

Parser is bad Alternative #122

Open cblp opened 8 years ago

cblp commented 8 years ago

Alternative is expected to be a monoid.

But for Parser we have:

import Prelude
import Data.Attoparsec.Text
import Control.Applicative

λ> parseOnly (fail "oops") "" == parseOnly (fail "oops" <|> empty) ""
False

λ> parseOnly (fail "oops") ""
Left "Failed reading: oops"
λ> parseOnly (fail "oops" <|> empty) ""
Left "Failed reading: empty"

Implications:

  1. Adding empty erases useful error message.
  2. User can't substitute p1 <|> p2 <|> p3 with asum [p1, p2, p3] (asum always appends empty at the end).
bgamari commented 8 years ago

Indeed this is true but I'm afraid I don't see any way to fix this which doesn't either compromise performance or significantly complicate the implementation (and perhaps, as a result, also compromise performance).

Fixing this would in effect require that we add a variety of "weak failure" result, which <|> could then drop in favor of the desired error. To do this efficiently we'd need to carry another closure through the parser state for signaling this mode of failure which I believe would rather significantly increase allocations. I would probably say that if you care enough about errors for the current Alternative behavior to be problematic, you should probably be using another parsing library.

cblp commented 8 years ago

I want to use Alternative freely with Aeson.Parser, and I have no option in aeson's underlying parsing library.

sol commented 8 years ago

One way to fix this is to have an Ord instance for your error type and then always return the "largest" error. Doing so can give you much-needed much better error messages for backtracking parsers.

(Basically what you do is order your errors from less specific to more specific, so that you always keep the more specific one.)

See https://github.com/vimus/vimus/blob/master/src/Vimus/Command/Parser.us for an implementation.

That said, attoparsec parsers are still a monoid under the assumption that we consider all errors equal (similar to what GHC does for exceptions).

Sent from mobile

On 1 Jul 2016, at 5:47 AM, Yuriy Syrovetskiy notifications@github.com wrote:

I want to use Alternative freely with Aeson.Parser, and I have no option in aeson's underlying parsing library.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

bgamari commented 8 years ago

@sol, the problem is that you end up carrying around significantly more information through the parser state. attoparsec is first-and-foremost designed with performance in mind; this is manifested in the fact that the Parser type itself is a CPS'd state monad, which allows us to produce straight-line, non-allocating code for nearly all parsers. Unfortunately I can't see a simple way to allow us to offer the "correct" Alternative behavior while preserving this property.

The simplest approach I could come up with (implemented here) appears to significantly degrade performance in uses of many and some, which I suspect is due to the book-keeping in tracking the additional errors (namely the allocation of a new failure closure at every branch point in the parse).

I don't have time to investigate this further but I suspect this will be a quite difficult issue to fix, if it is possible at all. That, of course, shouldn't stop others from taking a stab at it themselves. I'd be happy to see what others come up with.