haskell / aeson

A fast Haskell JSON library

Rewrite aeson with binary-parsers #466

Closed winterland1989 closed 1 year ago

winterland1989 commented 7 years ago

EDIT: For people looking for benchmarks, I've done this experiment in #467. Please review.

Recently I released a parsing library based on binary. It's incredibly fast, which made me re-read binary's source code and led to several further optimizations.

From the benchmark, performance is improved by 10~30% with the new parser, and I would expect more improvement on aeson's master, since binary-parsers provides a new scan combinator which I haven't used in the benchmark yet:

-- | Similar to 'scan', but working on 'ByteString' chunks. The predicate
-- consumes a 'ByteString' chunk and transforms a state argument,
-- and each transformed state is passed to successive invocations of
-- the predicate on each chunk of the input, until one chunk is split into
-- @Right (ByteString, ByteString)@ or the input ends.
--
scanChunks :: s -> Consume s -> Get ByteString

My plan is to use this combinator with a C version of the scan function to find a string's closing double quote.
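
To illustrate the idea, here is a standalone sketch (not binary-parsers' actual implementation) of the kind of chunk-consuming state function `scanChunks` expects: scan a chunk for the closing double quote of a JSON string, carrying the "previous byte was a backslash" flag across chunk boundaries. The name `findStringEnd` and the exact `Consume`-like shape are assumptions for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B

-- The state is True when the previous chunk ended on an escape backslash.
-- Left means "need more input, here is the carried state"; Right means
-- "found the closing quote, here are the (content, rest) halves".
findStringEnd :: Bool -> B.ByteString -> Either Bool (B.ByteString, B.ByteString)
findStringEnd esc0 chunk = go esc0 0
  where
    go esc i
      | i >= B.length chunk = Left esc            -- chunk exhausted, carry state
      | esc                 = go False (i + 1)    -- skip the escaped byte
      | b == 92             = go True  (i + 1)    -- backslash starts an escape
      | b == 34             = Right (B.splitAt i chunk)  -- unescaped double quote
      | otherwise           = go False (i + 1)
      where b = B.index chunk i

main :: IO ()
main = do
  print (findStringEnd False "abc\"rest")  -- closing quote found at index 3
  print (findStringEnd False "ab\\")       -- chunk ends mid-escape, state carried
```

A C implementation of the same loop would simply replace the `go` recursion with a pointer walk over the chunk, which is where the expected speedup comes from.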

But this whole thing will break compatibility badly, since aeson exposes attoparsec parsers through the Data.Aeson.Parser module, so I'm asking you to make a judgement. Should I do this in aeson, or should I just roll a new package? I definitely don't want to split off a package just for a new parser.

hvr commented 7 years ago

Would the new parser be as easily consumable in a streaming mode like attoparsec ones are (c.f. http://hackage.haskell.org/package/io-streams-1.3.5.0/docs/System-IO-Streams-Attoparsec.html)?

winterland1989 commented 7 years ago

Yes, definitely. Binary supports streaming perfectly with Decoder, just like attoparsec's IResult. I'm going to update my wire-streams package to support binary only now, because an obvious performance issue of binary has been addressed in binary-parsers.
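
A minimal sketch of what that streaming consumption looks like with binary's incremental interface (the chunk list stands in for a real io-streams source):

```haskell
import qualified Data.ByteString as B
import Data.Binary.Get

-- Drive a Get parser over a list of chunks, handling the Decoder states
-- the same way one would handle attoparsec's IResult.
runChunked :: Get a -> [B.ByteString] -> Either String a
runChunked g = go (runGetIncremental g)
  where
    go (Done _ _ a) _      = Right a
    go (Fail _ _ e) _      = Left e
    go (Partial k) (c:cs)  = go (k (Just c)) cs   -- feed the next chunk
    go (Partial k) []      = go (k Nothing) []    -- signal end of input

main :: IO ()
main =
  -- a big-endian Word32 split across two chunks still parses
  print (runChunked getWord32be [B.pack [0,0], B.pack [1,0]])
```

The `Partial` continuation is what makes wrapping this for io-streams (or any push-based source) straightforward.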

Yuras commented 7 years ago

@winterland1989 If attoparsec provided something like takeChunk :: Parser ByteString (which returns the unconsumed input without asking for more input), would it be possible to build an attoparsec parser from a binary-parsers one? In that case aeson could export it for compatibility.

winterland1989 commented 7 years ago

Good idea. It occurred to me that maybe takeLazyByteString can provide a way to build this isomorphism: we ask attoparsec to return the remaining input as a lazy ByteString and feed it into a binary parser. Then all we have to do is provide a way to inject the final Decoder into a Parser. But:

  1. I'm not sure it's doable.
  2. I'm not sure if it affects performance.

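For concreteness, the bridge being discussed could look roughly like this sketch (`fromGet` is a hypothetical name; note that `takeLazyByteString` grabs all remaining input, so the unconsumed suffix is lost to the outer attoparsec parser, which is part of what makes concern 1 real):

```haskell
import qualified Data.Attoparsec.ByteString as A
import qualified Data.ByteString as B
import Data.Binary.Get (Get, runGetOrFail, getWord16be)

-- Run a binary Get parser inside an attoparsec Parser by taking all
-- remaining input as a lazy ByteString. Caveat: attoparsec offers no
-- way to push the leftover back, so this only fits at the end of the
-- outer parser.
fromGet :: Get a -> A.Parser a
fromGet g = do
  rest <- A.takeLazyByteString
  case runGetOrFail g rest of
    Left  (_, _, err) -> fail err
    Right (_, _, a)   -> return a

main :: IO ()
main = print (A.parseOnly (fromGet getWord16be) (B.pack [1, 0]))
```

Whether this round-trips efficiently enough to serve as the compatibility shim is exactly the open question.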
mgsloan commented 7 years ago

@winterland1989 Have you seen the store library? It is highly optimized; I would be very surprised to see any circumstances where binary-parsers is faster.

winterland1989 commented 7 years ago

Hi @mgsloan, I checked store right after its release : ) but I don't think store is up to the task here; for example, how could I provide these combinators with store?

IMHO, store is trying to provide a better Storable story rather than deal with custom protocols, so maybe it's better to compare it to derive-storable or the upcoming compact regions?

mgsloan commented 7 years ago

You could certainly provide those combinators by constructing a monad atop the Poke and Peek monads; there's no need to re-invent the full stack of binary serialization. I have plans to do so, but haven't gotten around to it quite yet. There is lots more to build atop the store abstractions that hasn't been built yet. But feel free to do your own thing, of course. I am just puzzled why you would propose that a venerable package like aeson adopt your new package.

winterland1989 commented 7 years ago

Because I care about performance as much as you do ; )

Whether or not to adopt my package is the whole point of this discussion. If you think it's the wrong thing to do, I'd like to hear more about what makes you think so.

That being said, I don't think Peek is capable enough to provide the tools to deal with parsing:

  1. Peek is built around IO, which is not suitable in a pure parsing context.
  2. Peek doesn't deal with partial input, which makes streaming parsing hard.

winterland1989 commented 7 years ago

For people looking for benchmarks, I've done this experiment in #467. Please review.

tolysz commented 7 years ago

I just hope all new dependencies will build with ghcjs.

winterland1989 commented 7 years ago

@tolysz: we will pull in bytestring-lexing by pulling in binary-parsers; everything else is already there. Is bytestring-lexing buildable under ghcjs?

tolysz commented 7 years ago

bytestring-lexing builds.

winterland1989 commented 7 years ago

Then this should build.

mgsloan commented 7 years ago

Cool, seems reasonable, @winterland1989! I think Peek being built around IO is fine; that's what it takes to do raw memory access, and we can safely wrap it in an unsafePerformIO. It is better to have one unsafePerformIO than multiple, because IIRC there is some cost at runtime.

winterland1989 commented 7 years ago

Well, I actually have some questions about the safety of the unsafePerformIO in store's encode/decode, if you don't mind: Peek being a MonadIO is not restrictive enough to prevent people from misusing it. E.g. if you build a combinator library on top of it, there should be a way to ensure the user isn't doing something too crazy, because suddenly all of print/readFile/... becomes available.
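
One hypothetical way to enforce such a restriction (this is not store's actual API, just a sketch of the pattern) is an opaque newtype that keeps the IO-backed monad underneath but deliberately hides the MonadIO instance, so only library-blessed primitives can perform effects:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
-- Hypothetical module: SafePeek stands in for a restricted wrapper
-- around an IO-backed parsing monad like store's Peek.
module SafePeek (SafePeek, liftPrim, runSafePeek) where

-- Functor/Applicative/Monad are derived, but MonadIO intentionally is
-- not, so user code cannot call liftIO to run readFile, print, etc.
newtype SafePeek a = SafePeek (IO a)
  deriving (Functor, Applicative, Monad)

-- Exposed only to the library itself (e.g. via an Internal module),
-- for blessed raw-memory primitives.
liftPrim :: IO a -> SafePeek a
liftPrim = SafePeek

-- The single, controlled exit point where unsafePerformIO (or plain
-- IO execution) would be applied.
runSafePeek :: SafePeek a -> IO a
runSafePeek (SafePeek m) = m
```

Combinator libraries built against `SafePeek` would then be unable to smuggle arbitrary IO past the eventual unsafePerformIO boundary.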

mgsloan commented 7 years ago

Good point! I've opened https://github.com/fpco/store/issues/76 about it. You can only do that if you import Data.Store.Core, which should be unusual for most users of store; you don't usually use the Peek/Poke primitives directly.

bergmark commented 7 years ago

Thank you @winterland1989!

Adding new dependencies is something we have always been careful about, for several reasons such as the stability of the libraries' APIs, bus factor, and the introduction of another possible point of failure. So please excuse my hesitation, as my gut reaction is "maybe later".

But I don't think I should make this decision by myself, I'd very much like to hear what others think about all of this.

Cheers

winterland1989 commented 7 years ago

Yes, I fully understand. That's why I am asking everyone here; please take your time to evaluate this change.

phadej commented 7 years ago

FWIW, with Backpack we could have implementations against both attoparsec and binary-parsers. Judging from the pull request https://github.com/bos/aeson/pull/467, the needed interface is quite small.

cc @ezyang

ezyang commented 7 years ago

Yes, Backpack would help. Just not immediately (since it's not released yet).

phadej commented 1 year ago

Data.Aeson.Decoding supersedes this. It's definitely more than 10-30% faster than the attoparsec-based parser, so I'd say the goal is achieved.