wrengr / bytestring-lexing

Efficiently parse and produce common lexemes: namely integral and fractional numbers.
Other
6 stars 4 forks source link

bytestring-lexing

Hackage version Build Status Dependencies

The bytestring-lexing package offers extremely efficient ByteString parsers for some common lexemes: namely integral and fractional numbers. In addition, it provides efficient serializers for (some of) the formats it parses.

As of version 0.3.0, bytestring-lexing offers the best-in-show parsers for integral values. And as of version 0.5.0 it offers (to my knowledge) the best-in-show parser for fractional/floating numbers. A record of these benchmarks can be found here

Install

This is a simple package and should be easy to install. You should be able to use the standard:

$> cabal install bytestring-lexing

Testing

To run the test suite (without coverage information), you can use the standard method (with runhaskell Setup.hs in lieu of cabal, if necessary):

$> cd bytestring-lexing
$> cabal configure --enable-tests
$> cabal build
$> cabal test

If you want coverage information as well, there are a few options depending on your version of Cabal. For modern cabal with v2/nix-style builds, add --enable-coverage to the configure step, and the results will be located at ./dist-newstyle/build/$ARCH/$GHC/bytestring-lexing-$VERSION/opt/hpc/vanilla/html/bytestring-lexing-$VERSION/hpc_index.html. For v1/classic builds, add --enable-coverage to the configure step and also add --keep-tix-files to the test step, and the results are instead located at ./dist/hpc/vanilla/html/bytestring-lexing-$VERSION/hpc_index.html. For very old versions of Cabal, you must use --enable-library-coverage in lieu of --enable-coverage.

Benchmarks

If you want to run the benchmarking code, then do:

$> cd bytestring-lexing/bench
$> cabal configure
$> cabal build
$> for b in isSpace numDigits packDecimal readDecimal readExponential ceilEightThirds; do
       ./dist/build/bench-${b}/bench-${b} -o ${b}.html;
   done && open *.html

Of course, you needn't run all the benchmarking programs if you don't want. Notably, these benchmarks are artefacts of the development of the library. They are not necessarily the most up-to-date reflection of the library itself, nor of other Haskell libraries we've compared against in the past.

Portability

An attempt has been made to keep this library portable. However, we do make use of two simple language extensions. Both of these would be easy enough to remove, but they should not pose a significant portability burden. If they do in fact pose a burden for your compiler, contact the maintainer.

[ghc-bangpatterns]: https://downloads.haskell.org/~ghc/6.6.1/docs/html/users_guide/sec-bang-patterns.html [jhc-bangpatterns]: http://repetae.net/computer/jhc/manual.html#code-options [uhc-bangpatterns]: https://github.com/UU-ComputerScience/uhc-js/issues/1 [hugs-bangpatterns]: https://mail.haskell.org/pipermail/haskell-cafe/2010-July/079946.html

Changes: Version 0.5.0 (2015-05-06) vs 0.4.3 (2013-03-21)

I've completely overhauled the parsers for fractional numbers.

The old Data.ByteString.Lex.Double and Data.ByteString.Lex.Lazy.Double modules have been removed, as has their reliance on Alex as a build tool. I know some users were reluctant to use bytestring-lexing because of that dependency, and forked their own version of bytestring-lexing-0.3.0's integral parsers. This is no longer an issue, and those users are requested to switch over to using bytestring-lexing.

The old modules are replaced by the new Data.ByteString.Lex.Fractional module. This module provides two variants of the primary parsers. The readDecimal and readExponential functions are very simple and should suffice for most users' needs. The readDecimalLimited and readExponentialLimited are variants which take an argument specifying the desired precision limit (in decimal digits). With care, the limited-precision parsers can perform far more efficiently than the unlimited-precision parsers. Performance aside, they can also be used to intentionally restrict the precision of your program's inputs.

Benchmarks: Version 0.5.0 (2015-05-06)

The Criterion output of the benchmark discussed below, is available here. The main competitors we compare against are the previous version of bytestring-lexing (which already surpassed text and attoparsec/scientific) and bytestring-read which was the previous best-in-show.

The unlimited-precision parsers provide 3.3x to 3.9x speedup over the readDouble function from bytestring-lexing-0.4.3.3, as well as being polymorphic over all Fractional values. For Float/Double: these functions have essentially the same performance as bytestring-read on reasonable inputs (1.07x to 0.89x), but for inputs which have far more precision than Float/Double can handle these functions are much slower than bytestring-read (0.30x 'speedup'). However, for Rational: these functions provide 1.26x to 1.96x speedup compared to bytestring-read.

The limited-precision parsers do even better, but require some care to use properly. For types with infinite precision (e.g., Rational) we can pass in an 'infinite' limit by passing the length of the input string plus one. For Rational: doing so provides 1.5x speedup over the unlimited-precision parsers (and 1.9x to 3x speedup over bytestring-read), because we can avoid intermediate renormalizations. Whether other unlimited precision types would see the same benefit remains an open question.

For types with inherently limited precision (e.g., Float/Double), we could either pass in an 'infinite' limit or we could pass in the actual inherent limit. For types with inherently limited precision, passing in an 'infinite' limit degrades performance compared to the unlimited-precision parsers (0.51x to 0.8x 'speedup'). Whereas, passing in the actual inherent limit gives 1.3x to 4.5x speedup over the unlimited-precision parsers. They also provide 1.2x to 1.4x speedup over bytestring-read; for a total of 5.1x to 14.4x speedup over bytestring-lexing-0.4.3.3!

Links