Add a non-backtracking haskell parser (cereal) to the tested libraries

Codas commented 9 years ago

This pull request adds a benchmark for a Haskell library that specializes in parsing binary data without backtracking. Results should be roughly in line with those of the nom library for Rust.

Attoparsec results:

time                 2.003 μs   (1.989 μs .. 2.020 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 2.002 μs   (1.989 μs .. 2.019 μs)
std dev              46.87 ns   (36.95 ns .. 62.51 ns)
variance introduced by outliers: 28% (moderately inflated)

benchmarking IO/big buck bunny
time                 1.980 μs   (1.963 μs .. 1.999 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 2.006 μs   (1.983 μs .. 2.042 μs)
std dev              97.36 ns   (66.41 ns .. 134.1 ns)
variance introduced by outliers: 64% (severely inflated)

Cereal results:

time                 248.6 ns   (245.1 ns .. 252.0 ns)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 244.6 ns   (241.6 ns .. 247.4 ns)
std dev              9.913 ns   (8.387 ns .. 11.50 ns)
variance introduced by outliers: 59% (severely inflated)

benchmarking IO/big buck bunny
time                 248.5 ns   (246.1 ns .. 251.5 ns)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 248.6 ns   (245.3 ns .. 251.7 ns)
std dev              10.53 ns   (9.152 ns .. 12.56 ns)
variance introduced by outliers: 61% (severely inflated)

Both were a bit faster at the time I updated the README, but the improvement (roughly 8x in the runs above) should be clearly visible.

Geal commented 9 years ago

This looks really cool, thank you for your work!

A few nitpickings:

- I will verify the benchmarks on my own computer and update the README, to keep a consistent environment.
- You should probably mark yourself as author for that parser :)

Geal commented 9 years ago

So, after running the benchmark on my machine, it appears that cereal is faster than nom! I have to get back to work and fix that :smile:

Codas commented 9 years ago

Thanks for the quick merge!

It's important to realize that GHC as well as the cereal library have matured over many years. It's very impressive what Rust and nom can do already. Also, nom seems to handle large files much better. I have not benchmarked it, but I would expect nom to win for any file larger than maybe 10 MB.

Geal commented 9 years ago

As I mentioned in the readme, my goal with those benchmarks is to see in which performance range nom should be, not to be the fastest (usability is more important and my focus right now).

I looked a lot at attoparsec's source to get design ideas, so I'll probably look at how cereal handles it. If it is not applicable to nom, it could still make it into another library.

To handle large files, the way the parser benchmark is written is not good enough, since the whole file has to be mapped in memory (it is a good way to remove timing differences due to syscalls, though). The other version of that mp4 parser in the nom repository supports seeking, which makes it a lot faster and less memory hungry.
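To illustrate the seeking idea, here is a minimal sketch in plain Rust. It is not the parser from the nom repository: the function name `list_boxes` is hypothetical, and it assumes the basic MP4 box layout of a 4-byte big-endian size followed by a 4-byte type, ignoring the special size values 0 and 1. Only the 8-byte headers are read; payloads are skipped with a seek.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Walk the top-level boxes of an MP4-like stream, reading only each
/// 8-byte header and seeking past the payload instead of loading it.
/// Hypothetical helper for illustration, not part of nom.
fn list_boxes<R: Read + Seek>(mut src: R) -> io::Result<Vec<([u8; 4], u32)>> {
    let mut boxes = Vec::new();
    let mut header = [0u8; 8];
    loop {
        // Stop cleanly when no complete header is left.
        match src.read_exact(&mut header) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let size = u32::from_be_bytes([header[0], header[1], header[2], header[3]]);
        let kind = [header[4], header[5], header[6], header[7]];
        if size < 8 {
            // Sizes 0 ("to end of file") and 1 (64-bit size) are not
            // handled in this sketch; anything below 8 is treated as the end.
            break;
        }
        boxes.push((kind, size));
        // Skip the payload: the header already consumed 8 bytes of the box.
        src.seek(SeekFrom::Current(i64::from(size) - 8))?;
    }
    Ok(boxes)
}
```

The same function works on a `std::fs::File` or on an in-memory `std::io::Cursor`, so the file-backed case never has to pull box payloads into memory.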

The interesting thing with nom's memory usage is that since it uses slices (a structure containing a pointer and a length) everywhere, and only parses the ftyp box, unneeded data is not even loaded in the process.
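A rough sketch of what slice-based parsing of the ftyp box can look like, written in plain Rust without nom. The `Ftyp` struct and `parse_ftyp` function are hypothetical names, and the layout (size, type, major brand, minor version, then 4-byte compatible brands) is the usual one assumed for this box. Every brand in the result is a sub-slice of the input, so nothing is copied and the bytes after the box are never touched.

```rust
use std::convert::TryInto;

/// Borrowed view of an `ftyp` box: the brand fields point into the input.
/// Hypothetical type for illustration, not nom's API.
#[derive(Debug, PartialEq)]
struct Ftyp<'a> {
    major_brand: &'a [u8],
    minor_version: u32,
    compatible_brands: Vec<&'a [u8]>,
}

/// Parse a leading `ftyp` box and return it with the unparsed remainder.
/// Returns `None` if the input is too short or does not start with `ftyp`.
fn parse_ftyp(input: &[u8]) -> Option<(Ftyp<'_>, &[u8])> {
    // Box header: 4-byte big-endian size (including the header), 4-byte type.
    let size = u32::from_be_bytes(input.get(0..4)?.try_into().ok()?) as usize;
    if input.get(4..8)? != b"ftyp" || size < 16 {
        return None;
    }
    // `get` fails if the declared size runs past the end of the input.
    let body = input.get(8..size)?;
    let rest = &input[size..];
    let major_brand = &body[0..4];
    let minor_version = u32::from_be_bytes(body[4..8].try_into().ok()?);
    // Each compatible brand is a 4-byte sub-slice; trailing bytes are ignored.
    let compatible_brands = body[8..].chunks_exact(4).collect();
    Some((Ftyp { major_brand, minor_version, compatible_brands }, rest))
}
```

When the input slice comes from a memory-mapped file, the remainder returned here is just a pointer and a length, which is why pages holding the rest of the file need not be loaded.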

rust-bakery / parser_benchmarks

Add a non-backtracking haskell parser (cereal) to the tested libraries #3