rust-bakery / parser_benchmarks

Benchmarks for the nom, the Rust parser combinators library
123 stars 20 forks source link

Improve performance of haskell parser #1

Closed UnrealQuester closed 9 years ago

UnrealQuester commented 9 years ago

The parser using attoparsec is using choice to parse the box type. This basically reads 4 bytes and compares them against a string. If the comparison fails, it has to go back and read 4 bytes again for the next match. What can be done instead is first reading the box name and then checking all box types. Because we do not have to rely on backtracking we get noticeably better performance. Before:

benchmarking IO/small
time                 1.662 μs   (1.661 μs .. 1.664 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.663 μs   (1.662 μs .. 1.666 μs)
std dev              5.140 ns   (1.455 ns .. 11.26 ns)

benchmarking IO/big buck bunny
time                 1.616 μs   (1.616 μs .. 1.616 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.617 μs   (1.616 μs .. 1.620 μs)
std dev              4.874 ns   (593.1 ps .. 10.34 ns)

After:

benchmarking IO/small
time                 1.322 μs   (1.321 μs .. 1.324 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.323 μs   (1.321 μs .. 1.332 μs)
std dev              12.23 ns   (898.6 ps .. 28.09 ns)

benchmarking IO/big buck bunny
time                 1.276 μs   (1.275 μs .. 1.277 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.276 μs   (1.275 μs .. 1.276 μs)
std dev              1.423 ns   (1.082 ns .. 1.945 ns)

I am not sure if the hammer and nom versions do backtracking. It is not really needed in this case. If they do, then the comparison is fair and I apologize for wasting your time.

Hrothen commented 9 years ago

Turns out using a different parsing library gets the runtime down very close to nom. https://github.com/Codas/nom_benchmarks/tree/dev. Might be better to just swap that in entirely.

Geal commented 9 years ago

Thanks for your help! Multiple implementations are fine, it's good to compare everything. I guess at one point, this project will turn into "parser benchmarks" instead of just "nom benchmarks" :smile: