kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License
805 stars 95 forks source link

Performance improvements #185

Closed olbrich closed 6 years ago

olbrich commented 6 years ago

A couple of optimizations that seem to pretty dramatically improve performance. One of the problems I encountered with a fairly complex parser was that it was spending a lot of time formatting error messages in Atom constructors that were never used during successful parsing. Because the error messages can recursively include string representations of other Atoms, this got expensive fast. By moving them out into their own methods, we can delay evaluation until they are needed and also memoize the result so we only do it once.

olbrich commented 6 years ago

any feedback?

olbrich commented 6 years ago

@kschiess can you take a look at this?

kschiess commented 6 years ago

I can and I will, just on my own time scale. This is open source.

Just a heads up: Your diff contains lots of changes that are unrelated to what you describe. This makes it a) harder for me to take a quick look and it will b) mess up our git history. I like focused PRs.

olbrich commented 6 years ago

@kschiess Much appreciated. Just wanted to make sure you were aware of the PR. Looks like I got a bit carried away with rubocop / trailing whitespace autocorrection. I'll update the PR with a tighter change set shortly.

olbrich commented 6 years ago

@kschiess I've cleaned up this PR to keep it to just the essential changes.

For reference: on master...

time ruby example/optimized_erb.rb
...
real    0m0.241s
user    0m0.101s
sys 0m0.054s

on this branch...

time ruby example/optimized_erb.rb
...
real    0m0.117s
user    0m0.081s
sys 0m0.032s
kschiess commented 6 years ago

Hei @olbrich,

Thanks for taking the time to clean this up. I think I understand now.

I've measured the difference between your performance tweaks and the master of parslet as per today using my own set of benchmarks. Here are the results:

@master

Using parslet at /Users/kschiess/git/own/parslet/lib@7ac42d9bbd92345e3decf891256097debe51720b
benchmarks/001-treetop.rb                 0.152259   0.015071   0.167330 (  0.170218)
benchmarks/002-http_parser.rb             0.692536   0.045098   0.737634 (  0.739403)
benchmarks/003-smalltalk.rb               2.525067   0.185219   2.710286 (  2.710744)
benchmarks/004-csv.rb                     3.396876   2.547524   5.944400 (  5.945829)
benchmarks/005-minip.rb                   0.936701   0.004827   0.941528 (  0.941517)

@PR #185

Using parslet at /Users/kschiess/git/own/parslet/lib@c0d3d8717c3a49bdae9b83b8239c88f60e843ea6
benchmarks/001-treetop.rb                 0.128832   0.015148   0.143980 (  0.145210)
benchmarks/002-http_parser.rb             0.723474   0.048377   0.771851 (  0.775632)
benchmarks/003-smalltalk.rb               2.593355   0.172649   2.766004 (  2.766053)
benchmarks/004-csv.rb                     3.528174   2.572153   6.100327 (  6.100973)
benchmarks/005-minip.rb                   1.008138   0.005431   1.013569 (  1.013840)

All of this using Ruby 2.5.0 on a fairly recent Mac Pro.

The 'optimized_erb' sample yields these:

master:         0.080000   0.000000   0.080000 (  0.076035)
your branch:    0.070000   0.000000   0.070000 (  0.074347)

All in all I don't get a clear impression from this. My performance timers are unprecise, certainly, but the benchmark suite does many repetitions to get bigger numbers.

What do you think?

olbrich commented 6 years ago

@kschiess Thanks for taking a look. I think the particular cases that will benefit the most from these improvements are ones where you can get deeply nested parsers.

This all started when I began to rewrite # olbrich/ruby-units to use parslet to parse unit expressions. On my machine (using ruby 2.5.0) the ruby-units refactor-parser branch takes ~15.8 seconds to run through a test suite of 963 examples. With this PR, the same tests take about ~3 seconds. My guess is that there is something that I'm doing in my parser that isn't covered by your benchmarks.

In any event, the parser I'm trying to optimize is here.. if you want to take a look. It's fairly complex, and perhaps there are some things I can do to make it inherently more efficient. Feedback on that is welcome.

kschiess commented 6 years ago

Hi again,

I've taken the liberty of adding your parser to the benchmarks. This is what I get:

Current master:

Using parslet at /Users/kschiess/git/own/parslet/lib@7ac42d9bbd92345e3decf891256097debe51720b
benchmarks/001-treetop.rb                 0.160000   0.020000   0.180000 (  0.176386)
benchmarks/002-http_parser.rb             0.730000   0.050000   0.780000 (  0.781952)
benchmarks/003-smalltalk.rb               2.650000   0.170000   2.820000 (  2.816006)
benchmarks/004-csv.rb                     3.540000   2.050000   5.590000 (  5.587017)
benchmarks/005-minip.rb                   1.010000   0.000000   1.010000 (  1.018649)
benchmarks/006-ruby_units.rb              4.930000   0.430000   5.360000 (  5.364590)

Your PR:

Using parslet at /Users/kschiess/git/own/parslet/lib@c0d3d8717c3a49bdae9b83b8239c88f60e843ea6
benchmarks/001-treetop.rb                 0.130000   0.010000   0.140000 (  0.139433)
benchmarks/002-http_parser.rb             0.700000   0.040000   0.740000 (  0.755331)
benchmarks/003-smalltalk.rb               2.660000   0.160000   2.820000 (  2.824107)
benchmarks/004-csv.rb                     3.480000   2.080000   5.560000 (  5.565576)
benchmarks/005-minip.rb                   1.030000   0.000000   1.030000 (  1.040488)
benchmarks/006-ruby_units.rb              4.730000   0.460000   5.190000 (  5.184958)

This is more for completeness sake; if the changes are important to you, I'll merge them, since they don't have a negative effect. Thank you for this contribution. I enjoyed this.

kaspar

kschiess commented 6 years ago

I've released this as '1.8.2'. Enjoy!