left recursion support - Githubissues

xli commented 13 years ago

Hi,

Based on OMeta's left recursion support algorithm for PEG, I added direct and indirect left recursion support into parslet.

The algorithm need only cache parse result of the Rule, not all kinds of Entities in parslet. So I changed Base to not call Context for caching parse result, instead, just call try directly. Then I added a new subclass of Parslet::Atoms::Entity called Rule so that we can have a specific try logic. Then I added Parslet::Atoms::Rule::Position for apply a rule at a position with Context (cache). The Position class implements the whole algorithm. Please see http://tinlizzie.org/ometa/ and http://www.vpri.org/pdf/tr2008003_experimenting.pdf for the details of the algorithm.

I added 2 specs into parser_spec.rb for proving the direct left recursion and in-direct left recursion works as expected.

meh commented 13 years ago

Thank you, I know some people that were waiting for this :)

stormbrew commented 13 years ago

Heh. Almost the exact same steps as my patch a few months ago, with some differences in implementation. It was rejected then on the premise that @kschiess wanted to use a processing step to turn left recursion into right recursion, or expect grammer writers to just manipulate right recursive rules' results into left-recursive trees.

FWIW, when I gave up on expecting PEG parsers to act like LALR parsers and started making my grammers right recursive, my life got a lot easier. Left recursion patch or no.

Incidentally, just not caching every parse action is actually a huge performance gain in my branch. For complex grammers memoizing actually seems to slow things down considerably, and there's a fairly large amount of literature on this out there. Most grammers don't actually backtrack very much, and the cache lookups can be fairly expensive (not to mention the massive heap growth that results from keeping every parse to date around even when most aren't ever used again -- garbage collection, especially in MRI, is not cheap).

meh commented 13 years ago

Well, we can always move left recursion to a parslet-lr gem or something :P

kschiess commented 13 years ago

Hey, this sounds nice. I will check it out.

kschiess commented 13 years ago

Ok, so here goes:

without your patch:

benchmarks/001-treetop.rb                 0.450000   0.040000   0.490000 (  0.489828)
benchmarks/002-http_parser.rb             2.100000   0.070000   2.170000 (  2.178904)
benchmarks/003-smalltalk.rb               7.310000   0.300000   7.610000 (  7.593408)

with the patch:

benchmarks/001-treetop.rb                 0.440000   0.040000   0.480000 (  0.474451)
benchmarks/002-http_parser.rb             2.550000   0.040000   2.590000 (  2.585775)
benchmarks/003-smalltalk.rb               9.100000   0.160000   9.260000 (  9.254780)

smalltalk is the most complex example, doing a lot of branches and tests for every char. Also, it is the closest to parsing programming languages.

We need to get closer to master before this patch can go in, regardless of all the other issues I have with LR-enabled PEG parsers. If possible, I'd like to shave off a few seconds from master, since treetop still beats it to pulp:

 size       parslet   treetop   
       245:   0.040     0.010   
      1203:   0.130     0.050   
      2281:   0.270     0.090   
      3218:   0.430     0.090   
      4131:   0.470     0.160   
      5109:   0.670     0.140   
      6099:   0.750     0.230   
      7105:   0.850     0.270   
      8062:   1.010     0.220   
      9073:   1.150     0.340   
     10091:   1.280     0.370

Also, while I kind of like your implementation (it is elegant and encapsulated), I still don't like the idea of LR-enabling parslet. It makes parsers behave in a counterintuitive manner IMHO.

(Results achieved using the parslet-benchmark project, bin/run and bin/compare respectively - this message also went to the mailing list, with some extra discussion. Please join me there.)

xli commented 13 years ago

Hi, thanks for look into it. I'll take a look the parslet-benchmark later.

deni64k commented 12 years ago

Any progress?

kschiess commented 12 years ago

The work seems to have stalled. I am not actively taking this up.

ghazel commented 12 years ago

+1 for picking this up again. I just bumped into the left-recursion issue and this resolved it.

ghazel commented 12 years ago

I've merged xli's fork with the current upstream master and updated it for modern error reporting. All tests pass: https://github.com/ghazel/parslet

ghazel commented 12 years ago

Speed difference is negligible:

(kschiess/parslet)

Using parslet at /Users/ghazel/projects/parslet_orig@545dbbc1281137c01c826d5afbcd3ca672cf0acb
benchmarks/001-treetop.rb                 0.410000   0.030000   0.440000 (  0.435223)
benchmarks/002-http_parser.rb             2.050000   0.040000   2.090000 (  2.090959)
benchmarks/003-smalltalk.rb               7.670000   0.160000   7.830000 (  7.833882)
benchmarks/004-csv.rb                    15.580000   0.170000  15.750000 ( 15.762639)

(ghazel/parslet)

ghazel@mba13:~/projects/parslet-benchmarks$ ./bin/run ../parslet/
Using parslet at /Users/ghazel/projects/parslet@4fd171f66bf315f61fc4bb8b60dae4b1e8fc7ae3
benchmarks/001-treetop.rb                 0.410000   0.030000   0.440000 (  0.432869)
benchmarks/002-http_parser.rb             2.080000   0.040000   2.120000 (  2.124302)
benchmarks/003-smalltalk.rb               7.730000   0.170000   7.900000 (  7.897277)
benchmarks/004-csv.rb                    15.590000   0.160000  15.750000 ( 15.762415)

kschiess commented 12 years ago

Using parslet at ghazel/parslet
benchmarks/001-treetop.rb                 0.640000   0.070000   0.710000 (  0.702863)
benchmarks/002-http_parser.rb             2.270000   0.060000   2.330000 (  2.354683)
benchmarks/003-smalltalk.rb               8.470000   0.310000   8.780000 (  9.845145)
benchmarks/004-csv.rb                    16.870000   0.210000  17.080000 ( 17.201357)

Using parslet at kschiess/parslet
benchmarks/001-treetop.rb                 0.600000   0.060000   0.660000 (  0.662027)
benchmarks/002-http_parser.rb             1.450000   0.050000   1.500000 (  1.527339)
benchmarks/003-smalltalk.rb               5.230000   0.240000   5.470000 (  5.529103)
benchmarks/004-csv.rb                    12.270000   0.160000  12.430000 ( 12.541256)

Can't reproduce these positive results. What Ruby version are you using?

ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0]

ghazel commented 12 years ago

ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12.0.0]

on a Macbook Air 13", OS X 10.8.1

ghazel commented 12 years ago

Ah, it had to do with how I was calling the runner:

~/projects/parslet-benchmarks$ ./bin/run ../parslet_orig
Using parslet at /Users/ghazel/projects/parslet_orig@545dbbc1281137c01c826d5afbcd3ca672cf0acb

That is actually a lie -- the path it added does not contain parslet.rb (it should be ../parslet_orig/lib), so it was loading the installed gem instead, in both cases. New numbers match yours, more or less:

ghazel/parslet
~/projects/parslet-benchmarks$ ./bin/run ../parslet/lib
Using parslet at /Users/ghazel/projects/parslet/lib@fad37f129e38e4cb690e6493cef4a8503e92930d
benchmarks/001-treetop.rb                 0.400000   0.030000   0.430000 (  0.421230)
benchmarks/002-http_parser.rb             2.090000   0.050000   2.140000 (  2.133370)
benchmarks/003-smalltalk.rb               8.060000   0.180000   8.240000 (  8.248263)
benchmarks/004-csv.rb                    15.860000   0.160000  16.020000 ( 16.041821)

kschiess/parslet
~/projects/parslet-benchmarks$ ./bin/run ../parslet_orig/lib
Using parslet at /Users/ghazel/projects/parslet_orig/lib@545dbbc1281137c01c826d5afbcd3ca672cf0acb
benchmarks/001-treetop.rb                 0.360000   0.030000   0.390000 (  0.394083)
benchmarks/002-http_parser.rb             1.270000   0.030000   1.300000 (  1.308231)
benchmarks/003-smalltalk.rb               4.720000   0.130000   4.850000 (  4.858223)
benchmarks/004-csv.rb                    11.940000   0.100000  12.040000 ( 12.076694)

kschiess commented 12 years ago

Thanks for following up. I would not be offended at all if someone would fork parslet and add left recursion. I would provide support for that endeavor. However, my main goal remains to hunt for speed, and left recursion is counter-productive in that respect. Also, it solves a problem I've never had with parslet (probably because I am the author) - the odds are therefore against a merge.

ghazel commented 12 years ago

For me, functionality is more important than speed. That said, how would you refactor left-recursion in this case? https://gist.github.com/0b0b302c2c1271b85b8b

kschiess commented 12 years ago

Your case is not hard - to break recursion, a 'word' must occur before any parenthesis. I'd swap priorities around.

kschiess commented 12 years ago

Please write to the list for getting this kind of help.

ghazel commented 12 years ago

I did, a week ago, without resolution: http://librelist.com/browser//ruby.parslet/2012/8/27/recursion-without-consuming-a-character/

It was easier for me to find this pull request, fork xli's work, update it, and use left recursion support.

kschiess / parslet

left recursion support #51