Closed xli closed 12 years ago
Thank you, I know some people that were waiting for this :)
Heh. Almost the exact same steps as my patch a few months ago, with some differences in implementation. It was rejected then on the premise that @kschiess wanted to use a processing step to turn left recursion into right recursion, or expect grammer writers to just manipulate right recursive rules' results into left-recursive trees.
FWIW, when I gave up on expecting PEG parsers to act like LALR parsers and started making my grammers right recursive, my life got a lot easier. Left recursion patch or no.
Incidentally, just not caching every parse action is actually a huge performance gain in my branch. For complex grammers memoizing actually seems to slow things down considerably, and there's a fairly large amount of literature on this out there. Most grammers don't actually backtrack very much, and the cache lookups can be fairly expensive (not to mention the massive heap growth that results from keeping every parse to date around even when most aren't ever used again -- garbage collection, especially in MRI, is not cheap).
Well, we can always move left recursion to a parslet-lr gem or something :P
Hey, this sounds nice. I will check it out.
Ok, so here goes:
without your patch:
benchmarks/001-treetop.rb 0.450000 0.040000 0.490000 ( 0.489828)
benchmarks/002-http_parser.rb 2.100000 0.070000 2.170000 ( 2.178904)
benchmarks/003-smalltalk.rb 7.310000 0.300000 7.610000 ( 7.593408)
with the patch:
benchmarks/001-treetop.rb 0.440000 0.040000 0.480000 ( 0.474451)
benchmarks/002-http_parser.rb 2.550000 0.040000 2.590000 ( 2.585775)
benchmarks/003-smalltalk.rb 9.100000 0.160000 9.260000 ( 9.254780)
smalltalk is the most complex example, doing a lot of branches and tests for every char. Also, it is the closest to parsing programming languages.
We need to get closer to master before this patch can go in, regardless of all the other issues I have with LR-enabled PEG parsers. If possible, I'd like to shave off a few seconds from master, since treetop still beats it to pulp:
size parslet treetop
245: 0.040 0.010
1203: 0.130 0.050
2281: 0.270 0.090
3218: 0.430 0.090
4131: 0.470 0.160
5109: 0.670 0.140
6099: 0.750 0.230
7105: 0.850 0.270
8062: 1.010 0.220
9073: 1.150 0.340
10091: 1.280 0.370
Also, while I kind of like your implementation (it is elegant and encapsulated), I still don't like the idea of LR-enabling parslet. It makes parsers behave in a counterintuitive manner IMHO.
(Results achieved using the parslet-benchmark project, bin/run and bin/compare respectively - this message also went to the mailing list, with some extra discussion. Please join me there.)
Hi, thanks for look into it. I'll take a look the parslet-benchmark later.
Any progress?
The work seems to have stalled. I am not actively taking this up.
+1 for picking this up again. I just bumped into the left-recursion issue and this resolved it.
I've merged xli's fork with the current upstream master and updated it for modern error reporting. All tests pass: https://github.com/ghazel/parslet
Speed difference is negligible:
(kschiess/parslet)
Using parslet at /Users/ghazel/projects/parslet_orig@545dbbc1281137c01c826d5afbcd3ca672cf0acb
benchmarks/001-treetop.rb 0.410000 0.030000 0.440000 ( 0.435223)
benchmarks/002-http_parser.rb 2.050000 0.040000 2.090000 ( 2.090959)
benchmarks/003-smalltalk.rb 7.670000 0.160000 7.830000 ( 7.833882)
benchmarks/004-csv.rb 15.580000 0.170000 15.750000 ( 15.762639)
(ghazel/parslet)
ghazel@mba13:~/projects/parslet-benchmarks$ ./bin/run ../parslet/
Using parslet at /Users/ghazel/projects/parslet@4fd171f66bf315f61fc4bb8b60dae4b1e8fc7ae3
benchmarks/001-treetop.rb 0.410000 0.030000 0.440000 ( 0.432869)
benchmarks/002-http_parser.rb 2.080000 0.040000 2.120000 ( 2.124302)
benchmarks/003-smalltalk.rb 7.730000 0.170000 7.900000 ( 7.897277)
benchmarks/004-csv.rb 15.590000 0.160000 15.750000 ( 15.762415)
Using parslet at ghazel/parslet
benchmarks/001-treetop.rb 0.640000 0.070000 0.710000 ( 0.702863)
benchmarks/002-http_parser.rb 2.270000 0.060000 2.330000 ( 2.354683)
benchmarks/003-smalltalk.rb 8.470000 0.310000 8.780000 ( 9.845145)
benchmarks/004-csv.rb 16.870000 0.210000 17.080000 ( 17.201357)
Using parslet at kschiess/parslet
benchmarks/001-treetop.rb 0.600000 0.060000 0.660000 ( 0.662027)
benchmarks/002-http_parser.rb 1.450000 0.050000 1.500000 ( 1.527339)
benchmarks/003-smalltalk.rb 5.230000 0.240000 5.470000 ( 5.529103)
benchmarks/004-csv.rb 12.270000 0.160000 12.430000 ( 12.541256)
Can't reproduce these positive results. What Ruby version are you using?
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.4.0]
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12.0.0]
on a Macbook Air 13", OS X 10.8.1
Ah, it had to do with how I was calling the runner:
~/projects/parslet-benchmarks$ ./bin/run ../parslet_orig
Using parslet at /Users/ghazel/projects/parslet_orig@545dbbc1281137c01c826d5afbcd3ca672cf0acb
That is actually a lie -- the path it added does not contain parslet.rb (it should be ../parslet_orig/lib
), so it was loading the installed gem instead, in both cases. New numbers match yours, more or less:
ghazel/parslet
~/projects/parslet-benchmarks$ ./bin/run ../parslet/lib
Using parslet at /Users/ghazel/projects/parslet/lib@fad37f129e38e4cb690e6493cef4a8503e92930d
benchmarks/001-treetop.rb 0.400000 0.030000 0.430000 ( 0.421230)
benchmarks/002-http_parser.rb 2.090000 0.050000 2.140000 ( 2.133370)
benchmarks/003-smalltalk.rb 8.060000 0.180000 8.240000 ( 8.248263)
benchmarks/004-csv.rb 15.860000 0.160000 16.020000 ( 16.041821)
kschiess/parslet
~/projects/parslet-benchmarks$ ./bin/run ../parslet_orig/lib
Using parslet at /Users/ghazel/projects/parslet_orig/lib@545dbbc1281137c01c826d5afbcd3ca672cf0acb
benchmarks/001-treetop.rb 0.360000 0.030000 0.390000 ( 0.394083)
benchmarks/002-http_parser.rb 1.270000 0.030000 1.300000 ( 1.308231)
benchmarks/003-smalltalk.rb 4.720000 0.130000 4.850000 ( 4.858223)
benchmarks/004-csv.rb 11.940000 0.100000 12.040000 ( 12.076694)
Thanks for following up. I would not be offended at all if someone would fork parslet and add left recursion. I would provide support for that endeavor. However, my main goal remains to hunt for speed, and left recursion is counter-productive in that respect. Also, it solves a problem I've never had with parslet (probably because I am the author) - the odds are therefore against a merge.
For me, functionality is more important than speed. That said, how would you refactor left-recursion in this case? https://gist.github.com/0b0b302c2c1271b85b8b
Your case is not hard - to break recursion, a 'word' must occur before any parenthesis. I'd swap priorities around.
Please write to the list for getting this kind of help.
I did, a week ago, without resolution: http://librelist.com/browser//ruby.parslet/2012/8/27/recursion-without-consuming-a-character/
It was easier for me to find this pull request, fork xli's work, update it, and use left recursion support.
Hi,
Based on OMeta's left recursion support algorithm for PEG, I added direct and indirect left recursion support into parslet.
The algorithm need only cache parse result of the Rule, not all kinds of Entities in parslet. So I changed Base to not call Context for caching parse result, instead, just call try directly. Then I added a new subclass of Parslet::Atoms::Entity called Rule so that we can have a specific try logic. Then I added Parslet::Atoms::Rule::Position for apply a rule at a position with Context (cache). The Position class implements the whole algorithm. Please see http://tinlizzie.org/ometa/ and http://www.vpri.org/pdf/tr2008003_experimenting.pdf for the details of the algorithm.
I added 2 specs into parser_spec.rb for proving the direct left recursion and in-direct left recursion works as expected.