Closed sc07kvm closed 1 year ago
Can you please provide a general summary, how the left recursion handling is implemented.
For the left recursion rule, we apply it in a loop, while memoizing the result of its last execution. For the first time, we consider that the rule was not executed successfully. We stop when we cannot get a longer result.
It is unlikely that I can describe it better than Guido van Rossum:
Priority support is implemented by saving rules and expressions in the stack. Therefore, at the stage of the new choice, we can check what choice we made last time.
Why do all the generated parsers change even if the flag support-left-recursion is not set to true? Can you please explain the changes.
I tweaked the code a bit, now there are fewer changes. The remaining changes are needed to more conveniently integrate support for left recursion.
Major change:
func (p *parser) parseRule(rule *rule) (any, bool) {
// debug code
// memoize code
// parse rule
// debug code
// memoize code
return val, ok
}
->
func (p *parser) parseRuleWrap(rule *rule) (any, bool) {
// debug code
// choosing the right method
if ... {
val, ok := p.parseRule(rule)
} else if ... {
val, ok := p.parseRuleMemoize(rule)
} else if ... {
...
}
// debug code
return val, ok
}
func (p *parser) parseRule(rule *rule) (any, bool) {
// parse rule
}
func (p *parser) parseRuleMemoize(rule *rule) (any, bool) {
// memoize code
val, ok := p.parseRule(rule)
// memoize code
return val, ok
}
How does this change impact the performance of the generated parsers (optimized and not optimized). A good example for this is the json parser in examples/json.
Performance decreased by ~2% on examples/json
current master
https://github.com/mna/pigeon/tree/b0211796baa821ab4d73004fe458e5ae383b6a76
go test -bench=. -benchmem ./examples/json
goos: darwin
goarch: arm64
pkg: github.com/mna/pigeon/examples/json
BenchmarkPigeonJSONNoMemo-10 157 7561956 ns/op 5003014 B/op 139782 allocs/op
BenchmarkPigeonJSONMemo-10 49 24026827 ns/op 26645190 B/op 184036 allocs/op
BenchmarkPigeonJSONOptimized-10 447 2694039 ns/op 2770617 B/op 69738 allocs/op
BenchmarkPigeonJSONOptimizedGrammar-10 164 7232733 ns/op 4998874 B/op 139647 allocs/op
BenchmarkStdlibJSON-10 8139 138571 ns/op 74200 B/op 1010 allocs/op
PASS
ok github.com/mna/pigeon/examples/json 7.893s
this PR
https://github.com/sc07kvm/pigeon/tree/8a83e813e07d301e49df0453187acc51fa14ecad
go test -bench=. -benchmem ./examples/json
goos: darwin
goarch: arm64
pkg: github.com/mna/pigeon/examples/json
BenchmarkPigeonJSONNoMemo-10 153 7679382 ns/op 5003028 B/op 139781 allocs/op
BenchmarkPigeonJSONMemo-10 45 24330618 ns/op 26642812 B/op 184029 allocs/op
BenchmarkPigeonJSONOptimized-10 439 2711443 ns/op 2770550 B/op 69737 allocs/op
BenchmarkPigeonJSONOptimizedGrammar-10 162 7397897 ns/op 4998891 B/op 139647 allocs/op
BenchmarkStdlibJSON-10 8037 139898 ns/op 74205 B/op 1010 allocs/op
PASS
ok github.com/mna/pigeon/examples/json 8.478s
Can you please check how the documentation would need to be changed (e.g. README.md, doc.go)
Added to doc.go, main.go
For the left recursion rule, we apply it in a loop, while memoizing the result of its last execution. For the first time, we consider that the rule was not executed successfully. We stop when we cannot get a longer result.
If I understand this correctly, left recursion support is only possible, if memoization is available. Of so, the flags -optimize-parser
and -support-left-recursion
are not compatible since with -optimize-parser
the support for memoization is removed. Is this correct?
@sc07kvm One more idea to gain some insights into the robustness of this implementation as well as coverage of edge cases would be to have two grammars for parsing the same input (e.g. arithmetic expressions), one with left recursion and one without and then use fuzzing to compare the results of the two.
An other angle, that is worrying me a little bit is, that at the moment we do not have grammar examples combining left recursion with some of the other features like throw-recover and state so it is currently unknown, if this implementation does behave as expected in these cases.
For the left recursion rule, we apply it in a loop, while memoizing the result of its last execution. For the first time, we consider that the rule was not executed successfully. We stop when we cannot get a longer result.
If I understand this correctly, left recursion support is only possible, if memoization is available. Of so, the flags
-optimize-parser
and-support-left-recursion
are not compatible since with-optimize-parser
the support for memoization is removed. Is this correct?
Conceptually yes, memoization is needed.
But if the -optimize-parser
, -support-left-recursion
parameters are used and there is left recursion in the grammar, then the code responsible for memoization is added to the parser and memoization is applied only on one rule from each cycle in the grammar.
I added a test for this case: test/left_recursion/optimized
An other angle, that is worrying me a little bit is, that at the moment we do not have grammar examples combining left recursion with some of the other features like throw-recover and state so it is currently unknown, if this implementation does behave as expected in these cases.
I need some time to do this
An other angle, that is worrying me a little bit is, that at the moment we do not have grammar examples combining left recursion with some of the other features like throw-recover and state so it is currently unknown, if this implementation does behave as expected in these cases.
Added:
@sc07kvm I have not forgotten you nor this PR. I am currently abroad and therefore only seldom online. I will try to have a look again, but I can not promise anything. I am sorry for the inconvenience.
@sc07kvm I am back at work again and I checked this PR again. Thanks for all the work you have put into this. Do you mind to give me a brief summary about the current state of this PR. Do you have some open points you are work on or is this PR ready from you point of view?
@sc07kvm I am back at work again and I checked this PR again. Thanks for all the work you have put into this. Do you mind to give me a brief summary about the current state of this PR. Do you have some open points you are work on or is this PR ready from you point of view?
In my opinion PR is ready. I did everything I wanted on it, there were no unanswered questions on it. The status is as follows - support for left recursion has been implemented, a check for the presence of left recursion has been implemented, tests have been written for the operation of left recursion with all other grammar features.
In my opinion PR is ready. I did everything I wanted on it, there were no unanswered questions on it. The status is as follows - support for left recursion has been implemented, a check for the presence of left recursion has been implemented, tests have been written for the operation of left recursion with all other grammar features.
Thanks for the summary, this all sounds very good. I went through the PR again and this is one of the biggest PR ever for pigeon (at least in the recent time) and therefore I am still a little bit concerned about eventual negative side effects.
In order to get some feedback from the community, I would like to call out to all of you that follow this PR as well as to some power users of pigeon (@mna, @xcoulon, @flowchartsman) with the request to give this PR a spin and report back if for your use-cases everything still works as intended. Thank you very much.
I just successfully tested pigeon
built from this branch with:
Hey @breml , thanks for reaching out, I understand your concerns as this is a huge PR. Thanks to @sc07kvm for the significant contribution! I personally haven't used pigeon in a long time and I'm afraid I don't have any additional use-cases to check outside the test suite.
I think a valid question is whether or not left recursion is a big enough problem to add that complexity in the parser (vs fixing/adjusting the grammar)? I sincerely don't know the answer to that, left recursion has generally not been an issue in my grammars but then again they've been more or less fully under my control. And I haven't followed up on the previous discussions, so apologies if that has already been discussed at length (and no disrespect to the huge effort made to implement this! The answer may very well be that it's worth adding this, and I guess it's the likely answer at this point since there's been a good amount of back and forth from what I can see).
Sorry I cannot be of more help testing this.
Hey @breml , thanks for reaching out, I understand your concerns as this is a huge PR. Thanks to @sc07kvm for the significant contribution! I personally haven't used pigeon in a long time and I'm afraid I don't have any additional use-cases to check outside the test suite.
@mna Thanks for your feedback, this is always appreciated.
I think a valid question is whether or not left recursion is a big enough problem to add that complexity in the parser (vs fixing/adjusting the grammar)? I sincerely don't know the answer to that, left recursion has generally not been an issue in my grammars but then again they've been more or less fully under my control. And I haven't followed up on the previous discussions, so apologies if that has already been discussed at length (and no disrespect to the huge effort made to implement this! The answer may very well be that it's worth adding this, and I guess it's the likely answer at this point since there's been a good amount of back and forth from what I can see).
While it is true, that the grammars can always rearranged such that there are no left recursions, it is also true, that this often comes with the cost, that the grammars become less natural to read and therefore harder to maintain. Therefore adding this complexity to pigeon
actually removes complexity for the users.
As you have mentioned, we already put quite some effort in testing and refining this PR. I am seriously considering to add this, but before doing so, I would like to have a very high confidence, that this does not break existing code. Also maintenance and bug fixing of this code could become an issue due to the increased complexity.
While it is true, that the grammars can always rearranged such that there are no left recursions, it is also true, that this often comes with the cost, that the grammars become less natural to read and therefore harder to maintain. Therefore adding this complexity to pigeon actually removes complexity for the users.
Totally agree with you @breml .
I am seriously considering to add this, but before doing so, I would like to have a very high confidence, that this does not break existing code. Also maintenance and bug fixing of this code could become an issue due to the increased complexity.
Yeah that makes sense to me. The fact that it is generally used as a tool (and not imported as a package) makes it harder to automate some kind of "testing in the wild", as no importers are reported in pkg.go.dev.
One thing I did that resulted in a reasonable number of repos using pigeon
is that github search: https://github.com/search?q=pigeon+-o+language%3AGo+&type=code
This could be a starting point to automate testing the changes on a wide variety of grammars.
tl/dr: tests with left recursion pigeon: ✅ 17 ➖ 13 ❌ 0
Bottom line is, that I was not able to find any blocker due to the changes in pigeon
, which gives me quite some confidence to release this PR.
Today I ran some tests with (popular) packages, that use pigeon. I used this search https://github.com/search?utf8=%E2%9C%93&q=generate+pigeon+peg+path%3A*.go&type=code to find packages, that have the keywords generate
, pigeon
and peg
in one of their .go
files.
Then I basically worked with the following sequence:
git clone
go.mod
is < 1.18
update Go versiongo.mod
file: go mod init && go mod tidy
go test ./...
go generate ./...
go test ./...
This is based on the assumption, that there are some tests, that cover the correct function of the generated parser.
These are the results:
✅ https://github.com/bytesparadise/libasciidoc - worked after updating the Go version in go.mod
for any
➖ https://github.com/kiteco/kiteco-public - failed to execute go test
✅ https://github.com/flowchartsman/aql - worked after manually updating one of the tests, since pigeon returns a new error message (https://github.com/flowchartsman/aql/issues/2)
✅ https://github.com/hashicorp/go-bexpr
✅ https://github.com/frankbraun/asciiart - worked after adding a go.mod
file
✅ https://github.com/owncloud/ocis
✅ https://github.com/Workiva/frugal
✅ https://github.com/mmcloughlin/addchain - worked after updating the Go version in go.mod
for any
➖ https://github.com/pinpt/go-common - failed to execute go test
➖ https://github.com/lanl/QA-Prolog - no tests
✅ https://github.com/samuel/go-thrift - worked after updating the Go version in go.mod
for any
➖ https://github.com/d4l3k/wikigopher - failed to execute go test
on master branch, failed to generate parser with the version from this branch as well as the one from latest master. This is sad, since it would have been an interesting case, since the PEG grammar is > 2400 lines long
➖ https://github.com/pinpt/go-common - failed to execute go test
✅ https://github.com/rigetti/openapi-cli-generator - tests only worked for the parser sub-package, updating the Go version in go.mod
for any
was necessary
➖ https://github.com/lanl/edif2qmasm - no tests
➖ https://github.com/eiffel-community/eiffel-goer - no tests
✅ https://github.com/tcard/queson-go - worked after updating the Go version in go.mod
for any
➖ https://github.com/fermuch/telemathings-analog-parser - failed to execute go test
on master branch
➖ https://github.com/sylvinus/ifql - failed to execute go test ./...
✅ https://github.com/philandstuff/dhall-golang - tests only worked for the parser sub-package, updating the Go version in go.mod
for any
was necessary
✅ https://github.com/jacobsimpson/msh - worked with the new version of pigeon, but the test failed on the original checkout 🤔
✅ https://github.com/symbolicsoft/verifpal
➖ https://github.com/tmc/graphql - failed to execute go test ./...
✅ https://github.com/jacobsimpson/mp3tag - worked after updating the Go version in go.mod
for any
✅ https://github.com/SerenityHellp/thrift_parser_lib - tests only worked for the parser sub-package, updating the Go version in go.mod
for any
was necessary
➖ https://github.com/vivekmurali/km - failed to execute go test ./...
➖ https://github.com/Yelp/opa - failed to execute go test ./...
➖ https://github.com/jacobsimpson/jt - failed to execute go test ./...
✅ https://github.com/mmcloughlin/ec3 - tests only worked for the parser sub-package, updating the Go version in go.mod
for any
was necessary
✅ https://github.com/mmcloughlin/ssarules - worked after updating the Go version in go.mod
for any
Result: ✅ 17 ➖ 13 ❌ 0
Concusion:
For none of the tested packages, where executing the tests has been successful, changing pigeon
to the new version from this branch with the left recursion support, broke the tests available for the respective packages.
For a surprisingly large list of packages, it was not straight forward to just execute the tests.
Bottom line is, that I was not able to find any blocker due to the changes in pigeon
, which gives me quite some confidence to release this PR.
Any other thoughts?
:+1: Agree with the confidence boost that this provides.
🚀 It is merged! Thank you very much @sc07kvm for your great effort to make this happen. ❤️
A new release will follow soon.
This PR is the first step towards the implementation of https://github.com/mna/pigeon/issues/120. Implementation details are peeped here https://github.com/we-like-parsers/pegen_experiments/tree/master/pegen
Fixes: #120 It also fixes https://github.com/mna/pigeon/issues/79