Add support left recursion

sc07kvm commented 1 year ago

This PR is the first step towards the implementation of https://github.com/mna/pigeon/issues/120. Implementation details are peeped here https://github.com/we-like-parsers/pegen_experiments/tree/master/pegen

Fixes: #120 It also fixes https://github.com/mna/pigeon/issues/79

sc07kvm commented 1 year ago

Can you please provide a general summary, how the left recursion handling is implemented.

For the left recursion rule, we apply it in a loop, while memoizing the result of its last execution. For the first time, we consider that the rule was not executed successfully. We stop when we cannot get a longer result.

It is unlikely that I can describe it better than Guido van Rossum:

Priority support is implemented by saving rules and expressions in the stack. Therefore, at the stage of the new choice, we can check what choice we made last time.

sc07kvm commented 1 year ago

Why do all the generated parsers change even if the flag support-left-recursion is not set to true? Can you please explain the changes.

I tweaked the code a bit, now there are fewer changes. The remaining changes are needed to more conveniently integrate support for left recursion.

Major change:

func (p *parser) parseRule(rule *rule) (any, bool) {
    // debug code
    // memoize code
    // parse rule
    // debug code
    // memoize code
    return val, ok
}

->

func (p *parser) parseRuleWrap(rule *rule) (any, bool) {
    // debug code
    // choosing the right method
    if ... {
        val, ok := p.parseRule(rule)
    } else if ... {
        val, ok := p.parseRuleMemoize(rule)
    } else if ... {
        ...
    }
    // debug code
    return val, ok
}

func (p *parser) parseRule(rule *rule) (any, bool) {
    // parse rule
}

func (p *parser) parseRuleMemoize(rule *rule) (any, bool) {
    // memoize code
    val, ok := p.parseRule(rule)
    // memoize code
    return val, ok
}

sc07kvm commented 1 year ago

How does this change impact the performance of the generated parsers (optimized and not optimized). A good example for this is the json parser in examples/json.

Performance decreased by ~2% on examples/json

current master https://github.com/mna/pigeon/tree/b0211796baa821ab4d73004fe458e5ae383b6a76 go test -bench=. -benchmem ./examples/json

goos: darwin
goarch: arm64
pkg: github.com/mna/pigeon/examples/json
BenchmarkPigeonJSONNoMemo-10                         157           7561956 ns/op         5003014 B/op     139782 allocs/op
BenchmarkPigeonJSONMemo-10                            49          24026827 ns/op        26645190 B/op     184036 allocs/op
BenchmarkPigeonJSONOptimized-10                      447           2694039 ns/op         2770617 B/op      69738 allocs/op
BenchmarkPigeonJSONOptimizedGrammar-10               164           7232733 ns/op         4998874 B/op     139647 allocs/op
BenchmarkStdlibJSON-10                              8139            138571 ns/op           74200 B/op       1010 allocs/op
PASS
ok      github.com/mna/pigeon/examples/json     7.893s

this PR https://github.com/sc07kvm/pigeon/tree/8a83e813e07d301e49df0453187acc51fa14ecad go test -bench=. -benchmem ./examples/json

goos: darwin
goarch: arm64
pkg: github.com/mna/pigeon/examples/json
BenchmarkPigeonJSONNoMemo-10                         153           7679382 ns/op         5003028 B/op     139781 allocs/op
BenchmarkPigeonJSONMemo-10                            45          24330618 ns/op        26642812 B/op     184029 allocs/op
BenchmarkPigeonJSONOptimized-10                      439           2711443 ns/op         2770550 B/op      69737 allocs/op
BenchmarkPigeonJSONOptimizedGrammar-10               162           7397897 ns/op         4998891 B/op     139647 allocs/op
BenchmarkStdlibJSON-10                              8037            139898 ns/op           74205 B/op       1010 allocs/op
PASS
ok      github.com/mna/pigeon/examples/json     8.478s

sc07kvm commented 1 year ago

Can you please check how the documentation would need to be changed (e.g. README.md, doc.go)

Added to doc.go, main.go

breml commented 1 year ago

For the left recursion rule, we apply it in a loop, while memoizing the result of its last execution. For the first time, we consider that the rule was not executed successfully. We stop when we cannot get a longer result.

If I understand this correctly, left recursion support is only possible, if memoization is available. Of so, the flags -optimize-parser and -support-left-recursion are not compatible since with -optimize-parser the support for memoization is removed. Is this correct?

breml commented 1 year ago

@sc07kvm One more idea to gain some insights into the robustness of this implementation as well as coverage of edge cases would be to have two grammars for parsing the same input (e.g. arithmetic expressions), one with left recursion and one without and then use fuzzing to compare the results of the two.

An other angle, that is worrying me a little bit is, that at the moment we do not have grammar examples combining left recursion with some of the other features like throw-recover and state so it is currently unknown, if this implementation does behave as expected in these cases.

sc07kvm commented 1 year ago

For the left recursion rule, we apply it in a loop, while memoizing the result of its last execution. For the first time, we consider that the rule was not executed successfully. We stop when we cannot get a longer result.

If I understand this correctly, left recursion support is only possible, if memoization is available. Of so, the flags -optimize-parser and -support-left-recursion are not compatible since with -optimize-parser the support for memoization is removed. Is this correct?

Conceptually yes, memoization is needed.

But if the -optimize-parser, -support-left-recursion parameters are used and there is left recursion in the grammar, then the code responsible for memoization is added to the parser and memoization is applied only on one rule from each cycle in the grammar.

I added a test for this case: test/left_recursion/optimized

sc07kvm commented 1 year ago

An other angle, that is worrying me a little bit is, that at the moment we do not have grammar examples combining left recursion with some of the other features like throw-recover and state so it is currently unknown, if this implementation does behave as expected in these cases.

I need some time to do this

sc07kvm commented 1 year ago

An other angle, that is worrying me a little bit is, that at the moment we do not have grammar examples combining left recursion with some of the other features like throw-recover and state so it is currently unknown, if this implementation does behave as expected in these cases.

Added:

./test/left_recursion_state/ - modified ./test/left_recursion/ in the likeness of ./test/state
./test/left_recursion_labeled_failures/ - modified /test/labeled_failures/
./test/left_recursion_thrownrecover/ - modified ./test/thrownrecover/

breml commented 1 year ago

@sc07kvm I have not forgotten you nor this PR. I am currently abroad and therefore only seldom online. I will try to have a look again, but I can not promise anything. I am sorry for the inconvenience.

breml commented 1 year ago

@sc07kvm I am back at work again and I checked this PR again. Thanks for all the work you have put into this. Do you mind to give me a brief summary about the current state of this PR. Do you have some open points you are work on or is this PR ready from you point of view?

sc07kvm commented 1 year ago

@sc07kvm I am back at work again and I checked this PR again. Thanks for all the work you have put into this. Do you mind to give me a brief summary about the current state of this PR. Do you have some open points you are work on or is this PR ready from you point of view?

In my opinion PR is ready. I did everything I wanted on it, there were no unanswered questions on it. The status is as follows - support for left recursion has been implemented, a check for the presence of left recursion has been implemented, tests have been written for the operation of left recursion with all other grammar features.

breml commented 1 year ago

In my opinion PR is ready. I did everything I wanted on it, there were no unanswered questions on it. The status is as follows - support for left recursion has been implemented, a check for the presence of left recursion has been implemented, tests have been written for the operation of left recursion with all other grammar features.

Thanks for the summary, this all sounds very good. I went through the PR again and this is one of the biggest PR ever for pigeon (at least in the recent time) and therefore I am still a little bit concerned about eventual negative side effects.

In order to get some feedback from the community, I would like to call out to all of you that follow this PR as well as to some power users of pigeon (@mna, @xcoulon, @flowchartsman) with the request to give this PR a spin and report back if for your use-cases everything still works as intended. Thank you very much.

breml commented 1 year ago

I just successfully tested pigeon built from this branch with:

https://github.com/breml/logstash-config/blob/1e38599e74d8b0c0bb1eaaced195da9a53484ef9/logstash_config.peg, all test passed
a private tool for log processing, also all test passed

mna commented 1 year ago

Hey @breml , thanks for reaching out, I understand your concerns as this is a huge PR. Thanks to @sc07kvm for the significant contribution! I personally haven't used pigeon in a long time and I'm afraid I don't have any additional use-cases to check outside the test suite.

I think a valid question is whether or not left recursion is a big enough problem to add that complexity in the parser (vs fixing/adjusting the grammar)? I sincerely don't know the answer to that, left recursion has generally not been an issue in my grammars but then again they've been more or less fully under my control. And I haven't followed up on the previous discussions, so apologies if that has already been discussed at length (and no disrespect to the huge effort made to implement this! The answer may very well be that it's worth adding this, and I guess it's the likely answer at this point since there's been a good amount of back and forth from what I can see).

Sorry I cannot be of more help testing this.

breml commented 1 year ago

Hey @breml , thanks for reaching out, I understand your concerns as this is a huge PR. Thanks to @sc07kvm for the significant contribution! I personally haven't used pigeon in a long time and I'm afraid I don't have any additional use-cases to check outside the test suite.

@mna Thanks for your feedback, this is always appreciated.

I think a valid question is whether or not left recursion is a big enough problem to add that complexity in the parser (vs fixing/adjusting the grammar)? I sincerely don't know the answer to that, left recursion has generally not been an issue in my grammars but then again they've been more or less fully under my control. And I haven't followed up on the previous discussions, so apologies if that has already been discussed at length (and no disrespect to the huge effort made to implement this! The answer may very well be that it's worth adding this, and I guess it's the likely answer at this point since there's been a good amount of back and forth from what I can see).

While it is true, that the grammars can always rearranged such that there are no left recursions, it is also true, that this often comes with the cost, that the grammars become less natural to read and therefore harder to maintain. Therefore adding this complexity to pigeon actually removes complexity for the users.

As you have mentioned, we already put quite some effort in testing and refining this PR. I am seriously considering to add this, but before doing so, I would like to have a very high confidence, that this does not break existing code. Also maintenance and bug fixing of this code could become an issue due to the increased complexity.

mna commented 1 year ago

While it is true, that the grammars can always rearranged such that there are no left recursions, it is also true, that this often comes with the cost, that the grammars become less natural to read and therefore harder to maintain. Therefore adding this complexity to pigeon actually removes complexity for the users.

Totally agree with you @breml .

I am seriously considering to add this, but before doing so, I would like to have a very high confidence, that this does not break existing code. Also maintenance and bug fixing of this code could become an issue due to the increased complexity.

Yeah that makes sense to me. The fact that it is generally used as a tool (and not imported as a package) makes it harder to automate some kind of "testing in the wild", as no importers are reported in pkg.go.dev.

One thing I did that resulted in a reasonable number of repos using pigeon is that github search: https://github.com/search?q=pigeon+-o+language%3AGo+&type=code

This could be a starting point to automate testing the changes on a wide variety of grammars.

breml commented 1 year ago

tl/dr: tests with left recursion pigeon: ✅ 17 ➖ 13 ❌ 0 Bottom line is, that I was not able to find any blocker due to the changes in pigeon, which gives me quite some confidence to release this PR.

Today I ran some tests with (popular) packages, that use pigeon. I used this search https://github.com/search?utf8=%E2%9C%93&q=generate+pigeon+peg+path%3A*.go&type=code to find packages, that have the keywords generate, pigeon and peg in one of their .go files.

Then I basically worked with the following sequence:

git clone
if Go version in go.mod is < 1.18 update Go version
if no go.mod file: go mod init && go mod tidy
go test ./...
go generate ./...
Verify changed file(s) are parsers
go test ./...

This is based on the assumption, that there are some tests, that cover the correct function of the generated parser.

These are the results:

✅ https://github.com/bytesparadise/libasciidoc - worked after updating the Go version in go.mod for any ➖ https://github.com/kiteco/kiteco-public - failed to execute go test ✅ https://github.com/flowchartsman/aql - worked after manually updating one of the tests, since pigeon returns a new error message (https://github.com/flowchartsman/aql/issues/2) ✅ https://github.com/hashicorp/go-bexpr ✅ https://github.com/frankbraun/asciiart - worked after adding a go.mod file ✅ https://github.com/owncloud/ocis ✅ https://github.com/Workiva/frugal ✅ https://github.com/mmcloughlin/addchain - worked after updating the Go version in go.mod for any ➖ https://github.com/pinpt/go-common - failed to execute go test ➖ https://github.com/lanl/QA-Prolog - no tests ✅ https://github.com/samuel/go-thrift - worked after updating the Go version in go.mod for any ➖ https://github.com/d4l3k/wikigopher - failed to execute go test on master branch, failed to generate parser with the version from this branch as well as the one from latest master. This is sad, since it would have been an interesting case, since the PEG grammar is > 2400 lines long ➖ https://github.com/pinpt/go-common - failed to execute go test ✅ https://github.com/rigetti/openapi-cli-generator - tests only worked for the parser sub-package, updating the Go version in go.mod for any was necessary ➖ https://github.com/lanl/edif2qmasm - no tests ➖ https://github.com/eiffel-community/eiffel-goer - no tests ✅ https://github.com/tcard/queson-go - worked after updating the Go version in go.mod for any ➖ https://github.com/fermuch/telemathings-analog-parser - failed to execute go test on master branch ➖ https://github.com/sylvinus/ifql - failed to execute go test ./... ✅ https://github.com/philandstuff/dhall-golang - tests only worked for the parser sub-package, updating the Go version in go.mod for any was necessary ✅ https://github.com/jacobsimpson/msh - worked with the new version of pigeon, but the test failed on the original checkout 🤔 ✅ https://github.com/symbolicsoft/verifpal ➖ https://github.com/tmc/graphql - failed to execute go test ./... ✅ https://github.com/jacobsimpson/mp3tag - worked after updating the Go version in go.mod for any ✅ https://github.com/SerenityHellp/thrift_parser_lib - tests only worked for the parser sub-package, updating the Go version in go.mod for any was necessary ➖ https://github.com/vivekmurali/km - failed to execute go test ./... ➖ https://github.com/Yelp/opa - failed to execute go test ./... ➖ https://github.com/jacobsimpson/jt - failed to execute go test ./... ✅ https://github.com/mmcloughlin/ec3 - tests only worked for the parser sub-package, updating the Go version in go.mod for any was necessary ✅ https://github.com/mmcloughlin/ssarules - worked after updating the Go version in go.mod for any

Result: ✅ 17 ➖ 13 ❌ 0

Concusion: For none of the tested packages, where executing the tests has been successful, changing pigeon to the new version from this branch with the left recursion support, broke the tests available for the respective packages. For a surprisingly large list of packages, it was not straight forward to just execute the tests. Bottom line is, that I was not able to find any blocker due to the changes in pigeon, which gives me quite some confidence to release this PR.

Any other thoughts?

mna commented 1 year ago

:+1: Agree with the confidence boost that this provides.

breml commented 1 year ago

🚀 It is merged! Thank you very much @sc07kvm for your great effort to make this happen. ❤️

A new release will follow soon.

mna / pigeon

Add support left recursion #123