Open gonzus opened 4 years ago
I believe that the error recovery code is not entirely deterministic. I need to look at that. Error recovery for scannerless GLR parsing is not well established in the literature so developed my own solution and grabs the first solution, but instead I need to formalize the "best" solution and use that.
Interesting. Where in the code is this? I would like to take a look. Also, any pointers to docs that discuss error recovery would be welcome!
It might also be interesting to add a comment in the code (if it is not there already), and perhaps also in tests/g50.test.g
saying that one might get non-deterministic results.
You could even consider excluding this test entirely, since it makes the test suite fail sometimes? By which I mean, leave the test as a documented bug / todo, or something similar?
This also reminded me of this (admittedly old) article: https://www.ibm.com/developerworks/autonomic/library/ac-alltimeserver/l-cpdpars.html
But I also found some of the parse results a little bit mystifying: Why succeed in debug mode, but not standard mode? Exactly when does ambiguity come up? Developing a grammar with any parsing tool produces similar surprises, but I found DParser's somehow particularly surprising; SimpleParse, for example, surprised me less. Probably if I knew more about the ins-and-outs of the underlying parsing algorithms, it would make more sense; in my relative ignorance, I am probably a lot like 95%+ of my readers, though. There are people who know more about parsing than I do; but in truth, most programmers still know less.
So one obvious question here would be: when a grammar is running into this kind of ambiguity, can the user be informed about it? Is this similar to a reduce/reduce
conflict from olden times? How would I go about finding that my grammar might produce non-deterministic results? Or even better, fixing those non-determinisms?
See the function syntax_error_report_fn.
I just patched it to take care of one type of ambiguity. PTAL
In terms of reporting, the -v flag will report when there are temporary ambiguities (compares) and full ambiguites (ambiguities). You can also add a custom ambiguity resolution function by setting ambiguity_fn. This does not get called for temporary ambiguities, but only when the parser tries to commit a subtree.
Checked out that article. Looks like the instability is no longer present. The author makes some fair points.
I was wrong about the patch. The problem seems to be in error_recover() which despite its efforts seems to be non-deterministic about how much of the parse stack to toss and which symbol to push to get the parser to resynchronize. Unfortunately that bit is complex and novel so I will need to spend some time to look into it.
Thank you for your efforts @jplevyak
https://github.com/jplevyak/dparser/commit/a7d16c945868f9367e33a03199672cef74f7b6ab attempts to address this issue.
Let's see if it crops up again.
I ran the tests with
MAKE=make ./parser_tests
and got this:Ran them again and all tests passed. Then did a
git clean -dfx
and ran them again, and the test failed again; I also saw this error at the beginning of the run (which was not happening before):cp: cannot stat 'D_BUILD_VERSION': No such file or directory
Are these known issues?