jeffreykegler / libmarpa

Marpa parse engine C library -- STABLE
MIT License
97 stars 11 forks source link

Infinite loop in marpa_t_next #116

Closed dabrahams closed 2 years ago

dabrahams commented 2 years ago

Reproducer is not small, sorry.

jeffreykegler commented 2 years ago

Is it possible I'd be able to reproduce this on my Raspberry Pi? Is everything open source?

dabrahams commented 2 years ago

Yes, everything is open source. Check out the “parser” branch of https://github.com/val-lang/val and run “swift test” in the root.

Sent from my iPhone

On May 30, 2022, at 4:18 AM, Jeffrey Kegler @.***> wrote:

 Is it possible I'd be able to reproduce this on my Raspberry Pi? Is everything open source?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.

jeffreykegler commented 2 years ago

That's done, but it tells me I need swift. There is a Debian swift package, but I think it's something else. In any case, you probably have a specific version repo you suggest.

FYI, I'm a total Swift newbie, so expect dumb questions. :-)

dabrahams commented 2 years ago

https://www.swift.org/download/ https://www.swift.org/download/ if you want to do it with Swift. I’m going on vacation for 2 weeks, so won’t be around to answer questions during that time. I would guess you would have an easier time reproducing by translating the grammar/regexps from the gist into Marpa::R2, though.

The regexps are in ICU regex format, and most of them aren’t even used in that example.

On May 30, 2022, at 9:31 AM, Jeffrey Kegler @.***> wrote:

That's done, but it tells me I need swift. There is a Debian swift package, but I think it's something else. In any case, you probably have a specific version repo you suggest.

FYI, I'm a total Swift newbie, so expect dumb questions. :-)

— Reply to this email directly, view it on GitHub https://github.com/jeffreykegler/libmarpa/issues/116#issuecomment-1141332509, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAKYIMXID4X2XZZCVQTDRDVMTUNTANCNFSM5XJICZLA. You are receiving this because you authored the thread.

dabrahams commented 2 years ago

The token sequence, if you want to avoid the regexps, is:

'type' hspace identifier hspace '{' newlines hspace 'fun' identifier '(' ')' hspace '{' '}' newlines hspace '}' newlines hspace 'extension' hspace identifier hspace '{' newlines hspace 'fun' hspace identifier '(' ')' hspace '{' '}' newlines hspace '}'

jeffreykegler commented 2 years ago

Have a good vacation. It may allow me to catch up some of these issues. :-)

I'll think about the Marpa::R2 translation. Definitely a better developed debugging environment, but the translation might be hard. I notice some new syntax in your BNF,and it is nice-looking, but an obstacle in this context. And there is the disadvantage that failure to duplicate under Marpa::R2 is not really very good evidence of anything.

Sent with Proton Mail secure email.

------- Original Message ------- On Monday, May 30th, 2022 at 1:09 PM, Dave Abrahams @.***> wrote:

https://www.swift.org/download/ https://www.swift.org/download/ if you want to do it with Swift. I’m going on vacation for 2 weeks, so won’t be around to answer questions during that time. I would guess you would have an easier time reproducing by translating the grammar/regexps from the gist into Marpa::R2, though.

The regexps are in ICU regex format, and most of them aren’t even used in that example.

On May 30, 2022, at 9:31 AM, Jeffrey Kegler @.***> wrote:

That's done, but it tells me I need swift. There is a Debian swift package, but I think it's something else. In any case, you probably have a specific version repo you suggest.

FYI, I'm a total Swift newbie, so expect dumb questions. :-)

— Reply to this email directly, view it on GitHub https://github.com/jeffreykegler/libmarpa/issues/116#issuecomment-1141332509, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAKYIMXID4X2XZZCVQTDRDVMTUNTANCNFSM5XJICZLA. You are receiving this because you authored the thread.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were assigned.Message ID: @.***>

jeffreykegler commented 2 years ago

The token sequence is very very helpful. Is it convenient to do it for issue #115 also?

dabrahams commented 2 years ago

I can try. Note that the two grammars are different.

On May 30, 2022, at 10:56 AM, Jeffrey Kegler @.***> wrote:

dabrahams commented 2 years ago

Note: both token sequences are my best guess; I don’t have time to generate them programmatically right now (getting ready to go away).

On May 30, 2022, at 10:56 AM, Jeffrey Kegler @.***> wrote:

The token sequence is very very helpful. Is it convenient to do it for issue #115 https://github.com/jeffreykegler/libmarpa/issues/115 also?

— Reply to this email directly, view it on GitHub https://github.com/jeffreykegler/libmarpa/issues/116#issuecomment-1141382822, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAKYIOJ3VJGAX2CZ3UTJ4DVMT6OHANCNFSM5XJICZLA. You are receiving this because you authored the thread.

jeffreykegler commented 2 years ago

I converted the grammar to Marpa::R2's SLIF. There is a problem with it ("unproductive symbols"), which causes warnings, but despite the warnings in R2 the parse goes through without incident. I produced a parse tree, and it looks like what would be expected. Here is the Perl test script: https://github.com/jeffreykegler/Marpa--R2/blob/68c05b0fb4719d571906ebfe6104c2544f00d627/cpan/sl_lib116.t. I tested with both Libmarpa in the current official release, and with the bleeding edge commit, with identical results.

The problem which causes the warning is here: https://github.com/jeffreykegler/Marpa--R2/blob/68c05b0fb4719d571906ebfe6104c2544f00d627/cpan/sl_lib116.t#L503. trait__composition_1__list's only rule is a left-recursive, which makes trait__composition_1__list, and several other symbols which require it on the RHS, unproductive. I've added a comment showing a rule that would fix the warnings.

This recursion might seem to explain the infinite loop, except that Libmarpa excises unproductive rules from the grammar for precisely this reason. If the "unproductivity" propagates back to the start symbol, precomputation fails, but otherwise you can go ahead with what remains of the grammar, which is what I was able to do with Marpa::R2. Why this failed in the reported setup is, I confess, a bit of a mystery to me.

dabrahams commented 2 years ago

Have a good vacation. It may allow me to catch up some of these issues. :-)

Thanks. I was working rather obsessively on my own implementation of this stuff through the plane ride here and here yesterday. Now I am going to rest and leave it alone for a while.

I do see now, why scan should come first and not last (as it does in my implementation).

I'm not very pleased with the way I store Leo items, and was thinking of putting them directly into the same array as Earley items, burning storage for a single symbol per Earley item for the unused transition symbol. Earley items are just 3 Ints right now and could be compressed further to make room for that symbol… or, perhaps better, storing an Earley item's postdot symbol there to improve locality.

Other thoughts about efficiency: I realize I am doing lots of linear searches, but the length of these searches is so short, and in contiguous memory, that I suspect it's hard to improve upon. It seems to me that to maintain strict big-O guarantees it might make sense to introduce a “slow path” that kicks in over some constant size, using algorithms with better complexity bounds.

dabrahams commented 2 years ago

When I get back from break I can probably make you a log of the sequence of marpa_xxxx calls and their parameters

Sent from my iPhone

On Jun 1, 2022, at 6:13 AM, Jeffrey Kegler @.***> wrote:

 I converted the grammar to Marpa::R2's SLIF. There is a problem with it ("unproductive symbols"), which causes warnings, but despite the warnings in R2 the parse goes through without incident. I produced a parse tree, and it looks like what would be expected. Here is the Perl test script: https://github.com/jeffreykegler/Marpa--R2/blob/68c05b0fb4719d571906ebfe6104c2544f00d627/cpan/sl_lib116.t. I tested with both Libmarpa in the current official release, and with the bleeding edge commit, with identical results.

The problem which causes the warning is here: https://github.com/jeffreykegler/Marpa--R2/blob/68c05b0fb4719d571906ebfe6104c2544f00d627/cpan/sl_lib116.t#L503. traitcomposition_1list's only rule is a left-recursive, which makes traitcomposition_1list, and several other symbols which require it on the RHS, unproductive. I've added a comment showing a rule that would fix the warnings.

This recursion might seem to explain the infinite loop, except that Libmarpa excises unproductive rules from the grammar for precisely this reason. If the "unproductivity" propagates back to the start symbol, precomputation fails, but otherwise you can go ahead with what remains of the grammar, which is what I was able to do with Marpa::R2. Why this failed in the reported setup is, I confess, a bit of a mystery to me.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.

jeffreykegler commented 2 years ago

@dabrahams I hope you'll be ignoring all my issue updates until you are back from vacation and caught up. None of them will have your feedback on a critical path.

I'd stop making the updates, except they are vital for my own tracking of the issues, and also there may be 3rd parties out there following these.

jeffreykegler commented 2 years ago

Using the more careful approach I used in #115, I re-attempted to duplicate this issue, and again cannot duplicate it. The script is here. It parses to the tree show in https://github.com/jeffreykegler/Marpa--R2/blob/80be1a259e134fd8c6542e6d125ee574a5b35702/cpan/sl_lib116.t#L809-L847.

jeffreykegler commented 2 years ago

Two conjectures: First, I conjecture that this is the same problem that I have identified for #115. See https://github.com/jeffreykegler/libmarpa/issues/115#issuecomment-1152954497 and its preceding comments. I conjecture that this is not actually an infinite loop, but the tree logic rejecting all of an exponentially long series of parses.

Second, the failure to duplicate is due to my not having the right sequence of lexemes. This grammar is ridiculously liberal in what it allows with whitespace, so the lexer can represent the same input string with many different lexeme sequences.

If these two conjectures are right, and my proposed solution to #115 works, that solution should also fix this problem. I plan to continue work on #115.

dabrahams commented 2 years ago

In support of your conjecture that there's an exponentially long series of parses, I had a grammar that was appearing to exhibit this behavior, then upgraded to the latest tested libmarpa, and started getting ambiguity reports rather than apparent hangs.

jeffreykegler commented 2 years ago

@dabrahams Can I read your last comment ( https://github.com/jeffreykegler/libmarpa/issues/116#issuecomment-1155398661) as saying this issue is fixed?

dabrahams commented 2 years ago

Depends how thorough you feel you need to be in validating. That is to say, it's up to you! If you feel it's worthwhile to validate more carefully, I'm happy to to help gather more info. There's no shame in saying it's not worth our time though.

jeffreykegler commented 2 years ago

If you prefer to wait a bit in order to check more carefully that this issue has really gone away, that is a good thing, and I will wait.

dabrahams commented 2 years ago

Again: this is your project. It's up to you to decide whether the issue is addressed, and I don't see how I can do a more careful check by myself. AFAICT it would start with me getting you enough information to reproduce the problem so you can validate that you understand the reasons for the apparent hang and that they are addressed by your patch.

jeffreykegler commented 2 years ago

There is a fix which has eliminated the issue, and the connection is not just "post hoc ergo propter hoc". We have an explanation of why the fix for #115 was expected to fix this issue (a prediction in fact) and several tests for #115.

I have also a proof of completeness and consistency of the new code for detecting or-node cycles, to help us assure ourselves that the fix for #115 does address all issues related to or-node cycle testing, and does not open up any new ones. That is, the proof shows the fix does eliminate all or-node cycles, and does not eliminate any trees that do not contain or-node cycles. (I'll shortly write a comment in #115 pointing to the proof, which is in the CWeb code.)

Absent feedback to the contrary, I will close this issue in a couple of days.

Of course, if this, or similar, issues appear with the new version of Libmarpa, we can reopen this, or open a new issue, as appropriate.

jeffreykegler commented 2 years ago

There an apparent memory issue with the fix to #115: https://github.com/jeffreykegler/libmarpa/issues/115#issuecomment-1155795245. Since two issues seem to be related, I am stopping the "close countdown" on this issue until it is resolved.

jeffreykegler commented 2 years ago

Commit https://github.com/jeffreykegler/libmarpa/commit/25eb1da3a1c7d0d4db029bb0306eec7d42c4e9e2, which is the head of the "tested" branch, has a new fix for this and #115, without the memory issue.

jeffreykegler commented 2 years ago

I never did duplicate this, and I see no reason to do so now if the problem has gone away. There's no guarantee I'd succeed, and the time is, I think, better invested elsewhere.

@dabrahams Has the problem gone away? Any reason to keep this open?

dabrahams commented 2 years ago

The problem doesn't reproduce with a more recent libmarpa and that exact test case.

jeffreykegler commented 2 years ago

Per https://github.com/jeffreykegler/libmarpa/issues/116#issuecomment-1163319357 and https://github.com/jeffreykegler/libmarpa/issues/116#issuecomment-1163305674, and absent feedback to the contrary, I will close this issue after a couple of days.

jeffreykegler commented 2 years ago

Closed as per https://github.com/jeffreykegler/libmarpa/issues/116#issuecomment-1163324099.