jeffreykegler / Marpa-arxiv-paper

The repository of the arxiv.org version of the Marpa paper.
8 stars 1 forks source link

Write paper on Marpa's elimination of LR(0) states. #8

Closed jeffreykegler closed 1 year ago

jeffreykegler commented 2 years ago

I'll respond to https://github.com/jeffreykegler/personal/commit/56fdbaa143cd1a4a7f61d5c6be848afd8c54fa01#r74816581 here, because in this repo I'm preparing a paper that addresses Marpa's history with Aycock and Horspool (A&H) in detail.

The "limitation" is that A&H achieve no speedup in big-O terms. When an improvement does not show up in big-O terms, practical circumstances often minimize it, eliminate it, are even make it counter-productive.

An advantage of Earley's original algorithm is that Earley items are written in terms of rules as the grammar author sees them. Any hacks (and Marpa uses several) to the grammar must be reversed on evaluation. Also, they must be reversed for tracing, debugging and run-time features like the event mechanism.

jeffreykegler commented 2 years ago

Notes toward this paper are here. The notes are not really even a very early draft at this point.

dabrahams commented 2 years ago

Well a big constant factor improvement is still a big improvement. Obviously “hacks” need to be accounted for so evaluation is in terms of the original grammar rules, but doing that for LR rule sets is a solved problem AFAIK, or Yacc, Bison, Lemon, and others wouldn't be usable. 🤷

jeffreykegler commented 2 years ago

One secret of the Earley literature is that it reports speed results based on the count of Earley items -- the fewer Earley items, it is assumed, the faster the parser. This totally ignores evaluation, never mind tracing or run-time features like events.

This massive over-simplification worked, because Jay Earley and Joop Leo only went for results in Landau notation, and these are "big" enough that even a huge over-simplification is not a problem. This is not the case when we are talking about a constant factor.

As an aside, I am the person who has the most practical experience with the A&H states. It turns out the relationship between A&H items and Earley items is not many-to-one, but many-to-many, greatly complicating the translation. IIRC correctly this is not in their paper. They and I ran into it when implementing. I think I read in an interview this made them abandon the implementation effort.

I was more persistent. The many-to-many translation was complex, but I got it working. With this, I was able to measure the A&H states for practical grammars, only to find that their actual effect was quite small. The A&H states did not produce measurable improvements, and were serious obstacles to several run-time features I was contemplating. I took them out of Marpa, and nobody has missed them.

As for the useability of Yacc, etc., this problem of translating back and forth from LR rule sets to the grammar is the most important factor that limited their success. When a Yacc parse fails it is often very hard for the grammar writer to figure out why.

In addition to A&H, one or two other papers report ways of reducing the number of Earley items via NFA/DFA hacks. For the reasons just given, I have not followed up on these, expecting them to be as of purely theoretical interest.

dabrahams commented 2 years ago

That’s strange; the information I get out of Yacc looks just like a Marpa progress report to me.

On May 30, 2022, at 4:48 AM, Jeffrey Kegler @.***> wrote:

As for the useability of Yacc, etc., this problem of translating back and forth from LR rule sets to the grammar is the most important factor that limited their success. When a Yacc parse fails it is often very hard for the grammar writer to figure out why.

dabrahams commented 2 years ago

On May 30, 2022, at 4:48 AM, Jeffrey Kegler @.***> wrote:

The A&H states did not produce measurable improvements,

If that’s really true, perhaps you need to change this in the paper:

Aycock and Horspool realized that, by changing Earley items to track AHFA states instead of individual dotted rules, the size of Earley sets could be reduced, and Earley’s algorithm made faster in practice.

dabrahams commented 2 years ago

Does this mean that nothing in §7 of the Marpa paper applies? The zero-length-rule problem whose solution you say you took from A&H seems to be covered in §3 and §4, IIUC.

In an attempt to understand the paper, btw, I've written this, which attempts to translate §1-6 into a generic implementation. It's not supposed to be efficient (and I haven't run it), but it type-checks and is designed to leave as many guideposts as possible connecting it to the paper. Any comments you have would be appreciated.

jeffreykegler commented 2 years ago

@dabrahams -- Re https://github.com/jeffreykegler/Marpa-arxiv-paper/issues/8#issuecomment-1141365471. Yes, the approach in section §7 is no longer used. In particular, Theorem 7.1, which had been central, is now completely irrelevant. I'd say that section §7 could be ignored, but the AHFA states need to be backed out of many other places in the paper, and it's unfortunately necessary for the reader to skim the discussion of section §7 to be able to do that.

This also responds to https://github.com/jeffreykegler/libmarpa/issues/117#issuecomment-1155924927. I regret the delay, but this got lost in the crush. In future, to put a question on a special "conversational" track, it's best to ask on IRC. Currently issues are likely to be pushed aside in favor of more urgent ones, some of which require days of work. I treat IRC as a separate "track", and will take time from my work on issues to answer questions there, if they can be answered fairly quickly "off the top of my head" or with relatively little research.

jeffreykegler commented 2 years ago

Commit https://github.com/jeffreykegler/Marpa-arxiv-paper/issues/8#issuecomment-1141352820 contains the fix suggested in comment https://github.com/jeffreykegler/Marpa-arxiv-paper/issues/8#issuecomment-1141352820.

jeffreykegler commented 1 year ago

I have created a new repo (https://github.com/Marpa-parser/arxiv-lr0) for work on this paper. Acccordingly, and absent feedback to the contrary, I will close this issue after a couple of days.

jeffreykegler commented 1 year ago

The paper is on arxiv: https://arxiv.org/abs/2303.04093. With this and per https://github.com/jeffreykegler/Marpa-arxiv-paper/issues/8#issuecomment-1407480360, I am closing this issue.

jeffreykegler commented 1 year ago

Oops, on last comment, I clicked "comment" instead of "close with comment". Closed per https://github.com/jeffreykegler/Marpa-arxiv-paper/issues/8#issuecomment-1460282489 and https://github.com/jeffreykegler/Marpa-arxiv-paper/issues/8#issuecomment-1407480360.