Rethink Unit-Rule Optimization

kjosib commented 5 years ago

Presently, the LR(0) construction performs the following stunt:

If the shift of a non-terminal would go to a state which only ever immediately performs a unit-reduction, then:

the corresponding state is never allocated, and
the shift is altered to reflect the maximally unit-reduced result.

This effectively pre-computes unit-reductions, which saves work for any parse algorithm and results in fewer states. However, it introduces a non-intuitive special case into the LALR follow-set linking dance, and it is altogether unhelpful with canonical-LR(1).

It might be worth looking at unit-rule elision as a post-processing step on an HFA: If states can self-identify as unit-reducing states, then all references to them can be removed and the states themselves consistently re-numbered.

kjosib commented 5 years ago

Update: The "incompatibility" between using unit-rule elimination in LR(0) phase (to inform canonical-LR(1) about shifts) has been resolved by a small tweak: the BFT catalog gets extra entries for the "skipped" iso-cores pointing to the states they reduce to. That means canonical-LR(1) can find what it needs without needing to perform its own unit-rule elimination step. This issue is certainly not closed, but I now believe it may be feasible to extract the unit-rule optimization (and some other bits) into a sort of "abstract LR algorithm" and over-achieve on this particular issue.

kjosib commented 5 years ago

There's now a proper self-contained object for this purpose. Maybe it can be developed further, but it does the important stuff for right now, and it's plugged into both LR(0) and CLR(1).

kjosib / booze-tools

Rethink Unit-Rule Optimization #13