lezer-parser / lezer

Dev utils and issues for the Lezer core packages
33 stars 1 forks source link

Consider alternative approaches to forced reduction #14

Closed marijnh closed 4 months ago

marijnh commented 2 years ago

Storing a single reduction per state in the parser tables is problematic, because a given state might contain inconsistent rules and rule positions, and blindly reducing any of them may not be valid in a given parse that is in the state. Patch https://github.com/lezer-parser/lr/commit/e931d93cada1e54d41d966b6634a61ca8a05fce9 adds a check to avoid corruptions from this, but it might still (in circumstances that are probably quite obscure) produce weirdly structure output trees.

A possible alternative strategy would involve scanning up the stack for a state that has goto entries, and then use one of those to create a reduction. But there is is also tricky to make sure the reduced term actually corresponds to what is being reduced (since the parse tables don't provide us with that information).

A more expensive approach would be to store a set of terms with each state, and use that to pick the right goto entry when scanning up the stack. But again, if applied in the simplest way that doesn't seem to provide 100% guarantee against picking a nonsense reduction, since states don't uniquely determine how they were reached (and might even refer to several positions in the same rule).

In any case this will likely involve a breaking change to the on-disk format, so let's look into it when considering a new breaking release.

marijnh commented 4 months ago

The loop issue seems to be under control with the fixes to https://github.com/lezer-parser/lezer/issues/13 , and potential alternative approaches have their own significant downsides, so I'm going to close this.