Open erezsh opened 1 year ago
Didn't we previously try something into this direction? Wasn't the problem that is ambiguous what should be ignored on the borders between rules when they have different ignore sets?
I.e. does the %ignore
apply between the rules in the rule, or also before and after the rule?
That's a good point, but I think it's reasonable to say that each rule is surrounded by its own ignore terminals on both sides. And since ignoring something twice is the same as doing so once, having two rules with the same ignore set means you only need to match once. So you end up with:
i1 S1 i1 i2 S2 i2 when i1 != i2
i1 S1 i1 S2 i1 when i1 == i2
and of course the logic chains.
I think the main challenge in implementing this would be that in LR parsers it's not clear which rule we are parsing until we reach a reduce. So if the two rules being considered have different ignore sets, there is no trivial way to resolve this at the rule level.
But I think it should work if we do it at the symbol level.
One option is to have each symbol aware of its ignore set. That can get a little tricky when trying to support things like i1 i2 S2
, but maybe we can just concat the ignores into one regex, and get i1,2 S2
.
Second option is to concat the regexes of the ignores to each terminal directly . The re engine should be able to handle it, and we actually get the added bonus of higher performance, because we're doing less re calls. (I did some experiments with the json parser, and concating the ignore makes it ~20% faster.)
However, it might have some unintended consequences, for example it might complicate the "unless" mechanism we use for keywords, or make it slower since we'll have to match the ignored twice.
Edit: actually for both options, probably the main challenge would be making sure that the "ignore_before" and "ignore_after" sets are minimal, and we're not matching the same regex twice on the borders of rules. But sounds like it should be possible.
Suggestion
The global ignore mechanism is the only thing stopping us from having perfect grammar composition. (in theory, at least)
If we allow each rule to have different ignore terminals, we would be able to import any rule from any grammar.
It can replace the global ignore mechanism, if we rewrite the global %ignore directive as applying to each rule individually.
Optional:
We can also add new syntax for a localize ignore, for example:
We could keep the global ignore mechanism as optional, but I don't see any reason to do so.