kjosib / booze-tools

Booze Tools will become the complete programming-language development workbench, all written in Python 3.9 (for now).
MIT License
14 stars 1 forks source link

Detecting associativity/precedence-related ambiguities as a preprocessing step. #51

Open modulovalue opened 5 months ago

modulovalue commented 5 months ago

Hello again @kjosib!

Have you ever tried finding associativity and precedence related ambiguities just from the structure of the CFG?

Assuming there's no hidden recursion, this seems to be doable:

However, to be complete, we'd have to make sure we do consider hidden recursion.

There's also the case where recursion is not necessarily hidden, but just indirect: X -> X _ O ; O -> _ X.

Since you have spent some time on thinking about hidden recursion, I'd love to hear your thoughts on this topic. Have you perhaps tried doing something similar before with booze-tools? Do you know of any prior work trying to do something similar?

kjosib commented 5 months ago

Honestly I hadn't thought about classifying the nature or structure of ambiguities in the manner you describe. I'll tell you three reasons why:

  1. You can run all kinds of ad-hoc analysis to find this or that sort of inadequacy, but the only way to catch them all for sure is to (try to) generate an automaton. (I recognize that not every inadequacy is necessarily the result of an inherently ambiguous grammar, but within the look-ahead associated with the automaton class, it's ambiguous enough to make trouble.)
  2. I don't see what good it does. That is, I don't see how the analysis helps anyone with any non-contrived task I can imagine. (I admit this is an argument from ignorance, so feel free to fill me in.)
  3. It's actually the first I've heard of it.

If an (LR1-) ambiguous grammar most naturally reflects the semantic structure of some language, I'd rather a grammar author be able to use that ambiguous grammar directly rather than forcing them to recast the grammar in different terms.

For designed formal languages, grammars that are hard for computers to parse tend to also be harder for people to deal with. For natural language, there's generally plenty of inherent ambiguity even at the lexical level. These might benefit from a completely different parsing strategy. Personally I have mainly dealt with formal languages, but perhaps some day I'll try something else. (I have a few ideas.)