waxeye-org / waxeye

Waxeye is a parser generator based on parsing expression grammars (PEGs). It supports C, Java, JavaScript, Python, Racket, and Ruby.
https://waxeye-org.github.io/waxeye/index.html

Inline non-terminal definitions #124

Open bzaar opened 1 year ago

bzaar commented 1 year ago

I found myself writing bits of grammar that looked like this:

fruit <- apple | orange | banana | kiwi | pear

apple <- :'apple'
orange <- :'orange'
banana <- :'banana'
kiwi <- :'kiwi'
pear <- :'pear'

The (slight) problem with this is that whenever you add a new kind of fruit, you need to update two lists.

I wish I could write it this way:

fruit <- 
   | apple  <- :'apple'
   | orange <- :'orange'
   | banana <- :'banana'
   | kiwi   <- :'kiwi'
   | pear   <- :'pear'

Just a suggestion. Would save us a bit of typing too.

orlandohill commented 1 year ago

Thanks for suggesting this!

I had a look through your zal2010.waxeye grammar, in addition to the example here.

I think your proposed extension would need some way to mark the end of a nested non-terminal definition. It would probably need to be something other than whitespace, so perhaps it could use parentheses or a single character like a semicolon. The same marker would then need to delimit the top-level non-terminal definitions too.

fruit <- 
   | (apple  <- :'apple')
   | (orange <- :'orange')
   | (banana <- :'banana')
   | (kiwi   <- :'kiwi')
   | (pear   <- :'pear')

fruit <- 
   | apple  <- :'apple';
   | orange <- :'orange';
   | banana <- :'banana';
   | kiwi   <- :'kiwi';
   | pear   <- :'pear';
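
As an illustration of what the parenthesized form would mean, it can be flattened mechanically into the top-level definitions that Waxeye accepts today. The following is a minimal Python sketch, not part of Waxeye: `desugar` and `INLINE_DEF` are hypothetical names, and it assumes a single rule, no nested inline definitions, and no parentheses or `|` characters inside string literals.

```python
import re

# Matches one non-nested inline definition of the form "(name <- body)".
# Assumption: the body contains no parentheses, so nesting is not handled.
INLINE_DEF = re.compile(r"\(\s*([A-Za-z_][A-Za-z0-9_]*)\s*<-\s*([^()]*?)\s*\)")


def desugar(rule: str) -> str:
    """Flatten one rule written in the hypothetical parenthesized inline form."""
    hoisted = []  # top-level definitions lifted out of the rule body

    def hoist(match):
        name, body = match.group(1), match.group(2)
        hoisted.append(f"{name} <- {body}")
        return name  # the inline definition becomes a plain reference

    # Split at the first "<-", which belongs to the enclosing rule.
    head, _, rhs = rule.partition("<-")
    rhs = INLINE_DEF.sub(hoist, rhs)

    # Drop the leading "|" and normalize the alternatives onto one line.
    alternatives = [alt.strip() for alt in rhs.split("|") if alt.strip()]
    flat = f"{head.strip()} <- {' | '.join(alternatives)}"
    return "\n".join([flat, ""] + hoisted)


if __name__ == "__main__":
    source = """fruit <-
   | (apple  <- :'apple')
   | (orange <- :'orange')
"""
    print(desugar(source))
```

Because every hoisted definition ends up at the top level, this sketch gives inline names global scope, which is only one of the possible answers to the scoping question.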

The issue here is readability. I agree that allowing nested non-terminals can result in less typing while writing grammars, but I'd argue that it makes grammars less readable too. It makes the grammar syntax less uniform, and it makes grammar ASTs deeper, which probably increases cognitive load during reading.

I'd argue that if a syntactic element is significant enough to have its own AST node type, then it's better grammar writing style to give it a top-level definition.

Having all non-terminals defined at the top level of a grammar file also makes the grammar language itself easier to learn. Adding nested non-terminals introduces the complexity of whether definitions can be infinitely nested, and the question of how nested definitions are scoped.

I'm open to considering this extension further, but there need to be more compelling use cases.