ianh / owl

A parser generator for visibly pushdown languages.
MIT License

syntactic sugar for 'separated by' #9

Closed modernserf closed 5 years ago

modernserf commented 6 years ago

I find myself writing some variation of RuleName ? ("," RuleName)* repeatedly, even in simple grammars. This is not a tremendous hardship, but it would be more elegant to have a "separated by" operator, similar to those that come in a lot of parser combinator libraries. For example:

FnDefinition = "function" identifier "(" identifier ? ("," identifier)* ")" Block

would instead be:

FnDefinition = "function" identifier "(" identifier % ","  ")" Block

Again, I recognize that this is merely syntactic sugar, but so are the + and ? operators, and they reduce repetition and increase readability.
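To make the "merely syntactic sugar" point concrete, here is a minimal sketch of the desugaring, written with Python regular expressions rather than Owl itself. The helper `separated_by`, the `IDENT` pattern, and `accepts` are hypothetical names chosen for illustration, and whitespace between tokens is ignored for simplicity. The sketch also shows one thing the sugar could tidy up: the hand-written form `item? (sep item)*` accepts a leading separator, while the grouped form `(item (sep item)*)?` does not.

```python
import re

def separated_by(item: str, sep: str) -> str:
    """Build a regex for zero or more `item`s separated by `sep`.

    Desugars to (item (sep item)*)? so a separator can only appear
    between two items.
    """
    return f"(?:{item}(?:{sep}{item})*)?"

# Hypothetical stand-in for an identifier token (lowercase letters only).
IDENT = r"[a-z]+"

# The grouped desugaring.
strict = re.compile(separated_by(IDENT, ","))

# The hand-written shape from the comment: item? (sep item)*
loose = re.compile(f"(?:{IDENT})?(?:,{IDENT})*")

def accepts(pattern: re.Pattern, text: str) -> bool:
    """True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None

if __name__ == "__main__":
    for text in ["", "a", "a,b,c", ",a", "a,"]:
        print(repr(text), accepts(strict, text), accepts(loose, text))
```

Both patterns accept an empty list, a single item, and `a,b,c`; only the hand-written form also accepts `,a`.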

ianh commented 6 years ago

Yeah, this is something I've noticed too. The main thing holding me back was readability—if you're used to regular expressions, identifier % "," won't mean anything to you. I couldn't come up with a syntax I was happy with.

But now that I think about it, we could just use the word itself. Something like

FnDefinition = "function" identifier "(" identifier .separated-by "," ")" Block

It's not much shorter than identifier ? ("," identifier)* to write, but it avoids the repetition while increasing readability even more than the % operator would.

WalkerCodeRanger commented 5 years ago

Just wanted to toss out another possible syntax. When I thought about writing a parser generator, I considered providing regex-style limited repetition (i.e. identifier{n} means repeat exactly n times, identifier{n,m} repeat n to m times, identifier{n, } repeat n or more times). Assuming you support that (or even if you don't), there is a straightforward modification for repeat-separated-by: identifier{","} would mean repeat separated by commas. It could be mixed with numeric limits so that identifier{",", 1} would be at least one identifier separated by commas.

ianh commented 5 years ago

Combining this with regex-style repeat does make a lot of sense. I think for consistency you'd want identifier{",", 1} to be exactly one identifier and identifier{",", 1, } to be 1+.

WalkerCodeRanger commented 5 years ago

You're right. My plan was to use the syntax you suggest, but I messed it up when I wrote my comment.

ianh commented 5 years ago

After playing with it for a while, I think I'm going to do this but with a slightly different syntax: identifier{3} for exactly three, identifier{3-5} for three to five, and identifier{3+} for three or more. The comma example then becomes identifier{",", 1+}, which I find much easier to read.

ianh commented 5 years ago

Never mind; using - as a keyword disables dashes in identifiers. The identifier{3, 5} syntax will have to do for ranges.
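The repetition semantics discussed in this thread (an optional separator plus an exact count, a range, or an open-ended minimum) can be sketched as a regex builder in Python. This is an illustration of the intended meaning only, not how Owl implements it; `repeat` and its parameters `sep`, `lo`, and `hi` are hypothetical names, and `hi=None` stands in for the "or more" form.

```python
import re
from typing import Optional

def repeat(item: str, sep: str = "", lo: int = 0, hi: Optional[int] = None) -> str:
    """Build a regex for `lo` to `hi` copies of `item` separated by `sep`.

    hi=None means no upper bound. The first item is written out once and
    the remaining (sep item) pairs carry the adjusted bounds.
    """
    if lo < 0 or (hi is not None and hi < lo):
        raise ValueError("bounds must satisfy 0 <= lo <= hi")
    if hi == 0:
        return ""  # exactly zero items: matches only the empty string
    tail_lo = max(lo - 1, 0)
    tail_hi = "" if hi is None else str(hi - 1)
    body = f"{item}(?:{sep}{item}){{{tail_lo},{tail_hi}}}"
    # With lo == 0 the whole list may be absent.
    return body if lo > 0 else f"(?:{body})?"

# Rough correspondence with the notations mentioned in the thread:
#   identifier{","}        -> repeat(IDENT, ",")
#   identifier{",", 1+}    -> repeat(IDENT, ",", lo=1)
#   identifier{",", 3, 5}  -> repeat(IDENT, ",", lo=3, hi=5)
IDENT = r"[a-z]+"  # hypothetical stand-in for an identifier token

if __name__ == "__main__":
    one_or_more = re.compile(repeat(IDENT, ",", lo=1))
    print(bool(one_or_more.fullmatch("x,y,z")))
    print(bool(one_or_more.fullmatch("")))
```

Writing the first item separately is what keeps the separator strictly between items; a bare `(item sep){n}` would demand a trailing separator.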

ianh commented 5 years ago

I just tagged a new version (owl.v4) with support for this. The original example can now be written like:

FnDefinition = "function" identifier "(" identifier{","} ")" Block

ianh commented 5 years ago

And thanks @WalkerCodeRanger for the suggestion; I'm much happier with this syntax than anything I had come up with before.