ohmjs / ohm

A library and language for building parsers, interpreters, compilers, etc.
MIT License

Overloading parameterized rules? #57

Open blakelapierre opened 8 years ago

blakelapierre commented 8 years ago

Would there be any objection to allowing parameterized rules to be 'overloaded' in Ohm?

I am talking about multiple rules with the same base name but a differing number of parameters. I'm not sure whether parameters have any 'type' that rules could be further disambiguated on, but I imagine that would also be useful, if possible.

Example:

RM {
    Contained<element> = Contained<"{", element, "}">
    Contained<open, element, close> = open element close
}
alexwarth commented 8 years ago

Hi Blake,

The ability to overload parameterized rules is something I've wanted for a while, too. Wouldn't it be nice if there could be a one-parameter variant of ListOf that assumes "," as the separator?

BuiltInRules {
  ...
  ListOf<elem, sep> = ...
  ListOf<elem> = ListOf<elem, ",">
  ...
}

I've already got a plan for how this will manifest in the semantics, and how it will be implemented. I'll do my best to commit it in the next week or two.

Thanks for filing!

Cheers, Alex

blakelapierre commented 8 years ago

Hello Alex!

I think the assumption of a comma as the default ListOf separator makes sense. It would certainly reduce a little bit of clutter in the syntax definition I'm currently working on.

I started taking a look at the code to see how it might be implemented, but I think I'll leave it to you, since you've already been thinking about it and know this code base much better than I do :)

Looking forward to it!

alexwarth commented 8 years ago

Sounds good! I'll try to post some details here in the next couple of days, that way you can tell me if there's a better way :)

blakelapierre commented 8 years ago

My guess is that the simplest implementation is just to append the parameter arity to the rule name. You may need some 'escaping' mechanism, however, so that the resulting name doesn't conflict with a user-defined name.

alexwarth commented 8 years ago

Right, that's where I was headed, too. My idea was to use Prolog-style names for the rules, i.e., name/arity, e.g., ListOf/1 and ListOf/2.

Every rule will have this /n suffix. For most rules, it will be /0. You'll never have to write these suffixes in a grammar, because the arity is clear from each rule application. But it's a different story on the semantics side. To make it more convenient for folks to write semantic actions, you'll be able to write the name of a rule without the /n as long as it's not ambiguous, i.e., as long as there's only one rule in the grammar with that name. If it is ambiguous, you'll get an error at the time you define the operation / attribute / whatever.

Back to preparing for tomorrow's lecture!!
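A minimal sketch of the naming scheme described above, written in JavaScript since that is Ohm's host language. It assumes every rule is keyed by a canonical name/arity string, and that "/" works as the separator because it cannot appear in an Ohm rule name (which would also settle the 'escaping' concern). The helpers resolveActionName and defineOperation and the arities table are hypothetical illustrations, not Ohm's API.

```javascript
// Hypothetical sketch of Prolog-style rule names ("name/arity").
// None of these helpers are part of Ohm's API.

// Matches an explicit canonical name such as "ListOf/2".
const CANONICAL = /^([A-Za-z_][A-Za-z0-9_]*)\/(\d+)$/;

// `arities` maps each base rule name to the arities it is defined with,
// including rules inherited from supergrammars (e.g. BuiltInRules).
function resolveActionName(key, arities) {
  const match = CANONICAL.exec(key);
  if (match) {
    const definedArities = arities.get(match[1]) || [];
    if (!definedArities.includes(Number(match[2]))) {
      throw new Error(`No rule named ${key}`);
    }
    return key;
  }
  const defined = arities.get(key) || [];
  if (defined.length === 0) {
    throw new Error(`No rule named ${key}`);
  }
  if (defined.length > 1) {
    const options = defined.map((n) => `${key}/${n}`).join(", ");
    throw new Error(`${key} is ambiguous; use one of: ${options}`);
  }
  // Unambiguous shorthand: "digit" becomes "digit/0".
  return `${key}/${defined[0]}`;
}

// Resolve every action key when the operation is defined, so an
// ambiguous name is reported up front rather than during evaluation.
function defineOperation(actions, arities) {
  const resolved = new Map();
  for (const [key, action] of Object.entries(actions)) {
    const name = resolveActionName(key, arities);
    if (resolved.has(name)) {
      throw new Error(`Two actions given for ${name}`);
    }
    resolved.set(name, action);
  }
  return resolved;
}

// Assumed table: ListOf overloaded with 1 and 2 parameters, digit with 0.
const arities = new Map([["ListOf", [1, 2]], ["digit", [0]]]);

// Resulting keys are "digit/0" and "ListOf/2".
const op = defineOperation({ digit: () => "d", "ListOf/2": () => "list" }, arities);
```

Writing { ListOf: ... } against the same table throws, because both ListOf/1 and ListOf/2 exist; that mirrors the "error at the time you define the operation" behavior.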

pdubroy commented 8 years ago

I'm a bit concerned that this will add additional complexity to the API, without providing a whole lot of benefit. The ListOf/2 syntax will complicate writing semantic actions, because the slash is not valid in a method name in most languages. It's not a problem for the current JavaScript API, but in other host languages it may make sense to write semantic actions as methods in a class.

I'm also not sure how the "so long as it's not ambiguous" would work. Say today that someone has a grammar that uses ListOf/2, and they name the semantic action ListOf. Then I extend that grammar, and add a new ListOf<x> rule. Then, if I want to extend an operation/attribute, I need to write an action for my new ListOf/1 rule. But what if I also want to override the parent grammar's ListOf/2 rule? Would I name it ListOf (as it is named in the parent operation) or ListOf/2?

Since you can always deal with the lack of ListOf/1 by giving it another name (e.g., CommaSeparatedListOf), I'm not sure the benefit is worth the additional complexity.

alexwarth commented 8 years ago

Hi Pat, you make good points, and we should think about this carefully. First, it's useful to separate what we want -- the ability to provide default values for rule parameters -- from the mechanisms that might support it, e.g., overloading for rules.
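To make that distinction concrete, here is a hypothetical sketch of the "default values" alternative: a single ListOf rule whose sep parameter defaults to ",", with omitted trailing arguments filled in at application time. The rules table and completeArgs are invented for illustration; this is an assumption, not existing Ohm syntax or behavior.

```javascript
// Hypothetical sketch: one ListOf rule whose "sep" parameter has a default,
// so ListOf<elem> and ListOf<elem, sep> apply the same rule.
// Not existing Ohm syntax or behavior.
const rules = new Map([
  ["ListOf", { params: ["elem", "sep"], defaults: new Map([["sep", '","']]) }],
]);

// Fill in omitted trailing arguments from the rule's declared defaults.
function completeArgs(ruleName, args) {
  const rule = rules.get(ruleName);
  if (!rule) throw new Error(`No rule named ${ruleName}`);
  if (args.length > rule.params.length) {
    throw new Error(`Too many arguments for ${ruleName}`);
  }
  return rule.params.map((param, i) => {
    if (i < args.length) return args[i];
    if (rule.defaults.has(param)) return rule.defaults.get(param);
    throw new Error(`Missing argument ${param} for ${ruleName}`);
  });
}

// ListOf<Expr> is treated as ListOf<Expr, ",">.
const commaList = completeArgs("ListOf", ["Expr"]);
// An explicit separator is kept as given.
const semicolonList = completeArgs("ListOf", ["Expr", '";"']);
```

Because there is only ever one ListOf rule in this model, semantic actions could keep using the plain name ListOf, which sidesteps the /n naming questions entirely.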

> I'm also not sure how the "so long as it's not ambiguous" would work. Say today that someone has a grammar that uses ListOf/2, and they name the semantic action ListOf.

This wouldn't be allowed, because it's ambiguous! This is because every grammar inherits from the BuiltInRules grammar, which would have definitions for both ListOf/1 and ListOf/2. So in this case, you would have to write a semantic action for ListOf/2 explicitly.

> Then I extend that grammar, and add a new ListOf<x> rule. Then, if I want to extend an operation/attribute, I need to write an action for my new ListOf/1 rule. But what if I also want to override the parent grammar's ListOf/2 rule? Would I name it ListOf (as it is named in the parent operation) or ListOf/2?

In grammars, you would never have to write the /n. But in the semantic actions, you would have to, for any rule that's been overloaded either in the grammar you are writing, or in one of its supergrammars.

Anyway, I agree that we have to think about this a little more. To be continued.

pdubroy commented 8 years ago

> This wouldn't be allowed, because it's ambiguous! This is because every grammar inherits from the BuiltInRules grammar, which would have definitions for both ListOf/1 and ListOf/2. So in this case, you would have to write a semantic action for ListOf/2 explicitly.

But what if we didn't add a new ListOf/1 rule to BuiltInRules?

Forget about ListOf -- say I have the following grammar:

G {
  ...
  quoted<exp, quoteChar> = quoteChar exp quoteChar
}

I write an attribute for this grammar, and since there is only one rule named quoted, I name the semantic action quoted, and not quoted/2.

Then, someone else extends my grammar:

G2 <: G {
  ...
  quoted<exp, openQuote, closeQuote> = openQuote exp closeQuote
}

Now, they want to extend the attribute that I wrote before. Since quoted is now ambiguous, they would have to write an action named quoted/3 for their new rule. But what if they also want to override the handling of the supergrammar's quoted rule? I would think that they'd have to name the action quoted/2, but it seems confusing that the parent attribute has no action named quoted/2 -- only quoted.

Now, not to say that this is a major problem, but I think it could be confusing.

mroeder commented 8 years ago

Since ECMAScript is currently one of the major example grammars for Ohm, and ES2015 could reuse a lot of rules from ES5 (as currently hinted at in es6.ohm), here is another good use case where some kind of overloading could make sense: ES2015 introduces a Yield parameter in almost all rules (in addition to others, e.g., Return and Default). The current

PrimaryExpression = this
                  | identifier
                  | literal
                  | ArrayLiteral
                  | ObjectLiteral
                  | "(" Expression<withIn> ")"  -- parenExpr

for example, could not be reused as is, since something like this would be necessary:

PrimaryExpression<allowYield> = this
                              | identifier<allowYield>
                              | literal
                              | ArrayLiteral<allowYield>
                              | ObjectLiteral<allowYield>
                              | // ... further extensions

Note that the grammar used in the spec takes an interesting approach to this issue: it expands

PrimaryExpression[Yield] :
  this
  IdentifierReference[?Yield]
  ...

to

PrimaryExpression:
  this
  IdentifierReference
  ...

PrimaryExpression_Yield :    // corresponds to PrimaryExpression<allowYield>
  this
  IdentifierReference_Yield  // corresponds to IdentifierReference<allowYield>
  ...
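A rough sketch of the spec's expansion, for comparison: each combination of a production's parameters yields a separate, parameter-free production whose name carries the enabled parameters as suffixes, and references marked [?Param] pick the matching variant. Only the [?Param] pass-through form is modeled, each alternative is simplified to a single symbol, and the object format and function names are assumptions made for this illustration.

```javascript
// Hypothetical sketch of the spec's grammar-parameter expansion.
// Each alternative is a single symbol: a terminal string, or a
// nonterminal reference listing the parameters it passes through ([?P]).

// "PrimaryExpression" + ["Yield"] -> "PrimaryExpression_Yield"
function variantName(name, enabled) {
  return [name, ...enabled].join("_");
}

// Every subset of the parameters, keeping their declared order.
function subsets(params) {
  return params.reduce(
    (acc, param) => acc.concat(acc.map((subset) => [...subset, param])),
    [[]],
  );
}

// Produce one parameter-free production per combination of parameters.
function expand(production) {
  return subsets(production.params).map((enabled) => ({
    name: variantName(production.name, enabled),
    alternatives: production.alternatives.map((symbol) =>
      typeof symbol === "string"
        ? symbol
        : variantName(
            symbol.name,
            symbol.passThrough.filter((param) => enabled.includes(param)),
          ),
    ),
  }));
}

// PrimaryExpression[Yield] : this | IdentifierReference[?Yield]
const primaryExpression = {
  name: "PrimaryExpression",
  params: ["Yield"],
  alternatives: ["this", { name: "IdentifierReference", passThrough: ["Yield"] }],
};

for (const p of expand(primaryExpression)) {
  console.log(`${p.name}: ${p.alternatives.join(" | ")}`);
}
// PrimaryExpression: this | IdentifierReference
// PrimaryExpression_Yield: this | IdentifierReference_Yield
```

With n parameters this produces 2^n productions, which is why the spec treats the bracketed form as shorthand rather than writing the variants out.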
alexwarth commented 8 years ago

Hi Pat,

> Now, they want to extend the attribute that I wrote before. Since quoted is now ambiguous, they would have to write an action named quoted/3 for their new rule. But what if they also want to override the handling of the supergrammar's quoted rule? I would think that they'd have to name the action quoted/2, but it seems confusing that the parent attribute has no action named quoted/2 -- only quoted.

> Now, not to say that this is a major problem, but I think it could be confusing.

Right, it could be a little confusing for now, but I'm not super worried about it. (See the last part of my comment.)

In any semantics for G -- the original grammar -- you should be able to write a semantic action for the quoted rule, which takes 2 parameters, using either the name quoted or quoted/2. The former is just a shorthand for the latter, which is the "canonical" name of the rule.

Suppose someone comes along later and writes a grammar G2, which extends G and overloads quoted by declaring a rule with that name that takes three parameters. If they want to extend a semantics that was originally written for G and override the semantic action for the two-parameter quoted rule in an operation or an attribute, they would have to use the name quoted/2. (I.e., they wouldn't be able to use the shorthand for that rule.)
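A hypothetical sketch of the resolution rule described above, assuming each operation stores its actions under canonical name/arity keys. The parent's shorthand quoted is resolved against G's rule table and stored as quoted/2, so an extension for G2 that overrides quoted/2 replaces it, while quoted alone is rejected as ambiguous in G2. ruleTable, resolve, and extendActions are invented names and not part of Ohm's API.

```javascript
// Hypothetical sketch: action names are resolved against each grammar's
// own rule table and stored under canonical "name/arity" keys.

// Build a grammar's table (base name -> arities), inheriting the
// supergrammar's entries without mutating them.
function ruleTable(ownRules, superTable = new Map()) {
  const table = new Map();
  for (const [name, list] of superTable) table.set(name, [...list]);
  for (const [name, arity] of ownRules) {
    const list = table.get(name) || [];
    if (!list.includes(arity)) list.push(arity);
    table.set(name, list);
  }
  return table;
}

// A bare name is shorthand only when exactly one arity exists.
function resolve(key, table) {
  const match = /^(\w+)\/(\d+)$/.exec(key);
  if (match) {
    if (!(table.get(match[1]) || []).includes(Number(match[2]))) {
      throw new Error(`No rule named ${key}`);
    }
    return key;
  }
  const list = table.get(key) || [];
  if (list.length !== 1) {
    throw new Error(`${key} is ${list.length ? "ambiguous" : "undefined"}`);
  }
  return `${key}/${list[0]}`;
}

// Extending copies the parent's canonical entries, then resolves the
// new actions against the subgrammar's table.
function extendActions(parentActions, actions, table) {
  const result = new Map(parentActions);
  for (const [key, action] of Object.entries(actions)) {
    result.set(resolve(key, table), action);
  }
  return result;
}

const G = ruleTable([["quoted", 2]]);
const G2 = ruleTable([["quoted", 3]], G);

// Written for G with the shorthand name; stored as "quoted/2".
const base = extendActions(new Map(), { quoted: () => "G: quoted/2" }, G);

// Written for G2: "quoted" is now ambiguous, so both names are explicit.
const extended = extendActions(base, {
  "quoted/2": () => "G2: quoted/2 override",
  "quoted/3": () => "G2: quoted/3",
}, G2);
```

The parent never literally contains a key spelled quoted, so in this model the confusion Pat raises is only about how the name was written, not about what is stored.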

Coming back to the confusion thing: these names are only part of the verbose and clunky "assembly language" interface for Ohm grammars that we're exposing and using right now. My hope is that soon enough we'll have an editor for semantic actions, and this clunky interface will become an implementation detail that programmers don't ever have to see. Not only will this avoid confusion, but it will also eliminate some of the portability problems you mentioned earlier.

Cheers, Alex

blakelapierre commented 8 years ago

Sorry to go slightly off-topic, but...

> My hope is that soon enough we'll have an editor for semantic actions

Is there more information about the editor anywhere?

alexwarth commented 8 years ago

> Is there more information about the editor anywhere?

Hehe, unfortunately not yet. But "soon enough" :)