jmdyck / es-spec-emu

The ECMAScript 2015 language specification, formatted in ecmarkup (with ecmarkdown and grammarkdown)

emu-prodref #11

Open bterlson opened 9 years ago

bterlson commented 9 years ago

Would be nice to implement emu-prodref. My thinking is that we put the emu-grammar definitions in the annex and then all other occurrences of the identical production are replaced with emu-prodref. That way all the syntax is in one place.

domenic commented 9 years ago

The way other specs seem to work for similar things is that the appendix is compiled out of the full spec. But maybe in this case having all the grammar together is easier to develop on, given the nature of grammars?

bterlson commented 9 years ago

I really like the idea of having the entire grammar in one emu-grammar block at the very end of the spec. This also means that the annex definitions are "primary" for the purposes of linking, which I also think is best. Clicking a non-terminal name brings you to the annex where that production is defined and related productions are located around it. Would make exploring syntax easier, IMO.

domenic commented 9 years ago

I'll trust your judgment as you've done a lot more speccing and reading of the grammar parts than I have. It seems a bit strange but if in practice it's more useful then that's what matters.

How do you go from the annex to the context in which its semantics are defined?

bterlson commented 9 years ago

ctrl+f I guess. I can see "find all references" for x-refs being helpful here, though.

domenic commented 9 years ago

Ah yeah, you want https://resources.whatwg.org/dfn.js (see it in action e.g. by clicking on "list" in https://url.spec.whatwg.org/#concept-urlsearchparams-list)

jmdyck commented 9 years ago

> My thinking is that we put the emu-grammar definitions in the annex and then all other occurrences of the identical production are replaced with emu-prodref. That way all the syntax is in one place.

That appears to assume that any syntax outside the annex is identical to some production in the annex, which is not the case in the ES6 spec. For instance, the 'defining' production may contain constructs like [no LineTerminator here] or [lookahead != ...], which might or might not appear in non-defining occurrences of the production. Similarly with grammatical parameters. Probably most significant is that, where the defining production has Foo_opt, a non-defining occurrence of the production can have Foo_opt, Foo, or the absence of a symbol.

(Moreover, I'll point out that grammarkdown doesn't appear to have a way to specify an id for a RHS, but that would presumably be fairly straightforward to fix.)

bterlson commented 9 years ago

> For instance, the 'defining' production may contain constructs like [no LineTerminator here] or [lookahead != ...], which might or might not appear in non-defining occurrences of the production.

I consider this a stylistic choice, and it is currently handled via CSS styles for inline emu-productions. Adding class=inline to emu-production, emu-prodref, or emu-grammar will currently hide all the parameters. See this example in the Async Functions spec. I currently don't hide annotations and constraints, as I find these are still helpful in the inline versions (e.g., I was confused by the lack of [+Default] for the export default function non-defining productions), but I can hide them to be consistent with the spec rendering.

> Probably most significant is that, where the defining production has Foo_opt, a non-defining occurrence of the production can have Foo_opt, Foo, or the absence of a symbol.

Yes, most significant and I haven't solved it yet. In the async functions spec I expanded the defining production into two productions with and without the optional symbol so I could reference each individually. I don't really like this solution though.

Another option is to not use prodref for those cases and redefine the production with emu-production/emu-grammar, but then the generated IDs would conflict in the current emu (though this could be fixed somehow).

Or we could consider some attribute on emu-prodref that identifies optional symbols to include/exclude, something like <emu-prodref name=FunctionExpression a=a expansion="+BindingIdentifier"></emu-prodref>?

> (Moreover, I'll point out that grammarkdown doesn't appear to have a way to specify an id for a RHS, but that would presumably be fairly straightforward to fix.)

It does, syntax is like:

FooNonTerminal ::
    Rhs `1` #a
    Rhs `2` #b

And you can reference these with <emu-prodref name=FooNonTerminal a=a></emu-prodref>

bterlson commented 9 years ago

> Ah yeah, you want https://resources.whatwg.org/dfn.js (see it in action e.g. by clicking on "list" in https://url.spec.whatwg.org/#concept-urlsearchparams-list)

Yes, this is nice! Rather than a pop-up I can probably use the side bar. Also seems like having the "find all references" dialog affordance on all references rather than only the definition would be nice. It is also interesting that references are calculated client side - presumably you could compile all this as part of the build process?
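
A rough sketch of what precomputing the reference list during the build could look like, written in Python for brevity. The function name, the input shape, and the section ids are all made up for illustration; this is not how ecmarkup or dfn.js actually works.

```python
from collections import defaultdict


def build_reference_index(references):
    """Map each defined term to the ids of the sections that reference it.

    `references` is an iterable of (term, section_id) pairs, a hypothetical
    shape for whatever the build step collects while emitting x-refs.
    """
    index = defaultdict(list)
    for term, section_id in references:
        # Record each referencing section once, in document order.
        if section_id not in index[term]:
            index[term].append(section_id)
    return dict(index)


# Illustrative data only; the ids are placeholders.
refs = [
    ("BindingIdentifier", "sec-function-definitions"),
    ("BindingIdentifier", "sec-class-definitions"),
    ("BindingIdentifier", "sec-function-definitions"),
    ("NumericLiteral", "sec-literals-numeric-literals"),
]
reference_index = build_reference_index(refs)
```

The resulting mapping could be serialized into the page so the "find all references" list needs no client-side scan.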

domenic commented 9 years ago

Agreed on all except having it on each reference. From a reference you click the link to get to the dfn, and from there it's one more click to see the reference list.

jmdyck commented 9 years ago

> Or we could consider some attribute on emu-prodref that identifies optional symbols to include/exclude, something like <emu-prodref name=FunctionExpression a=a expansion="+BindingIdentifier"></emu-prodref>?

So expansion="+BindingIdentifier" means something like "the occurrence of BindingIdentifier_opt in the defining production is here replaced by BindingIdentifier"? And expansion="-BindingIdentifier" means "... replaced by no symbol"? And the absence of an expansion attribute means it appears as in the defining production? I suppose something like that could work.

You might need to consider how to handle cases where a RHS has more than one occurrence of Foo_opt for the same non-terminal. (E.g., ES6 has such in StringNumericLiteral, IterationStatement, and CaseBlock. But I think every non-defining occurrence of the RHSs in question has Foo_opt in all spots, so [assuming the above interpretation] you wouldn't need an expansion attribute. Still, you might not be so lucky with ES7.)
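
To make that interpretation concrete, here is a minimal sketch in Python. It assumes an RHS is a flat list of symbol names with optional symbols spelled Foo_opt, and that the expansion attribute is a space-separated list of +Name / -Name items. Both the representation and the function name apply_expansion are hypothetical; nothing like this exists in ecmarkup as far as this thread shows.

```python
def apply_expansion(rhs, expansion=None):
    """Rewrite optional symbols in `rhs` according to an `expansion` string.

    "+Foo" replaces Foo_opt with Foo; "-Foo" replaces Foo_opt with no symbol.
    With no expansion, the RHS appears as in the defining production.
    """
    choices = {}
    for item in (expansion or "").split():
        if len(item) < 2 or item[0] not in "+-":
            raise ValueError(f"bad expansion item: {item!r}")
        choices[item[1:]] = item[0]

    result = []
    for symbol in rhs:
        name = symbol[: -len("_opt")] if symbol.endswith("_opt") else None
        if name is None or name not in choices:
            result.append(symbol)  # unchanged from the defining production
        elif choices[name] == "+":
            result.append(name)  # Foo_opt -> Foo
        # "-" case: Foo_opt -> no symbol, so nothing is appended
    return result


function_expression = [
    "function", "BindingIdentifier_opt", "(", "FormalParameters", ")",
    "{", "FunctionBody", "}",
]
named = apply_expansion(function_expression, "+BindingIdentifier")
anonymous = apply_expansion(function_expression, "-BindingIdentifier")
```

Note that this sketch treats every occurrence of the same Foo_opt alike, so it cannot express "keep the first, drop the second"; that is exactly the multiple-occurrence case raised above.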

jmdyck commented 9 years ago

> It does, syntax is like:
>
> FooNonTerminal ::
>     Rhs `1` #a
>     Rhs `2` #b

(I've raised an issue on grammarkdown that this should be documented.)

This syntax for declaring rhs-names doesn't allow declaring them for "one of" productions. E.g., consider

DecimalDigit :: one of
    0 1 2 3 4 5 6 7 8 9

followed by a rule defining e.g., The MV of DecimalDigit :: 0.

Of course, ES7 could just choose to define DecimalDigit as

DecimalDigit ::
  0 #a0
  1 #a1
  2 #a2
  3 #a3
[etc]

which would then allow The MV of <emu-prodref name=DecimalDigit a=a0></emu-prodref>
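
For what it's worth, the rewrite from the "one of" form to individually named RHSs is mechanical. A small sketch in Python, where the function name expand_one_of and the a0, a1, ... naming scheme are assumptions for illustration, not anything grammarkdown provides:

```python
def expand_one_of(lhs, terminals, prefix="a"):
    """Return lines that give each "one of" terminal its own named RHS.

    The output mirrors the hand-written DecimalDigit form shown above;
    the id scheme (prefix plus position) is an assumed convention.
    """
    lines = [f"{lhs} ::"]
    for position, terminal in enumerate(terminals):
        lines.append(f"  {terminal} #{prefix}{position}")
    return "\n".join(lines)


expanded = expand_one_of("DecimalDigit", "0123456789")
```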

jmdyck commented 9 years ago

One other thing that occurred to me is that Annex B defines "alternative" productions for certain nonterminals. Would those be defined in the "one emu-grammar block at the very end of the spec"? If so (or perhaps even if not), <emu-prodref> would need a way to 'select' between two alternative productions for the same nonterminal.

bterlson commented 9 years ago

> You might need to consider how to handle cases where a RHS has more than one occurrence of Foo_opt for the same non-terminal

Meh, we can just not use references for this for now if it occurs. Does it?

Re: referencing a one-of RHS, I think grammarkdown should have a convention there (a0, a1, ..., an seems reasonable). I think we can assume that will work for now.

For alternative productions, maybe we should just use a different LHS name? That might be more clear anyway...

jmdyck commented 9 years ago

> > You might need to consider how to handle cases where a RHS has more than one occurrence of Foo_opt for the same non-terminal
>
> Meh, we can just not use references for this for now if it occurs. Does it?

Yes and (maybe) no. See the parenthetical that follows the sentence that you quoted.

> For alternative productions, maybe we should just use a different LHS name? That might be more clear anyway...

No, that won't generally work (or, it'd be more hassle than you think).

For example, 11.8.3 and B.1.1 define different versions of NumericLiteral (the one in B.1.1 having an extra RHS for LegacyOctalIntegerLiteral). If we use a different LHS name in B.1.1 (say, BNumericLiteral), then: (a) No production in the grammar references BNumericLiteral, so there's no way to derive it from the start symbol. (You could make an Annex-B-specific version of the 'Literal' production that does reference BNumericLiteral, but that just pushes the problem 'up' one level.) (b) All the rules involving NumericLiteral (e.g., in 11.8.3.1) that we still need in an "Annex-B world" would need to be copied to use BNumericLiteral instead.

Given that grammatical parameters provide a mechanism for defining variant nonterminals and productions, and also for ignoring those variances when appropriate, it might be feasible to use (something like) that mechanism to define the Annex B variants as part of a single unified grammar. (E.g., consider a parameter B, whose setting is implementation-defined.) However, switching to such a scheme would be non-trivial.
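
As a very rough illustration of the idea, sketched in Python: each RHS carries an optional guard, and an RHS guarded by +B is only part of the production when the hypothetical parameter B is in effect. The data layout and the function name select_rhss are invented for this sketch; only the NumericLiteral alternatives come from 11.8.3 and B.1.1.

```python
def select_rhss(production, params):
    """Return the RHSs of `production` that apply under the active `params`.

    `production` is a list of (guard, rhs) pairs, where guard is either None
    (always applies) or "+P" (applies only when parameter P is active).
    """
    selected = []
    for guard, rhs in production:
        if guard is None or (guard.startswith("+") and guard[1:] in params):
            selected.append(rhs)
    return selected


# NumericLiteral with the Annex B alternative folded in behind a guard.
NUMERIC_LITERAL = [
    (None, "DecimalLiteral"),
    (None, "BinaryIntegerLiteral"),
    (None, "OctalIntegerLiteral"),
    (None, "HexIntegerLiteral"),
    ("+B", "LegacyOctalIntegerLiteral"),
]

core = select_rhss(NUMERIC_LITERAL, set())
with_annex_b = select_rhss(NUMERIC_LITERAL, {"B"})
```

With one LHS name throughout, the rules that mention NumericLiteral would not need to be duplicated, which is the point of (b) above.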

bterlson commented 9 years ago

For alternate RHSes, since the first production wins and Annex A comes before Annex B, I think I'm OK doing no work here: Annex B can redefine the productions using emu-production, and emu-prodref will always link to Annex A.

jmdyck commented 9 years ago

Okay. That's why I asked

> Would those [Annex B productions] be defined in the "one emu-grammar block at the very end of the spec"?

when I brought it up.

Still, it sounds like when ecmarkup is processing productions in Annex B, every non-terminal link will go to Annex A, even when there's an Annex B definition that it's actually referencing.
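
A small sketch of that concern, in Python. It assumes link targets are resolved from a table keyed only by nonterminal name, with the first definition in document order winning; the function name and the anchor ids are hypothetical and not taken from ecmarkup.

```python
def index_definitions(productions):
    """Map each nonterminal to the anchor of its first definition.

    `productions` is a list of (nonterminal, anchor) pairs in document order.
    Later definitions of the same nonterminal are ignored.
    """
    index = {}
    for nonterminal, anchor in productions:
        index.setdefault(nonterminal, anchor)  # first definition wins
    return index


# Annex A precedes Annex B, so Annex A's definition is the one recorded.
productions = [
    ("NumericLiteral", "annex-a-NumericLiteral"),
    ("NumericLiteral", "annex-b-NumericLiteral"),
]
links = index_definitions(productions)
```

Because the lookup has no notion of where the reference occurs, a NumericLiteral link inside B.1.1 would resolve to the Annex A anchor as well.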