tc39 / proposal-regexp-named-groups

Named capture groups for JavaScript RegExps
https://tc39.github.io/proposal-regexp-named-groups/

Is the reparsing necessary? #26

dead-claudia closed 6 years ago

dead-claudia commented 7 years ago

Given the second part of the early error section, I'm not convinced the double parsing is actually useful at all. It really seems redundant from a spec point of view, since if there are no GroupSpecifiers defined, no \k GroupName production can exist without GroupName referencing an invalid specifier (and thus triggering an early error).

littledan commented 7 years ago

Reparsing is a useful spec device because, if you're using lookbehind, a \k may precede the first GroupSpecifier. In real implementations, though, actual reparsing can be rare (it exists just for these sorts of cases). How else would you prefer to specify it?
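To make the lookbehind case concrete, here is a small sketch (not taken from the proposal text; the pattern and input string are illustrative assumptions) in which \k<v> appears textually before the group it names. It assumes an engine that supports both named groups and lookbehind.

```javascript
// Lookbehind bodies are matched right to left, so (?<v>o) captures
// before \k<v> is evaluated, even though \k<v> comes first in the source.
const lookbehind = /(?<=\k<v>d(?<v>o))r/;
const hit = lookbehind.exec("hodor");
console.log(hit[0], hit.groups.v);

// Outside lookbehind, a forward reference is still valid syntax. The group
// has not captured anything when \k<a> is evaluated, so the backreference
// matches the empty string and the overall match succeeds.
const forward = /\k<a>(?<a>x)/;
const forwardHit = forward.exec("x");
console.log(forwardHit[0], forwardHit.groups.a);
```

In both patterns a parser reading left to right meets \k before it has seen any GroupSpecifier, which is the situation the reparsing is meant to cover.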

dead-claudia commented 7 years ago

I would probably instead just remove the N subscript parameter throughout the grammar, and instead rely only on the relevant early error rule. I would just clarify in that rule that the matching GroupSpecifier may occur after the production within the same enclosing RegExp pattern.

littledan commented 7 years ago

How would you then handle allowing \k to expand into k in a RegExp which has no GroupSpecifier? This is the entire point of the reparsing.
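For readers following along, the different readings of \k can be observed from script. The sketch below is only an illustration; the helper name tryMatch and the sample patterns are assumptions, not part of the proposal. It relies on the Annex B behavior for patterns without the u flag.

```javascript
// Hypothetical helper: returns the test() result, or the error name
// if the pattern fails to compile.
function tryMatch(source, flags, input) {
  try {
    return new RegExp(source, flags).test(input);
  } catch (e) {
    return e.name;
  }
}

const results = {
  // No GroupSpecifier anywhere: \k is an identity escape for "k".
  bareK: tryMatch("\\k", "", "k"),
  // Still no GroupSpecifier: \k<a> is the literal text "k<a>".
  literalKName: tryMatch("\\k<a>", "", "k<a>"),
  // A GroupSpecifier exists: a bare \k is no longer allowed.
  withGroupDangling: tryMatch("(?<a>x)\\k", "", "xk"),
  // A GroupSpecifier exists, but not the one being referenced.
  withGroupUnknown: tryMatch("(?<a>x)\\k<b>", "", "xk<b>"),
  // A GroupSpecifier exists and the name matches: a real backreference.
  withGroupBackref: tryMatch("(?<a>x)\\k<a>", "", "xx"),
  // With the u flag, \k must always start a named backreference.
  unicodeBareK: tryMatch("\\k", "u", "k"),
};
console.log(results);
```

The first two entries compile and match as literals, while the entries with a named group either form a valid backreference or fail with a SyntaxError.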

dead-claudia commented 7 years ago

What the early error condition states now should be sufficient AFAIK (relevant part bolded):

  • It is a Syntax Error if the enclosing RegExp does not contain a GroupSpecifier with an enclosed RegExpIdentifierName whose StringValue equals the StringValue of the RegExpIdentifierName of this production's GroupName.

If no GroupSpecifier exists, you can't have one with some particular quality. This is my reasoning.

littledan commented 7 years ago

I have no idea what you mean. How would you handle the various interpretations of \k?

msaboff commented 7 years ago

I'm in agreement with @isiahmeadows. Seems to me that the early error condition is sufficient to determine whether \k should be treated as a literal or as the beginning of a named back reference.

littledan commented 7 years ago

@msaboff This is sort of a spec-internal constraint, but in general, the JS grammar tries to avoid ambiguity. It doesn't backtrack when an early error happens and try re-parsing \k as a k. Instead, either the \k escape is mandatory or prohibited.

The exception to the no-ambiguity principle is Annex B RegExp syntax. But still, early errors don't fit into that. That's only ambiguous and ordered with respect to the syntactic productions, not the early error conditions.

littledan commented 7 years ago

Anyway, for a working implementation, it's fine to use this logic. I think the only time an implementation would really have to backtrack is the obscure case where a named back reference precedes the definition of the group it names, e.g., in a lookbehind.
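One way an implementation might avoid a full second parse is to pre-scan the pattern for any GroupSpecifier and choose the reading of \k from that. The sketch below is a rough illustration under that assumption, not a description of any real engine. The function names hasGroupSpecifier and interpretBackslashK are hypothetical, and the scan does not validate group names.

```javascript
// Hypothetical pre-scan: reports whether the pattern source contains "(?<"
// that opens a named group rather than a lookbehind ("(?<=" or "(?<!").
function hasGroupSpecifier(source) {
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      i++; // skip the escaped character, e.g. "\(" or "\k"
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false; // "(" inside a class is a literal
      continue;
    }
    if (ch === "[") {
      inClass = true;
      continue;
    }
    if (ch === "(" && source.startsWith("?<", i + 1)) {
      const next = source[i + 3];
      if (next !== "=" && next !== "!") return true;
    }
  }
  return false;
}

// Decide how \k should be read in a pattern without the u flag.
function interpretBackslashK(source) {
  return hasGroupSpecifier(source) ? "named-backreference" : "identity-escape";
}

console.log(interpretBackslashK("\\k<a>"));
console.log(interpretBackslashK("(?<=\\k<a>(?<a>x))y"));
```

With a scan like this, the decision is made once per pattern, and it no longer matters whether \k appears before or after the group it names.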