tc39 / proposal-regexp-named-groups

Named capture groups for JavaScript RegExps
https://tc39.github.io/proposal-regexp-named-groups/
222 stars 21 forks source link

Investigate need for reflection/genericity over groups #12

Open littledan opened 7 years ago

littledan commented 7 years ago

Raised in various ways by @bterlson and @ljharb at the January 2017 TC39 presentation.

ljharb commented 7 years ago

Some thoughts:

const { a, b, c } = match.groups; // if it's always an object
const { a, b, c } = match.groups || {}; // if it's absent when there's no matches

const { groups: { a, b, c } } = match; // if it's always an object
const { groups: { a, b, c } = {} } = match; // if it's absent when there's no matches

One possibly performant way we could make it always be an object is, instead of making groups an own data property, making its property descriptor be something like:

{
  enumerable: true,
  configurable: true,
  get() {
    if (!this.[[matchedGroups]]) { // internal slot
      this.[[matchedGroups]] = {};
    }
    return this.[[matchedGroups]];
  },
  set(value) {
    Object.defineProperty(this, 'groups', { enumerable: true, configurable: true, value, writable: true });
  }
}

However, given that match.groups || {} or = {} isn't that problematic, i think it's not that big a deal that it always be an object (but it'd be nice).

schuay commented 7 years ago

From an efficiency point of view, I'd still argue for keeping the groups object as a plain JS object with named captures added as data properties by RegExpBuiltinExec (i.e. the current proposal). The object should be added only when the RegExp contains named captures.

In V8, RegExpResult creation (as well as other RegExp glue code) is now done in CodeStubAssembler, which means we're limited in what we can do without bailing out to (slow) runtime. For instance, we can efficiently construct objects from existing maps, but map manipulations (such as adding accessors at runtime) have to be done in C++.

Additional advantages of using a plain JS object with properties:

Disadvantages:

Implementation-wise, a 'group(name)' getter installed on the result object itself which references an internal name-string map is probably more efficient, but also less convenient for users.

schuay commented 7 years ago

More context on the previous post: this was in response to an offline discussion about different possibilities of exposing named captures on the result object.

Some of these were:

The previous reply argues for the last option.

littledan commented 7 years ago

At the January 2017 TC39 meeting, we discussed some other properties that we might like to have from the groups property (cc @bterlson and @michaelficarra). I've talked it over with some of the current and former V8 team (@hashseed, @schuay and @ErikCorryGoogle) and concluded that we should actually stay with the current specification. The proposed changes and responses are:

ljharb commented 7 years ago

@littledan the resolution from the first bullet implies that all named groups will be present on the object, which makes sense to me. Does the second imply that there will be an unconditional unfrozen groups object on every match object, or that it will only be created on match objects when the regex had named groups?

littledan commented 7 years ago

That it will only be created on match objects when the RegExp had named groups. The quoted section could be rephrased to, "If we added the groups..."

ljharb commented 7 years ago

Sounds good.

getify commented 6 years ago

Is there any concern over behavior ambiguity where the groups object, if created, has a __proto__ of Object.prototype, and where a named group could be of the name "__proto__" or "toString", or otherwise shadowing any of the other built-ins on Object.prototype?

What if a named group property name is attempted to be added to groups, but the Object.prototype has been extended with a setter (or writable:false property descriptor) of that same name?

Wouldn't it be simpler if the groups object was defined to be a plain dictionary object with __proto__ of null, like Object.create(null), to avoid these quirks? Or couldn't it argue more in favor of that above-mentioned option of it being a Map instance?

littledan commented 6 years ago

I agree about the proto of groups. Actually, it already is null, see https://tc39.github.io/proposal-regexp-named-groups/#sec-regexpbuiltinexec