Investigate need for reflection/genericity over groups

littledan commented 7 years ago

Raised in various ways by @bterlson and @ljharb at the January 2017 TC39 presentation.

ljharb commented 7 years ago

Some thoughts:

const { a, b, c } = match.groups; // if it's always an object
const { a, b, c } = match.groups || {}; // if it's absent when there's no matches

const { groups: { a, b, c } } = match; // if it's always an object
const { groups: { a, b, c } = {} } = match; // if it's absent when there's no matches

One possibly performant way we could make it always be an object is, instead of making groups an own data property, making its property descriptor be something like:

{
  enumerable: true,
  configurable: true,
  get() {
    if (!this.[[matchedGroups]]) { // internal slot
      this.[[matchedGroups]] = {};
    }
    return this.[[matchedGroups]];
  },
  set(value) {
    Object.defineProperty(this, 'groups', { enumerable: true, configurable: true, value, writable: true });
  }
}

However, given that match.groups || {} or = {} isn't that problematic, i think it's not that big a deal that it always be an object (but it'd be nice).

schuay commented 7 years ago

From an efficiency point of view, I'd still argue for keeping the groups object as a plain JS object with named captures added as data properties by RegExpBuiltinExec (i.e. the current proposal). The object should be added only when the RegExp contains named captures.

In V8, RegExpResult creation (as well as other RegExp glue code) is now done in CodeStubAssembler, which means we're limited in what we can do without bailing out to (slow) runtime. For instance, we can efficiently construct objects from existing maps, but map manipulations (such as adding accessors at runtime) have to be done in C++.

Additional advantages of using a plain JS object with properties:

Avoids messy logic related to modified properties on the result object, e.g.: modifying a numbered capture on the result object never affects named captures on the result:
```
var result = /(?<a>.)/u.exec("x");
result[1] = "42";
result.group.a;  // Still "x".
```
Interacts nicely with destructuring and the replace method.

Disadvantages:

CreateDataProperty currently still requires a slow runtime call. A fast path seems feasible for dictionary mode objects (which the groups object currently is). This would require some effort on V8 implementer side.
Adds unconditional overhead to RegExpBuiltinExec to check whether we need to construct a groups object. Currently this is an object load, a fixed array element load, and a conditional jump. We could probably optimize this to avoid the element load.

Implementation-wise, a 'group(name)' getter installed on the result object itself which references an internal name-string map is probably more efficient, but also less convenient for users.

schuay commented 7 years ago

More context on the previous post: this was in response to an offline discussion about different possibilities of exposing named captures on the result object.

Some of these were:

A result.group(name) getter.
A result.group object with named getters on its proto, possibly lazily constructed.
A result.group Map object.
Storing named capture as data properties on result itself.
A result.group plain JS object with data properties.

The previous reply argues for the last option.

littledan commented 7 years ago

At the January 2017 TC39 meeting, we discussed some other properties that we might like to have from the groups property (cc @bterlson and @michaelficarra). I've talked it over with some of the current and former V8 team (@hashseed, @schuay and @ErikCorryGoogle) and concluded that we should actually stay with the current specification. The proposed changes and responses are:

Creating the keys on the groups object only for the groups that are present in the RegExp: The counterargument is that, since numbered groups are all added and initialized to undefined, we should do the same for named groups. What makes named groups different? Further, if we always make all of the groups properties, then there will be a single "hidden class" for the groups object, making subsequent property accesses faster in V8 (I believe other engines do similar optimizations).
Always making a groups property, which is an empty object (possibly based on a class hierarchy): from @hashseed:

by adding the groups object unconditionally, we would retroactively change the behavior of existing code that use any regexp. A frozen empty object is weird, because I expect the ordinary groups object to be mutable.

ljharb commented 7 years ago

@littledan the resolution from the first bullet implies that all named groups will be present on the object, which makes sense to me. Does the second imply that there will be an unconditional unfrozen groups object on every match object, or that it will only be created on match objects when the regex had named groups?

littledan commented 7 years ago

That it will only be created on match objects when the RegExp had named groups. The quoted section could be rephrased to, "If we added the groups..."

ljharb commented 7 years ago

Sounds good.

getify commented 6 years ago

Is there any concern over behavior ambiguity where the groups object, if created, has a __proto__ of Object.prototype, and where a named group could be of the name "__proto__" or "toString", or otherwise shadowing any of the other built-ins on Object.prototype?

What if a named group property name is attempted to be added to groups, but the Object.prototype has been extended with a setter (or writable:false property descriptor) of that same name?

Wouldn't it be simpler if the groups object was defined to be a plain dictionary object with __proto__ of null, like Object.create(null), to avoid these quirks? Or couldn't it argue more in favor of that above-mentioned option of it being a Map instance?

littledan commented 6 years ago

I agree about the proto of groups. Actually, it already is null, see https://tc39.github.io/proposal-regexp-named-groups/#sec-regexpbuiltinexec

tc39 / proposal-regexp-named-groups

Investigate need for reflection/genericity over groups #12