Open littledan opened 7 years ago
Some thoughts:
const { a, b, c } = match.groups; // if it's always an object
const { a, b, c } = match.groups || {}; // if it's absent when there's no matches
const { groups: { a, b, c } } = match; // if it's always an object
const { groups: { a, b, c } = {} } = match; // if it's absent when there's no matches
One possibly performant way we could make it always be an object is, instead of making groups
an own data property, making its property descriptor be something like:
{
enumerable: true,
configurable: true,
get() {
if (!this.[[matchedGroups]]) { // internal slot
this.[[matchedGroups]] = {};
}
return this.[[matchedGroups]];
},
set(value) {
Object.defineProperty(this, 'groups', { enumerable: true, configurable: true, value, writable: true });
}
}
However, given that match.groups || {}
or = {}
isn't that problematic, i think it's not that big a deal that it always be an object (but it'd be nice).
From an efficiency point of view, I'd still argue for keeping the groups object as a plain JS object with named captures added as data properties by RegExpBuiltinExec (i.e. the current proposal). The object should be added only when the RegExp contains named captures.
In V8, RegExpResult creation (as well as other RegExp glue code) is now done in CodeStubAssembler, which means we're limited in what we can do without bailing out to (slow) runtime. For instance, we can efficiently construct objects from existing maps, but map manipulations (such as adding accessors at runtime) have to be done in C++.
Additional advantages of using a plain JS object with properties:
var result = /(?<a>.)/u.exec("x");
result[1] = "42";
result.group.a; // Still "x".
Disadvantages:
Implementation-wise, a 'group(name)' getter installed on the result object itself which references an internal name-string map is probably more efficient, but also less convenient for users.
More context on the previous post: this was in response to an offline discussion about different possibilities of exposing named captures on the result object.
Some of these were:
result.group(name)
getter.result.group
object with named getters on its proto, possibly lazily constructed.result.group
Map object.result
itself.result.group
plain JS object with data properties.The previous reply argues for the last option.
At the January 2017 TC39 meeting, we discussed some other properties that we might like to have from the groups
property (cc @bterlson and @michaelficarra). I've talked it over with some of the current and former V8 team (@hashseed, @schuay and @ErikCorryGoogle) and concluded that we should actually stay with the current specification. The proposed changes and responses are:
groups
object only for the groups that are present in the RegExp: The counterargument is that, since numbered groups are all added and initialized to undefined, we should do the same for named groups. What makes named groups different? Further, if we always make all of the groups properties, then there will be a single "hidden class" for the groups object, making subsequent property accesses faster in V8 (I believe other engines do similar optimizations).by adding the groups object unconditionally, we would retroactively change the behavior of existing code that use any regexp. A frozen empty object is weird, because I expect the ordinary groups object to be mutable.
@littledan the resolution from the first bullet implies that all named groups will be present on the object, which makes sense to me. Does the second imply that there will be an unconditional unfrozen groups
object on every match object, or that it will only be created on match objects when the regex had named groups?
That it will only be created on match objects when the RegExp had named groups. The quoted section could be rephrased to, "If we added the groups..."
Sounds good.
Is there any concern over behavior ambiguity where the groups
object, if created, has a __proto__
of Object.prototype
, and where a named group could be of the name "__proto__"
or "toString"
, or otherwise shadowing any of the other built-ins on Object.prototype?
What if a named group property name is attempted to be added to groups
, but the Object.prototype
has been extended with a setter (or writable:false
property descriptor) of that same name?
Wouldn't it be simpler if the groups
object was defined to be a plain dictionary object with __proto__
of null, like Object.create(null)
, to avoid these quirks? Or couldn't it argue more in favor of that above-mentioned option of it being a Map
instance?
I agree about the proto of groups. Actually, it already is null, see https://tc39.github.io/proposal-regexp-named-groups/#sec-regexpbuiltinexec
Raised in various ways by @bterlson and @ljharb at the January 2017 TC39 presentation.