Closed anba closed 6 years ago
I believe we discussed this possibility at some point, and decided to not do it in favor of keeping semantics entirely the same when there are no groups.
Implementers, would this change affect performance of your engines? cc @jgruber @bterlson @msaboff @tschneidereit
Setting aside performance reasons, I would strongly urge inclusion of groups
in every result object, with any other value besides null
or undefined
(or just simply not present), so that we don't have to do ternary/||
checks to access it (or wait for ?.
optional chaining), nor do we have to do destructuring defaults.
Even if it was false
(which has a decent semantic to it), that would mean you could say v.groups.foo
or ({ groups: { foo } } = v
) without fear of runtime failure.
Please don't make this an unnecessary hazard.
@getify Could you mention a use case where you'd really want to read a group of a particular name, but while not knowing what the RegExp is?
3 use cases:
I built a UI for a web app once where we let users pick from (click on) any of various items in a word cloud, and whichever ones selected were adding elements to a list, which would then be processed to make a dynamic regex, that was then applied to filter a result set. The UI also let them type their own word in.
I can imagine, for example, that the preset words add an element that, in the final regex, does a named capture group, whereas any user-typed in words would be in an unnamed group (or vice versa).
It's therefore impossible to predict ahead of time if the regex you're processing with will have named groups or not. You'd like to be able to process, say with destructuring and some defaults, whether or not it has named groups, or falling back to indexed groups, etc.
When the regex is used to produce a result set, and a different piece of code processes the result set from where the result is created:
results
.map( processWithRegex )
.filter( filterOutSomeStuff );
Say processWithRegex(..)
has different conditions where it uses different regexes, some with named groups, some without, and filterOutSomeStuff(..)
simply takes regex-results and checks them for certain contents to decide if they're allowed in the final list or not... at the moment it's running, it has no idea which regex was used to produce it, so it would like to be able to gracefully check for named groups in the result, or fall back to checking indexed groups if not present.
When the regex is specified as an input to a function that will simply use it to process, the processor has relatively little guarantee what kind of regex it receives:
var filterOutDateMonth = (re,monthNum) => list => list.filter( v => {
var res = re.exec(v);
if (res.groups.month == monthNum or res[1] == monthNum) return false;
return true;
} );
pipe(
filterOutDateMonth( /^(?<month>\d{2}/, 10 ),
filterOutDateMonth( /^(\d{1,2}(?=\/))?/, 7 )
)( ["7/19/80", "07-4-521431", "10-31-2017" ] );
Avoiding prototype chain lookups seems sensible to me, and this is indeed what both V8 and JSC implement. I’m in favor of this change. WDYT, @schuay @hashseed @msaboff @tschneidereit @bterlson?
So it would be an own property on regexes with groups, and there'd be a prototype fallback that returned what, an empty object? If so, would it be the same one every time, would it be frozen, etc?
As a developer, I think it'd be nice if there was a way to tell cleanly that no named groups were matched (other than checking for Object.keys() being empty). If groups
was always an owned property on the result object, and it was either an object if there were named groups matched, or false
if none were used/matched, that would give a decent way of safely checking. results.groups
would be falsy if no match, but results.groups.foo
would safely work and also be falsy. The problem with null
or undefined
is that the .foo
property access would be an error.
Edit: Unfortunately, in
is not allowed against false
, so "foo" in results.groups
would still be unsafe.
I agree that avoiding the prototype chain lookups makes sense, yes.
I don't have a particularly strong opinion on whether the value of the property should be undefined
or something else as @getify requests. If the latter, it should probably be an empty null-__proto__
object.
@jandem, please speak up if for whatever reason this wouldn't be good for SpiderMonkey.
@tschneidereit if it's an empty null object, should it be frozen and always the same object? or newly created every time?
@tschneidereit if it's an empty null object, should it be frozen and always the same object? or newly created every time?
I don't have a particularly strong opinion on this. It being the same object would certainly be more performant, but I don't know how much it'd matter, or whether there might be downsides I'm not immediately seeing.
Avoiding prototype chain lookups seems sensible to me, and this is indeed what both V8 and JSC implement. I’m in favor of this change. WDYT, @schuay @hashseed @msaboff @tschneidereit @bterlson?
I'm fine with this spec change.
Concerning @getify's suggestion of a group
being set to false
when none of the groups match, it is somewhat late to make this type of change. This suggestion doesn't provide improvements to the more likely case of some groups matching while others don't. I also don't like the the duality of results.group
depending on access.
The suggestion to always provide group
, even for match results of expressions without named captures would be a breaking change.
For @getify 's concern, we discussed this earlier in https://github.com/tc39/proposal-regexp-named-groups/issues/12 . Would the use cases you mention be satisfied by the proposal for the ?.
operator, if such an operator is used with the groups property?
For @anba's suggestion, it seems reasonable to me. I didn't understand the fast path correctness issue on the first reading--that does add some kinds of motivation, though at the same time, it's not the completely unsolvable--in theory, the RegExp fastpath could be skipped when a groups
property is added (though I am not happy about this requirement to check and resulting performance cliff at all).
To clarify with respect to @mathiasbynens 's comment, @anba 's proposal does not describe V8's current implementation. Actually, this is an edge case where V8 doesn't currently have consistent behavior. You have to somehow trick V8 into taking the slow path to get the spec's behavior if you want groups higher in the prototype chain to show up, as in the following transcript:
d8> Array.prototype.groups = { foo: "bar" }
{foo: "bar"}
d8> print("--abc--".replace(/abc/, "$<foo>"));
--$<foo>--
undefined
d8> RegExp.prototype.x = 1;
1
d8> print("--abc--".replace(/abc/, "$<foo>"));
--bar--
undefined
I would've hoped that we could only look at the groups
property for RegExps which use groups. However, given the generality made to support RegExp subclassing, the only way I could think to make this work is with another flag-like thing, which just seems unnecessarily complicated. So @anba 's suggestion seems decent; I'm just slightly worried that there will be some measurable performance cost to adding this extra property to all existing RegExps.
@schuay @hashseed what do you think?
FWIW, in V8's fast path for RegExp.prototype.@@replace
, we do not actually materialize the matched object, so
Object.prototype.groups = {
foo: "bar"
};
print("--abc--".replace(/(abc)/, "$<foo>"));
prints --$<foo>--
However, in the slow path, we look up the prototype chain as per spec, and print --bar--
. I filed a bug in V8 for this.
Due to this, I think it makes sense to either
1) always create the groups
property.
2) only do a own-property look up for groups
.
I slightly lean towards (2) because (1) affects the behavior of existing regexps that do not use named capture group syntax.
2, is interesting. It'd be new ground--I don't know about another place in the spec that we do an own-property lookup. Should a getter work there?
Option 3: groups
has a null prototype?
Nice find!
I'd be ok with both 1. and 2. as proposed above. 1. would give us the additional advantage of having not having two different RegExpResult shapes (one with, one without the groups property).
Note that with 1., RegExp.p.exec could still be overridden and return result objects of arbitrary shape, leading to a possible prototype lookup later on.
@littledan there are one or two spots that check HasOwnProperty before calling Get, e.g.: https://tc39.github.io/ecma262/#sec-function.prototype.bind.
Option 3: groups has a null prototype?
It already does. But that's a separate topic (prototype-lookup for properties on the groups object vs. the 'groups' property on the RegExpResult).
@schuay with option 1, what value would you expect when there’s no named captures in the regex?
@schuay with option 1, what value would you expect when there’s no named captures in the regex?
My gut feeling would be undefined as suggested by @anba.
@littledan the problem, as I mentioned in my first post of this thread, is that the ?.
operator is much earlier in its stage progression, which means there will likely be a significant gap between named captures and that operator... maybe a year or two. If the two features were held and shipped simultaneously, that'd be no concern, but that seems unlikely.
In practice, my prediction is this will mean the suboptimal solutions (like || {}
) will become entrenched in a lot of code (code which could take many years before it's updated to use ?.
)... that concern is doubly so for code which doesn't actually need these hacks but may suffer them anyway, as this hazard seems similar to others before it that have spread more as cult-myth rather than based on actual justification.
Of course, that also means future optimization hurdles for engines, as those kinds of hacks are harder to statically detect.
I'm just basically saying that it's a hazard/footgun we can reasonably predict, so designing it into the language is unfortunate when we could tweak just a little bit and avoid/minimize it.
Regarding the "breaking change" claims... can someone provide any real example/use-case where code will change/break if 'groups' shows up on all regex result objects (not the regexp objects themselves)? And specifically, if the worry is that code (libraries/etc) annotates regexp results with extra properties and may collide with this new "groups" property, how is that concern unique only to existing code and NOT a hazard if named-group regexes start being passed around into those utilities? Seems like the hazard would be universal (and quite low).
I'm failing to imagine any such legitimate "breaking" case, but I provided 3 real non-imagined counter cases above.
@getify I see the concern with ||
getting used. I'm not so happy about this either. But if ?.
will solve the entire problem, I'm still a little skeptical. Do you recommend that JS programmers refrain from using null or undefined in their libraries, JSON, etc and instead use empty objects for a similar reason? I'm not so concerned about the compatibility aspect of using {}
, more the performance impact and complexity of the solution.
@schuay Oh, I see your point about HasOwnProperty/Get; I was picturing somehow using GetOwnProperty. The two are observably different, and I don't think anyone does the latter pattern, but yes, we could do that. Does anyone have a preference between option 1 and 2 then? If no one has any preference at all, I'll do 1 since it seems very marginally simpler.
@littledan FWIW I am also skeptical of designing things on the assumption that ?.
or something like it will eventually make it into the language. I am not convinced that it will, personally, given how difficult it's been to come up with any kind of consistent semantics for it.
That said, I agree that it would be weird to design this specific API to avoid non-ObjectCoercible values to make it easier for programmers to not worry about whether or not the object exists before trying to read properties of it, since the vast majority of the APIs in JavaScript and the web platform were not designed that way - including notably String.prototype.match
. I don't think it's that much of a hazard: undefined
is just how the language indicates "not present".
@littledan
But if ?. will solve the entire problem, I'm still a little skeptical.
I understand. I wish ?.
was further along, or could be pushed forward faster, to alleviate these kinds of concerns.
But I also think we shouldn't design one feature contingent on another entirely unrelated proposal-feature working perfectly and shipping cleanly. Hard to predict/correlate unrelated paths through this process.
Do you recommend that JS programmers refrain from using null or undefined in their libraries, JSON, etc and instead use empty objects for a similar reason?
I indeed would consider it very sketchy (and discouraged) if someone designed an API where, based on run-time conditions/inputs, an object on the public API was either present or not present.
Imagine for a moment that jQuery had done something like this with their default Ajax settings... like instead of their ajaxSetup(..)
function, if they'd said $.ajaxOptions
is by default undefined
, but if you set it to an object (or call the function), it gets populated with an object holding those defaults. Wouldn't that seem like an entirely strange/suspect/error-prone API design decision?
Think how many jQuery users could potentially trip over trying to make assignments like $.ajaxOptions.method = "POST";
and having those assignments potentially fail? And imagine all the crufty hackery that would have set in (and been in code bases for a decade+), like $.ajaxOptions = ($.ajaxOptions || {}).method = "POST";
, etc. The far more sane/robust thing would be for $.ajaxOptions
to always start off as an empty object.
And my sentiment here is a hundred fold stronger when we're talking about built-in language stuff, which can never change. jQuery could theoretically realize this mistake later and deprecate/fix it. JS can't.
What if function foo() { .. }
made a function who's .prototype
was undefined, and where you had to make/assign it before you could have added to it? Same deal, that would have been a much stronger hazard than just defaulting it to an empty object.
@littledan fwiw, the spec doesn't tend to use GetOwnProperty
, but things like Object.assign
and Object.{keys,values,entries}
and Object.freeze
all look at only own properties, which is effectively the same.
@bakkot
since the vast majority of the APIs in JavaScript and the web platform were not designed that way
Can you give any examples from JS or the web platform, as precedent, where an object is either present or absent on some API/namespace, depending on how the API is used (or other runtime conditions)? Not return values from functions (match(..)
), but actual API members. I've tried to think of such examples, and haven't come up with any.
@getify something could primarily be conditional if it's the return value from a function (including a getter); but sure, arrow functions lack .prototype
while normal functions have it set to an empty object.
@getify
Not return values from functions (
match(..)
), but actual API members.
I'm not sure I understand why this distinction is important. Given that we don't have proper support for returning multiple values and therefore get by with returning objects with multiple properties, a property being present or absent on the single value returned from a function amounts to the same thing as the function returning multiple values one of which may be present or absent.
That is to say, I don't really understand why str.match(r).groups
only conditionally being an object (vs undefined or null) is significantly different from str.match(r)
only conditionally being an object (vs undefined or null).
In any case: the body
property of Response objects produced by fetch
is null
if the Response is opaque (e.g. if it was from a cross-origin no-cors
fetch), and is otherwise a ReadableStream.
Can you give any examples from JS or the web platform, as precedent, where an object is either present or absent on some API/namespace, depending on how the API is used (or other runtime conditions)?
Things which might be either null or another value is so core to the web platform that WebIDL's type system has nullable types built in for that use case. If you want to see some places where it's used, search around in the HTML spec for a question mark. If it's in an IDL block, that's a case that might be null or might be an object or other value, depending on runtime conditions.
I wonder if the following change for RegExpBuiltinExec will make it easier to improve the performance of GetSubstitution for the common case when only normal RegExp objects are used in
RegExp.prototype[@@replace]
.With the current semantics, implementations always need to lookup
"groups"
on%ArrayPrototype%
and%ObjectPrototype%
before using an optimized fast path. For example this test caseprints
"--bar--"
with the current spec proposal.