tc39 / proposal-regexp-named-groups

Named capture groups for JavaScript RegExps
https://tc39.github.io/proposal-regexp-named-groups/

Add offsets of groups to result #21

Closed: SebastianZ closed this issue 7 years ago.

SebastianZ commented 7 years ago

I've proposed adding the offsets of all capturing groups to the match result. I think this could be incorporated into this proposal, or at least designed to work together with named groups.

Offsets should be exposed not only for named capturing groups but also for numbered groups. My initial idea was to have each entry in the groups property hold an object with the matched string and its start and end offsets, instead of a plain string.

To make this work for unnamed groups while staying backward compatible, unnamed groups should be added to the groups property as well.

Examples:

Named capturing groups:

let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
let result = re.exec('2015-01-02');
// result.groups.year.match === '2015';
// result.groups.year.start === 0;
// result.groups.year.end === 4;
// result.groups.month.match === '01';
// result.groups.month.start === 5;
// result.groups.month.end === 7;
// result.groups.day.match === '02';
// result.groups.day.start === 8;
// result.groups.day.end === 10;

Numbered capturing groups:

let re = /(\d{4})-(\d{2})-(\d{2})/u;
let result = re.exec('2015-01-02');
// result.groups[0].match === '2015';
// result.groups[0].start === 0;
// result.groups[0].end === 4;
// result.groups[1].match === '01';
// result.groups[1].start === 5;
// result.groups[1].end === 7;
// result.groups[2].match === '02';
// result.groups[2].start === 8;
// result.groups[2].end === 10;
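A minimal sketch of the proposed shape, built on the `d` flag that was standardized later (ES2022 RegExp match indices), which exposes `result.indices` and `result.indices.groups`. The helper `toOffsetGroups` is hypothetical; it keeps the zero-based numbering used in the examples above, so index 0 is the first capturing group rather than the whole match.

```javascript
// Hypothetical helper: converts an exec() result into the {match, start, end}
// shape proposed above. Requires a RegExp created with the `d` flag (ES2022),
// because only then does the result carry `indices`.
function toOffsetGroups(result) {
  if (!result.indices) {
    throw new TypeError("RegExp must be created with the 'd' flag");
  }
  // Non-participating groups stay undefined instead of becoming objects.
  const entry = (match, span) =>
    match === undefined ? undefined : { match, start: span[0], end: span[1] };

  // Named groups, keyed by name.
  const named = {};
  for (const name of Object.keys(result.groups ?? {})) {
    named[name] = entry(result.groups[name], result.indices.groups[name]);
  }

  // Numbered groups, zero-based: numbered[0] is capture 1, not the whole match.
  const numbered = [];
  for (let i = 1; i < result.length; i++) {
    numbered.push(entry(result[i], result.indices[i]));
  }
  return { named, numbered };
}

const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/du;
const { named, numbered } = toOffsetGroups(re.exec('2015-01-02'));
// named.month  → { match: '01', start: 5, end: 7 }
// numbered[2]  → { match: '02', start: 8, end: 10 }
```

This allocates one object per group for every match, even when the caller only needs the matched strings.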

Sebastian

littledan commented 7 years ago

This is an interesting idea. It's useful-seeming information that implementations (at least V8) calculate internally, but it isn't exposed in any API that I know of.

The main downside I see of this approach is that now, we're creating a lot of objects for each match result, which could hurt performance, including on existing RegExps. Maybe we actually want a method, rather than properties. The API could look like this:

result.group('year') === '2015';
result.start('year') === 0;
result.end('year') === 4;
result.group(0) === '2015';
result.start(0) === 0;
result.end(0) === 4;

These methods could live on an object that we'd insert into the prototype chain of match objects.

I'm not sure whether this idea has ergonomics that match users' expectations, though.
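A minimal sketch of how such methods could sit on an inserted prototype, assuming the `d` flag (standardized later as ES2022 RegExp match indices) supplies the offsets through `result.indices`. `MatchResultPrototype`, `resolveGroup` and `execWithMethods` are hypothetical names; numeric keys follow the zero-based numbering used above, so `group(0)` is the first capturing group.

```javascript
// Hypothetical: resolve a key to its matched string and [start, end] span.
// Numbers are zero-based capture indices (0 = first capturing group);
// strings are group names.
function resolveGroup(result, key) {
  if (typeof key === 'number') {
    return { match: result[key + 1], span: result.indices?.[key + 1] };
  }
  return { match: result.groups?.[key], span: result.indices?.groups?.[key] };
}

// Hypothetical prototype inserted between match arrays and Array.prototype.
// Spans are read from `indices` only when a method is called, and are
// undefined if the RegExp lacked the `d` flag.
const MatchResultPrototype = Object.create(Array.prototype, {
  group: { value(key) { return resolveGroup(this, key).match; } },
  start: { value(key) { return resolveGroup(this, key).span?.[0]; } },
  end: { value(key) { return resolveGroup(this, key).span?.[1]; } },
});

// Runs exec() and re-parents a successful result onto the new prototype.
function execWithMethods(re, str) {
  const result = re.exec(str);
  if (result !== null) Object.setPrototypeOf(result, MatchResultPrototype);
  return result;
}

const m = execWithMethods(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/du, '2015-01-02');
m.group('year'); // '2015'
m.start('year'); // 0
m.end('year');   // 4
m.group(0);      // '2015'
m.start(0);      // 0
m.end(0);        // 4
```

Because the methods live on one shared prototype, nothing extra is stored on each match result; the result is still a real array, so existing index-based code keeps working.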

SebastianZ commented 7 years ago

> The main downside I see of this approach is that now, we're creating a lot of objects for each match result, which could hurt performance, including on existing RegExps.

Good point!

> Maybe we actually want a method, rather than properties. The API could look like this:
>
> result.group('year') === '2015';
> result.start('year') === 0;
> result.end('year') === 4;
> result.group(0) === '2015';
> result.start(0) === 0;
> result.end(0) === 4;

That would work, but at first sight it looks a bit weird to have to call a function to get those values. What about making them getters? Alternatively, result.group() could return an object holding all the info for that one group. (That would still have some performance impact if you just want the matched string, of course.)
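A rough sketch of the second alternative, where `group()` returns an object for one group, assuming a match produced with the later ES2022 `d` flag so that `indices` exists. The class `OffsetMatch` and its `text()` method are hypothetical; `text()` illustrates a cheap path for callers who only want the string, and per-group objects are created on request and cached.

```javascript
// Hypothetical sketch (not a proposed API): group(key) returns an object with
// the matched string and its offsets, built only on request and then cached.
// Requires a match produced with the `d` flag so that `indices` exists.
class OffsetMatch {
  #result;
  #cache = new Map();

  constructor(result) {
    this.#result = result;
  }

  // Cheap path: just the matched string, no extra object allocated.
  // Numbers are zero-based capture indices (0 = first capturing group).
  text(key) {
    const r = this.#result;
    return typeof key === 'number' ? r[key + 1] : r.groups?.[key];
  }

  // Full info for one group; undefined if the group did not participate.
  group(key) {
    if (this.#cache.has(key)) return this.#cache.get(key);
    const r = this.#result;
    const match = this.text(key);
    const span = typeof key === 'number' ? r.indices?.[key + 1] : r.indices?.groups?.[key];
    const info = match === undefined ? undefined : { match, start: span?.[0], end: span?.[1] };
    this.#cache.set(key, info);
    return info;
  }
}

const om = new OffsetMatch(/(?<year>\d{4})-(\d{2})/d.exec('2015-01'));
om.group('year'); // { match: '2015', start: 0, end: 4 }
om.group(1);      // { match: '01', start: 5, end: 7 }
om.text(1);       // '01'
```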

Sebastian

matthewrobb commented 7 years ago

Could the groups property be a proxy on the prototype that looks up its keys on access?
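One possible reading of the Proxy idea, sketched under the assumption of the later ES2022 `d` flag for offsets: a view over `result.groups` whose `get` trap builds `{ match, start, end }` only for the name being read. `offsetGroupsProxy` is a hypothetical helper, and this wraps the groups object directly rather than placing the proxy on a prototype.

```javascript
// Hypothetical sketch of one reading of the Proxy idea: a view over
// `result.groups` whose get trap builds { match, start, end } for a name
// only when that key is read. Needs a match produced with the `d` flag.
function offsetGroupsProxy(result) {
  const names = result.groups ?? {};
  const spans = result.indices?.groups ?? {};
  return new Proxy(names, {
    get(target, key) {
      // Ignore symbols and names that are not capture groups.
      if (typeof key !== 'string' || !Object.prototype.hasOwnProperty.call(target, key)) {
        return undefined;
      }
      const match = target[key];
      if (match === undefined) return undefined; // group did not participate
      const span = spans[key];
      // A fresh object on every read; a real design might cache these.
      return { match, start: span?.[0], end: span?.[1] };
    },
  });
}

const res = /(?<year>\d{4})-(?<month>\d{2})/d.exec('2015-01');
const groups = offsetGroupsProxy(res);
groups.month; // { match: '01', start: 5, end: 7 }
groups.nope;  // undefined
```

Other traps fall through to the target, so `Object.keys(groups)` still lists the group names.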

littledan commented 7 years ago

I think this is definitely a useful feature. However, how about we leave it out of scope and handle it in a future proposal? This proposal adds named groups, and a follow-on proposal would let you access the start and end indices of both named and numbered groups. I think they separate cleanly that way.

SebastianZ commented 7 years ago

I just wanted to discuss offsets before this API was set in stone. If you think it's better to keep them in a separate proposal, that's OK with me. I'll update my proposal to cover named groups as well. Please provide your input there!

Sebastian