rbuckton / proposal-regexp-features

Proposal to investigate additional language features for ECMAScript Regular Expressions
MIT License

Subroutines #6

Open: slevithan opened this issue 5 months ago

slevithan commented 5 months ago

The readme includes subroutines as a potential ES regex feature, and describes named subroutines with the Perl syntax (?&name).

IMO the PCRE/Oniguruma/Onigmo syntax \g<name> is much nicer, with its similarity to \k<name> for named backreferences.
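To make the \k<name> vs. \g<name> distinction concrete: a backreference must match the exact text the group captured, while a subroutine reruns the group's pattern. JavaScript has no \g<name> today, so the sketch below writes the subroutine's effect out by hand as an inlined non-capturing group. Treating that as equivalent assumes the non-recursive, PCRE/Perl-style semantics discussed in this issue.

```typescript
// Backreference (\k<digit>): must repeat the exact text the group captured.
const backref = new RegExp(String.raw`^(?<digit>\d)\k<digit>$`);

// Subroutine (\g<digit>, not valid JS syntax today) would rerun the group's
// *pattern*. Inlining that pattern by hand gives an equivalent regex
// (assumption: non-recursive, PCRE/Perl-style subroutine semantics).
const subroutineEquivalent = new RegExp(String.raw`^(?<digit>\d)(?:\d)$`);

console.log(backref.test("77"));              // true
console.log(backref.test("78"));              // false: "8" is not the captured "7"
console.log(subroutineEquivalent.test("78")); // true: \d simply matches again
```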

Note that regex subroutines behave differently across implementations; the differences are described here. In short, PCRE and Perl subroutines are the most powerful, since they treat subroutines as independent subpatterns and reset the values of backreferences after exiting a subroutine.

I have a working version of non-recursive subroutines in the regex package using the \g<name> syntax and the superior PCRE/Perl behavior. The readme section just linked to describes the nuances of their behavior and includes several examples.
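For a non-recursive subroutine, one way to get the PCRE/Perl behavior in today's JavaScript is to inline the referenced group's pattern as a non-capturing group, so the call never overwrites the group's captured value. The TypeScript below is a minimal sketch of that transformation, not the regex package's implementation or API: expandSubroutines and findGroupEnd are hypothetical names, and the naive scanner assumes the referenced group contains no nested capturing groups or subroutine calls, and that the pattern uses no v-flag nested character classes.

```typescript
// Hypothetical helper: returns the index just past the ")" that closes the
// group whose "(" is at index `open`. Skips escaped characters and ignores
// parentheses inside character classes like [)(].
function findGroupEnd(pattern: string, open: number): number {
  let depth = 0;
  let inClass = false;
  for (let i = open; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\") {
      i++; // skip the escaped character
    } else if (inClass) {
      if (ch === "]") inClass = false;
    } else if (ch === "[") {
      inClass = true;
    } else if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  throw new SyntaxError(`Unterminated group at index ${open}`);
}

// Hypothetical helper: rewrites each \g<name> as (?:body), where body is the
// pattern of the named group (?<name>body). Assumes the group has no nested
// captures and makes no subroutine calls of its own.
function expandSubroutines(pattern: string): string {
  const bodies = new Map<string, string>();
  const groupStart = /\(\?<([A-Za-z_$][\w$]*)>/g;
  let m: RegExpExecArray | null;
  while ((m = groupStart.exec(pattern)) !== null) {
    const end = findGroupEnd(pattern, m.index);
    bodies.set(m[1], pattern.slice(m.index + m[0].length, end - 1));
  }
  // The inlined copy is non-capturing, so the named group keeps the value it
  // captured where it is defined (PCRE/Perl-style behavior).
  return pattern.replace(/\\g<([A-Za-z_$][\w$]*)>/g, (_call, name: string) => {
    const body = bodies.get(name);
    if (body === undefined) throw new SyntaxError(`Unknown group: ${name}`);
    return `(?:${body})`;
  });
}

const source = String.raw`^(?<byte>\d{1,3})\.\g<byte>$`;
const expanded = expandSubroutines(source);
console.log(expanded); // ^(?<byte>\d{1,3})\.(?:\d{1,3})$

const match = new RegExp(expanded).exec("192.168");
console.log(match?.groups?.byte); // 192
```

Because the call site becomes (?:...), groups.byte still holds the value captured where the group is defined ("192"). An engine whose subroutine calls overwrite captures would instead report "168".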

Also IMO the numbered (unnamed) forms are not needed or desirable, so I skipped them in regex.

oleedd commented 2 months ago

(?&name) is the named counterpart of (?R) and (?1), which are widely known and used. Can you explain what +n and -n mean in \k<n+n> and \k<n-n>? If n is a capture group, why are there two n's?

slevithan commented 2 months ago

I didn't mention anything about that. Nor does this project's readme.

Aside: I don't think +N or -N relative offsets for backreferences or subroutines are needed or desirable.

oleedd commented 2 months ago

I just thought you might know, since you mentioned \k. Oh, that is a mistake (here): it should be \k<n+level>, not \k<n+n>. If I understand correctly, -1 means the previous match and +1 means the second (unlike subroutines).