slevithan / xregexp

Extended JavaScript regular expressions
http://xregexp.com/
MIT License

How to reuse a XRegExp reg1 object into another XRegExp that includes sequences of reg1 #258

Closed ibc closed 6 years ago

ibc commented 6 years ago

I have an XRegExp to parse SIP URIs, as follows:

const sipUriRegExp = XRegExp(
    `^
    (?P<schema> [sS][iI][pP][sS]?)
    :
    ((?P<user> ([a-zA-Z0-9]|[-_.!~*'()]|%[0-9a-fA-F][0-9a-fA-F]|[&=+$,;?/:])+) @)?
    (?P<host>
        (?P<ipv4>   \\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}) |
        (?P<ipv6>   \\[ [0-9a-fA-F:.]+ \\]) |
        (?P<domain> [a-zA-Z0-9.-]+)
    )
    (: (?P<port> \\d{1,5}) )?
    $`,
    'x');

And I want to reuse the above sipUriRegExp within a bigger nameAddrRegExp XRegExp to parse strings that may contain multiple SIP URIs between < and >.

It's unclear to me how to achieve this. I've tried with XRegExp.build:

const nameAddrRegExp = XRegExp.build(
    '(?x)^ (<({{uri}})> [\\s]*)+ $',
    {
        uri  : sipUriRegExp
    });

but when calling XRegExp.exec(string, nameAddrRegExp) it produces an array whose named groups are those in the last SIP URI in the string. I assume this is expected.
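The behaviour described above is standard for repeated capturing groups: each iteration overwrites the previous capture, so only the last one survives. A minimal sketch, using a native JavaScript regex rather than XRegExp and a hypothetical simplified pattern, illustrates this:

```javascript
// Hypothetical simplified pattern: one named group inside a repeated group.
const repeated = /^(?:<(?<name>\w+)>\s*)+$/;

const match = repeated.exec('<alice> <bob>');

// Only the capture from the final iteration of the repeated group is kept.
const lastName = match.groups.name;
console.log(lastName);
```

This is why a single exec() call on the combined regex cannot return the named groups of every URI in the string.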

What is the right way to accomplish this? Thanks a lot.

slevithan commented 6 years ago

If I'm understanding correctly, I think what you want is something like the following (untested):

if (XRegExp.test(string, nameAddrRegExp)) {
  const uris = XRegExp.match(string, sipUriRegExp, 'all');
}

Note that this would require you to remove the leading ^ and trailing $ in sipUriRegExp for XRegExp.match to return all matches in the string. This change should have no effect on nameAddrRegExp because of the way XRegExp.build already strips leading ^s and trailing unescaped $s in embedded regexes.

ibc commented 6 years ago

Thanks, will try that. Right now I was testing something very ugly (indeed, also after removing ^ and $ from sipUriRegExp):

const uris = '<sip:alice@qwe.com:1234> <sips:bob@asd.net>';

XRegExp.forEach(uris, XRegExp(`<${sipUriRegExp.xregexp.source}> \\s*`, 'x'), (match, i) =>
{
    console.warn(i, ': ', match);
});

But I don't think that using a private API (.xregexp.source) is the way to go.

ibc commented 6 years ago

Hmm, I've tested your proposal but, as the docs say, XRegExp.match returns an array of matched strings. What I need is an array of match objects with named backreference properties (the same as exec() returns).

ibc commented 6 years ago

OK, I think that for my use case, in which I need to reuse a lot of grammar, it's much better to keep a map of regex strings (containing named groups) than a map of XRegExp objects, since the latter cannot be reused in most XRegExp API calls (all but build()).

slevithan commented 6 years ago

Sounds like you can use XRegExp.forEach along with XRegExp.build, and push each match object (or specific backreferences) to an array within the XRegExp.forEach callback. Not sure why <regex>.xregexp.source would be used over XRegExp.build here.

ibc commented 6 years ago

Yes, thanks. I'm testing that and it seems to be the way to go.