Consider a utility method for building regular expressions with comments instead of an 'x' flag

tc39 / proposal-regexp-x-mode

BSD 3-Clause "New" or "Revised" License


Consider a utility method for building regular expressions with comments instead of an 'x' flag #2

Open brad4d opened 3 years ago

brad4d commented 3 years ago

Issue #1 points out that regular expression parsing is governed by a grammar, so if this proposal goes through as written, even the comments inside a regular expression would be subject to some odd restrictions.

I also noticed that even if this proposal goes through as currently defined, the multi-line, commented regular expressions you would write with it would be almost identical in size and readability to those built with my example utility function below. The utility function also imposes no odd restrictions on the content of comments.

```javascript
function assembleRegExp(array, flags = undefined) {
  return new RegExp(array.join(''), flags);
}

let wordIntPairRe = assembleRegExp([
  // match from the beginning, but allow leading spaces
  '^\\s*',
  // word captured to a group named "word"
  '(?<word>[a-zA-Z]+)',
  // at least one whitespace
  '\\s+',
  // integer captured to a group named "int"
  '(?<int>[0-9]+)',
  // allow trailing whitespace, but nothing else
  '\\s*$',
]);

console.log(wordIntPairRe);
console.log(wordIntPairRe.exec('  abc 123  '));
```
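Because the comments passed to `assembleRegExp` are ordinary JavaScript comments, they disappear when the script is parsed, and the `RegExp` constructor only ever sees the concatenated source. A small self-check, reusing the helper from the issue (the shortened comments and the `m` variable are illustrative additions), shows the resulting pattern and captures:

```javascript
// Same helper as in the issue: concatenate the pieces and compile once.
function assembleRegExp(array, flags = undefined) {
  return new RegExp(array.join(''), flags);
}

const wordIntPairRe = assembleRegExp([
  '^\\s*',              // leading spaces allowed
  '(?<word>[a-zA-Z]+)', // named group "word"
  '\\s+',               // at least one whitespace
  '(?<int>[0-9]+)',     // named group "int"
  '\\s*$',              // trailing whitespace only
]);

// The comments are gone; only the joined source reaches the RegExp.
console.log(wordIntPairRe.source);
// ^\s*(?<word>[a-zA-Z]+)\s+(?<int>[0-9]+)\s*$

const m = wordIntPairRe.exec('  abc 123  ');
console.log(m.groups.word, m.groups.int); // abc 123
```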

So, perhaps rather than proposing an 'x' flag in order to get comments, this proposal should define a method similar to the above?

slevithan commented 6 months ago

This pattern only provides (a noisier version of) line comments.

It doesn't enable free spacing without becoming unreasonably noisy.
- Example: `['(?<alphanum>', '[', '\\w', '--', '_', ']', ')']` instead of simply `(?<alphanum> [ \w -- _ ] )`.
Things like `['\\c', 'Z']` and `['(', '?:)']` would not be errors when joined as shown, but `\c Z` and `( ?:)` should be errors with flag x. Based on prior art and reasonable behavior, x should treat whitespace and line comments as "do nothing" operators rather than "ignore me" operators, unless they are followed by a quantifier.
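The concatenation behavior described above can be confirmed directly. In this sketch, `naiveStrip` is a hypothetical helper (not part of any proposal) standing in for a userland "free spacing" emulation that simply deletes whitespace; it exhibits the "ignore me" behavior that flag x is argued to avoid:

```javascript
// Joining strings is plain concatenation, so array boundaries carry no meaning.
const controlZ = ['\\c', 'Z'].join('');   // the string \cZ
const emptyGroup = ['(', '?:)'].join(''); // the string (?:)

// Both compile without error: \cZ is a control escape (U+001A)
// and (?:) is an empty non-capturing group.
const reControl = new RegExp(controlZ);
const reEmpty = new RegExp(emptyGroup);
console.log(reControl.test('\x1A')); // true
console.log(reEmpty.test(''));       // true

// Hypothetical naive emulation: deleting whitespace collapses "\c Z" into the
// valid "\cZ" instead of rejecting it, which is the opposite of the
// "do nothing" semantics proposed for flag x.
const naiveStrip = (pattern) => pattern.replace(/\s+/g, '');
console.log(naiveStrip('\\c Z')); // \cZ
console.log(naiveStrip('( ?:)')); // (?:)
```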

So this pattern is not a usable substitute for flag x.

rbuckton commented 6 months ago

I'm hoping that in the long term I can bring back the prefix-modifiers portion of the RegExp modifiers proposal, which was dropped before Stage 3 (e.g. `/(?i)pattern/`), and bring in something like the `RegExp.tag` template originally suggested in https://github.com/tc39/proposal-regex-escaping, as a convenient way to support multi-line regular expressions:

```javascript
const re = RegExp`(?x)
  # match from the beginning, but allow leading spaces
  ^\s*
  # word captured to a group named "word"
  (?<word>[a-zA-Z]+)
  # at least one whitespace
  \s+
  # integer captured to a group named "int"
  (?<int>[0-9]+)
  # allow trailing whitespace, but nothing else
  \s*$
`;
```
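Until something like `RegExp.tag` exists, a rough userland approximation is possible with an ordinary tagged template. This is an assumption-laden sketch, not the proposed API: the tag name `rxLines` is hypothetical, it reads `strings.raw` so that `\s` survives, and it supports only whole-line `#` comments. It does not implement free spacing within a line, inline comments, flags, `(?x)`, or `${}` substitutions, and any pattern line that begins with `#` or relies on leading or trailing spaces would be mangled:

```javascript
// Hypothetical userland tag (not the RegExp.tag from the proposal): it keeps
// only whole-line comments, with no free spacing inside a line.
function rxLines(strings, ...values) {
  if (values.length > 0) {
    // Refuse interpolation rather than splice unescaped text into the pattern.
    throw new TypeError('rxLines does not support substitutions');
  }
  const source = strings.raw[0]    // raw text, so "\s" stays "\s"
    .split(/\r?\n/)
    .map((line) => line.trim())    // drop indentation and trailing spaces
    .filter((line) => line !== '' && !line.startsWith('#')) // drop comment lines
    .join('');
  return new RegExp(source);
}

const re = rxLines`
  # match from the beginning, but allow leading spaces
  ^\s*
  # word captured to a group named "word"
  (?<word>[a-zA-Z]+)
  # at least one whitespace
  \s+
  # integer captured to a group named "int"
  (?<int>[0-9]+)
  # allow trailing whitespace, but nothing else
  \s*$
`;
console.log(re.source); // ^\s*(?<word>[a-zA-Z]+)\s+(?<int>[0-9]+)\s*$
```

This offers the same "line comments only" capability as `assembleRegExp`, just with less quoting noise, so the objection about free spacing applies to it as well.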