tc39 / proposal-regexp-x-mode

BSD 3-Clause "New" or "Revised" License

Multiline Regexp by also ignoring linebreaks #5

Open martinheidegger opened 1 year ago

martinheidegger commented 1 year ago

It seems like low-hanging fruit to also ignore line breaks in x mode, alongside the other ignored characters. That would open the door to good multi-line RegExp definitions.

r = /
  [0-9]{1,4} # the first few digits
/x

It may be good to have an FAQ entry that explains this absence (or to support it).

rbuckton commented 1 year ago

That's probably not feasible. Implementations are adamant that they will not change how https://tc39.es/ecma262/#prod-RegularExpressionLiteral parses. In addition, that would be a breaking syntactic change due to ASI:

a
/
b
/x

Is legal JS, and a multiline RegExp would make this ambiguous or require a fairly complex cover grammar.
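
A minimal sketch of the ambiguity, assuming `a`, `b`, and `x` are ordinary numeric variables (the values are made up for illustration). Today the four lines parse as two divisions, so no regular expression is created:

```javascript
// Hypothetical values, chosen only to make the arithmetic visible.
const a = 12;
const b = 3;
const x = 2;

// No semicolon is inserted after `a`, so this is parsed as (a / b) / x.
const result = a
/
b
/x;

console.log(result); // 2
```

Any grammar that read the same lines as a multiline RegExp literal would change the meaning of code like this.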

Some of these limitations are also being discussed in #1 as it pertains to what is allowed inside of a (?#) comment.

martinheidegger commented 1 year ago

This is a very fascinating reply 🤩 that I think in itself would definitely enrich the readme.

That said, I find this proposal a lot less appealing without dedicated support for a multiline regexp grammar (preferably one not conflicting with the existing grammar 😅), as I would like IDE support (code highlighting, linting, and inline docs) for multiline regexps. If I have to put it in a string, something like this is syntax highlighted:

x = new RegExp(
  '[0-9]{1,4}' // the first few digits
)
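
As a rough illustration of how the string approach scales, here is a sketch with several concatenated parts. The named groups and the date pattern are assumptions added for the example; the point is that JS comments work, but every regexp backslash has to be doubled inside a string literal:

```javascript
const x = new RegExp(
  '(?<year>[0-9]{4})' + // year
  '-' +                 // separator
  '(?<month>\\d{2})'    // month; note the doubled backslash
);

console.log(x.source);                           // (?<year>[0-9]{4})-(?<month>\d{2})
console.log('2024-05'.match(x).groups.month);    // 05
```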

Maybe an alternative Syntax like:

x = RegExp/
  [0-9]{1,4} # the first few digits
/m

Could work out?

rbuckton commented 1 year ago

Unfortunately, the RegExp/ syntax also wouldn't work because a valid regexp like /a/ would again be parsed as division: RegExp/a/x.
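
A small sketch of why that is, assuming `a` and `x` are variables that happen to be in scope (hypothetical values). The expression is already valid JavaScript and evaluates as two divisions:

```javascript
// Hypothetical bindings: `a` and `x` are plain numbers here.
const a = 2;
const x = 5;

// Parsed as (RegExp / a) / x. Converting the RegExp constructor
// function to a number yields NaN, so the whole expression is NaN.
const value = RegExp/a/x;

console.log(typeof value);        // "number"
console.log(Number.isNaN(value)); // true
```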

The most likely scenario is using a template literal. The examples I've made so far use this:

x = new RegExp(String.raw`
  [0-9]{1,4} # the first few digits
`, "x");

The template isn't so bad because ${ is not legal RegExp syntax, though you still have to escape `.
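
A short sketch of what `String.raw` buys here. The `x` flag is left out on the assumption that engines without this proposal reject it; the sample pattern is made up:

```javascript
// String.raw keeps backslashes exactly as typed, so regexp escapes survive.
const raw = String.raw`\d{1,4}`;
// An untagged template literal cooks the escape: `\d` becomes plain "d".
const cooked = `\d{1,4}`;

console.log(raw);    // \d{1,4}
console.log(cooked); // d{1,4}

// The raw string can be handed to the constructor unchanged.
const re = new RegExp(raw);
console.log(re.test("2024")); // true
```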

martinheidegger commented 1 year ago

> /a/ would again be parsed as division

You lost me there, but maybe my suggestion was poor. I was trying to suggest that [RegExp/] be treated as a keyword. Thinking about it a tad more, that would be quite hacky.

> The template isn't so bad because ${ is not legal RegExp syntax, though you still have to escape `.

My concern is not about the escaping of content, though that is annoying; it's more about the absence of reliable syntax highlighting.

Rethinking this once more: would it be possible for RegExp to have a multiline helper that syntax highlighters could treat differently?

x = RegExp.multiline`
  [0-9]{1,4} # the first few digits
`
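
To make the idea concrete, here is a minimal userland sketch of such a helper. The name `multilineRegExp` is hypothetical and nothing like it exists on `RegExp` today. It emulates only a small subset of x mode: outside character classes it drops whitespace and `#` line comments, and it leaves escape sequences untouched.

```javascript
// Hypothetical helper, not part of any standard or of this proposal.
function multilineRegExp(strings, ...values) {
  // Rebuild the raw source so backslashes are kept as typed.
  const source = String.raw(strings, ...values);
  let out = "";
  let inClass = false; // inside [...], whitespace and # stay literal
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      // Copy an escape sequence (backslash plus next character) untouched.
      out += ch + (source[i + 1] ?? "");
      i++;
    } else if (inClass) {
      if (ch === "]") inClass = false;
      out += ch;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
    } else if (ch === "#") {
      // Skip the comment up to, but not including, the line break.
      while (i + 1 < source.length && source[i + 1] !== "\n") i++;
    } else if (!/\s/.test(ch)) {
      out += ch;
    }
  }
  return new RegExp(out);
}

const digits = multilineRegExp`
  [0-9]{1,4} # the first few digits
`;
console.log(digits.source); // [0-9]{1,4}
```

A helper like this gives the multi-line layout, but it does not by itself solve the highlighting concern: editors would still need to recognize the tag.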

I recently noticed that Python IDEs do highlight the similar r-prefixed strings:

r'[0-9]{1,4}'

saturdaywalkers commented 1 year ago

Perl does

qr"
 [0-9] {1,4} #2
"x;

qr/
 [0-9]
 {1,4} # the char after the qr is the delimiter
 /x

This proposal has been stalled for a while.

It would really help, as regexes easily become line noise without spacing. Having line breaks is pretty important!

Any movement?

tophf commented 1 year ago

Multi-line mode is a must. It is currently implemented via the transform-modern-regexp plugin, which uses re tagged literals:

re`/

  # A regular expression for date.

  (?<year>\d{4})-    # year part of a date
  (?<month>\d{2})-   # month part of a date
  (?<day>\d{2})      # day part of a date

/x`;