tc39 / proposal-regexp-r-escape

Regular Expression `\R` Escape for ECMAScript
https://tc39.es/proposal-regexp-r-escape
BSD 3-Clause "New" or "Revised" License

Should \R match \u001bE? #4

Open fstirlitz opened 2 years ago

fstirlitz commented 2 years ago

One of the code points that are supposed to be matched by \R is <NL>, that is U+0085, which is the C1 control code NEXT LINE (NEL). The definition of <NL> is missing from the specification text, but is implied by the contents of the README.
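For illustration, the set under discussion can be approximated with today's regular expressions. The sketch below assumes `\R` is meant to match what Perl, PCRE and UTS #18 usually give for it: CR LF as a unit, or any one of LF, VT, FF, CR, U+0085, U+2028 and U+2029. The name `LINE_BREAK` is made up for this example, and the proposal's exact definition may differ.

```javascript
// Hypothetical emulation of \R. The list of alternatives is an assumption
// based on Perl, PCRE and UTS #18, not on the proposal's specification text.
// CR LF is listed first so that it matches as a single unit.
const LINE_BREAK = /\r\n|[\n\v\f\r\u0085\u2028\u2029]/u;

// Split on any of those sequences.
const lines = 'a\r\nb\u0085c\u2028d'.split(LINE_BREAK);
console.log(lines); // [ 'a', 'b', 'c', 'd' ]
```

Note that U+0085 is in the character class here, while the two-code-point sequence U+001B U+0045 is not.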

However, C1 control codes have an alternative representation using ASCII code points: U+0085 can also be written as U+001B U+0045 (ESC followed by E). Terminal emulators that support the former as a line-ending character, such as VTE, tend to support the latter as well:

$ printf 'qwe\x1bErty\nabc\xc2\x85def\n'
qwe
rty
abc
def

Some, in fact, support only the latter (e.g. xterm and the native Linux console subsystem):

$ printf 'qwe\x1bErty\nabc\xc2\x85def\n'
qwe
rty
abcdef
$ printf 'qwe\x1bErty\nabc\xc2\x85def\n'
qwe
rty
abc◈def
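To make the byte sequences in the printf commands above easier to read, this small sketch prints the UTF-8 encoding of both representations. It relies only on the standard `TextEncoder`; the helper `hex` is a name invented for this example.

```javascript
// UTF-8 bytes of the two representations used in the printf commands.
const encoder = new TextEncoder();
const c1Form = Array.from(encoder.encode('\u0085'));   // the single C1 code point
const escForm = Array.from(encoder.encode('\u001bE')); // ESC followed by 'E'

// Illustrative helper: format a byte array as space-separated hex.
const hex = (bytes) => bytes.map((b) => b.toString(16).padStart(2, '0')).join(' ');

console.log(hex(c1Form));  // c2 85
console.log(hex(escForm)); // 1b 45

// The final byte of the escape form is the C1 code minus 0x40.
console.log(0x85 - 0x40 === 0x45); // true
```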

U+0085 can therefore be considered equivalent to (or at least no better than) U+001B U+0045, and it is inconsistent to recognise the former but not the latter. U+001B U+0045 should thus be included as a recognised line-ending sequence.
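A rough sketch of what recognising both forms could look like, written as an ordinary pattern. It assumes the usual `\R` alternatives (CR LF, LF, VT, FF, CR, U+0085, U+2028, U+2029) and simply adds ESC E as one more alternative; `LINE_BREAK_WITH_ESC_E` is a hypothetical name, not something from the proposal.

```javascript
// Hypothetical pattern: the usual \R set (an assumption here) plus ESC E.
// CR LF is listed first so that it matches as a single unit.
const LINE_BREAK_WITH_ESC_E = /\r\n|\u001bE|[\n\v\f\r\u0085\u2028\u2029]/u;

const viaEscape = 'qwe\u001bErty'.split(LINE_BREAK_WITH_ESC_E);
const viaC1 = 'abc\u0085def'.split(LINE_BREAK_WITH_ESC_E);

console.log(viaEscape); // [ 'qwe', 'rty' ]
console.log(viaC1);     // [ 'abc', 'def' ]
```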

On the other hand, the inclusion of NEL (in either form) means the escape does not align with ^ and $ when the m and u flags are set, despite the claim in the README. So perhaps removing NEL altogether is also an option.
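The mismatch with ^ and $ can be checked in any current engine. In multiline mode the anchors only look at ECMAScript's LineTerminator code points (LF, CR, U+2028, U+2029), so U+0085 does not start a new line for them. The variable names below are illustrative only.

```javascript
// With the m and u flags, ^ and $ anchor at LF, CR, U+2028 and U+2029 only.
const wholeLine = /^def$/mu;

const afterLF = wholeLine.test('abc\ndef');       // LF is a LineTerminator
const afterLS = wholeLine.test('abc\u2028def');   // U+2028 is a LineTerminator
const afterNEL = wholeLine.test('abc\u0085def');  // U+0085 is not

console.log(afterLF);  // true
console.log(afterLS);  // true
console.log(afterNEL); // false
```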

Which is it going to be?