michaelficarra closed 5 months ago
What's the argument for doing this, other than the polyfilling thing?
Less RegExp grammar complexity. While I still assert that nobody should be reading the output of RegExp.escape, these grammar additions apply to all RegExps, which means I will have to read (or at least be on the lookout for) escaped ASCII punctuators in any RegExp context. I don't want them if they serve no purpose other than to make it harder for me to mentally parse a RegExp.
I'd prefer to encounter \& rather than \x26. At least I have some hope of figuring out what the first one means (i.e., &, the same as how \- means -, etc).
I agree; I would expect developers are quite comfortable with a backslash being a noop for the character, whereas hex escapes would be wildly unfamiliar.
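To make the comparison concrete, here is a minimal sketch of the two spellings under discussion. It assumes an engine that supports the Annex B identity escapes, where \& is accepted only when the u flag is absent; the variable names are illustrative only.

```javascript
// Both patterns are built from strings so the escapes are visible as written.
// Without the `u` flag, Annex B treats `\&` as an identity escape for `&`.
const identityEscape = new RegExp("a\\&b");

// `\x26` is a hex escape for the same character and is valid with or without `u`.
const hexEscape = new RegExp("a\\x26b", "u");

const identityMatches = identityEscape.test("a&b");
const hexMatches = hexEscape.test("a&b");

// Under the existing Unicode-mode grammar, `\&` outside a character class is
// expected to be a SyntaxError; this records what the current engine does
// rather than assuming it.
let unicodeIdentityRejected = false;
try {
  new RegExp("a\\&b", "u");
} catch (err) {
  unicodeIdentityRejected = err instanceof SyntaxError;
}

console.log({ identityMatches, hexMatches, unicodeIdentityRejected });
```

The grammar question in this thread is about that last case: accepting \& in Unicode mode needs a grammar addition, while \x26 already parses everywhere.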
As a lay person, if I may, I've got some questions.
a) As hex
b) As human readable characters
It sounds to me like a one- or two-line change, with a lookup table or equivalent for the current punctuators. Is that a fair assessment? Or is it considerably more complex to produce one over the other?
I'm in favour of whichever is simpler. I'd be happy if anything that impedes the progress of .escape is parked for a later date.
I don't think hex escaping is wildly unfamiliar (encodeURI, HTML special characters) and I agree that \& feels perfectly readable, if not normal (regex escape sequences).
@oliverfoster this can’t be parked for later; it has to be decided before the feature ships and likely can never be changed in the future.
Spec complexity will likely be about the same with either approach; a line or two of grammar vs a line or two to do the hex escape.
I feel like polyfilling for older browsers is more important, and there can always be a function to translate hex codes into backslash-escaped characters.
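A translation function of that kind could look roughly like the sketch below. The name hexToIdentityEscapes is hypothetical and not part of any proposal. The sketch only rewrites escapes for ASCII punctuators, since rewriting something like \x64 to \d would change the pattern's meaning, and its output is only valid where the regex grammar accepts those backslash escapes.

```javascript
// Hypothetical helper: rewrite \xHH escapes of ASCII punctuators as \<char>.
function hexToIdentityEscapes(source) {
  // Match every escape sequence, so an escaped backslash followed by the
  // literal text "x26" is consumed as `\\` and left untouched.
  return source.replace(/\\(?:x([0-9a-fA-F]{2})|[\s\S])/g, (match, hex) => {
    if (hex === undefined) return match; // some other escape: keep as-is
    const ch = String.fromCharCode(parseInt(hex, 16));
    // Only ASCII punctuators are rewritten; letters and digits stay as hex.
    return /[!-\/:-@\[-`{-~]/.test(ch) ? "\\" + ch : match;
  });
}

console.log(hexToIdentityEscapes("a\\x26b")); // a\&b
```

This is for human readability only; the translated pattern is not guaranteed to parse in Unicode mode.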
We don't generally make changes to proposals solely due to polyfillability.
Rough consensus was to make this change; I'll do that, and then come back in a future meeting to seek stage 2.7.
Couple comments:
\u rather than \x
Filed #67. Currently goes with lowercase.
The Encode AO (currently used by encodeURI and encodeURIComponent) uses uppercase.
Let hex be the String representation of octet, formatted as an uppercase hexadecimal number.
True, but the base64 proposal uses lowercase, as does Number.prototype.toString.
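The casing difference mentioned here is easy to observe with existing built-ins. The snippet below is only an illustration of current behaviour, not a statement about what RegExp.escape will emit; it also shows that a RegExp hex escape matches the same character whichever case is used.

```javascript
// encodeURIComponent goes through the Encode operation, which uses uppercase hex.
const uriEncoded = encodeURIComponent("/");

// Number.prototype.toString(16) produces lowercase digits.
const lowerHex = (0x2f).toString(16);

// The regex grammar accepts either case for the hex digits in \xHH.
const lowerMatches = new RegExp("\\x2f", "u").test("/");
const upperMatches = new RegExp("\\x2F", "u").test("/");

console.log(uriEncoded, lowerHex, lowerMatches, upperMatches);
// %2F 2f true true
```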
There's no need to add complexity of single-character identity escapes for every ASCII punctuator. I would prefer escaping using hex escape sequences instead, as discussed in #58. The only argument given against this is that you'd have to copy-paste any RegExp constructed using this function into a RegExp explainer to understand it, but let's be honest, you were going to have to do that anyway. @sophiebits also points out that by not modifying the grammar, we allow this feature to be polyfilled in older browsers.
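As a rough illustration of the hex-escape approach, the sketch below backslash-escapes the existing regex syntax characters and hex-escapes the remaining ASCII punctuators. Everything in it is an assumption made for illustration: the function name escapeWithHex, the punctuator list, and the lowercase digits are not taken from the proposal text, and the real algorithm handles more cases (for example leading characters and whitespace).

```javascript
// Hypothetical sketch only; not the algorithm specified by the proposal.
// Characters that already have valid backslash escapes in every mode.
const SYNTAX_CHARACTERS = /[\\^$.*+?()[\]{}|\/]/;

// Assumed list of other ASCII punctuators that would be hex-escaped.
const OTHER_PUNCTUATORS = ",-=<>#&!%:;@~'`\"";

function escapeWithHex(text) {
  let out = "";
  for (const ch of text) {
    if (SYNTAX_CHARACTERS.test(ch)) {
      out += "\\" + ch; // e.g. "." becomes "\."
    } else if (OTHER_PUNCTUATORS.includes(ch)) {
      // Two lowercase hex digits, e.g. "&" becomes "\x26".
      out += "\\x" + ch.codePointAt(0).toString(16).padStart(2, "0");
    } else {
      out += ch;
    }
  }
  return out;
}

const escaped = escapeWithHex("a-b&c.d");
console.log(escaped); // a\x2db\x26c\.d
console.log(new RegExp(escaped, "u").test("a-b&c.d")); // true
```

Because the output uses only escapes that the grammar already accepts, a function like this runs unchanged in older engines, which is the polyfill point raised above.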