change escaping to hex escape sequences

michaelficarra commented 9 months ago

There's no need to add the complexity of single-character identity escapes for every ASCII punctuator. I would prefer escaping using hex escape sequences instead, as discussed in #58. The only argument given against this is that you'd have to copy-paste any RegExp constructed using this function into a RegExp explainer to understand it, but let's be honest, you were going to have to do that anyway. @sophiebits also points out that by not modifying the grammar, we allow this feature to be polyfilled in older browsers.
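For readers who want to see what this would look like, here is a minimal sketch of the hex-escape approach. The helper name `escapeAsHex` and its character set (every ASCII punctuator) are assumptions made for illustration, not the proposal's spec text, and the lowercase hex digits are an arbitrary choice here.

```javascript
// Hypothetical helper (not the proposal's algorithm): escape every ASCII
// punctuator as a two-digit hex escape, which existing engines already parse.
function escapeAsHex(text) {
  // Ranges cover U+0021-002F, U+003A-0040, U+005B-0060 and U+007B-007E.
  return text.replace(/[!-\/:-@\[-`{-~]/g, (ch) =>
    "\\x" + ch.charCodeAt(0).toString(16).padStart(2, "0")
  );
}

// The escaped source compiles in Unicode mode without any grammar change.
const literalMatcher = new RegExp("^" + escapeAsHex("1+1=2?") + "$", "u");
console.log(escapeAsHex("a.b&c"));          // a\x2eb\x26c
console.log(literalMatcher.test("1+1=2?")); // true
```

Because `\xHH` is already part of the RegExp grammar, a sketch like this runs unchanged in engines that know nothing about the proposal, which is the polyfilling point made above.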

bakkot commented 9 months ago

What's the argument for doing this, other than the polyfilling thing?

michaelficarra commented 9 months ago

Less RegExp grammar complexity. While I still assert that nobody should be reading the output of RegExp.escape, these grammar additions apply to all RegExps, which will mean I will have to read (or at least be on the lookout for) escaped ASCII punctuators in any RegExp context. I don't want them if they serve no purpose other than to make it harder for me to mentally parse a RegExp.

bakkot commented 9 months ago

I'd prefer to encounter \& rather than \x26. At least I have some hope of figuring out what the first one means (i.e., &, the same as how \- means -, etc).
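To make the trade-off concrete, the sketch below checks which of the two spellings an engine accepts. The helper `compiles` is hypothetical. The assumption is an engine that has not extended the grammar, where `\&` is only accepted outside Unicode mode.

```javascript
// Hypothetical helper: report whether a pattern source compiles under the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const results = {
  hexUnicodeMode: compiles("\\x26", "u"),    // hex escape for "&"
  identityUnicodeMode: compiles("\\&", "u"), // identity escape for "&"
  identityLegacyMode: compiles("\\&", ""),   // same escape without the u flag
};
console.log(results);
```

In such an engine only the identity escape in Unicode mode should be rejected, which is why the `\&` form needs a grammar addition while `\x26` does not.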

ljharb commented 9 months ago

I agree; I would expect developers are quite comfortable with a backslash being a noop for the character, whereas hex escapes would be wildly unfamiliar.

oliverfoster commented 9 months ago

As a lay person, if I may, I've got some questions.

Punctuator escaping

a) As hex

Polyfillable
Less complex

b) As human readable characters

More easily human readable
Shorter, prettier

Potential additional complexity

It sounds to me like a one or two line change, with a lookup table or equivalent for current punctuators, is that a fair assessment? Or is it considerably more complex to produce one over the other?

Preference

I'm in favour of whichever is simpler. I'd be happy if anything that impedes the progress of .escape is parked for a later date. I don't think hex escaping is wildly unfamiliar (encodeURI, HTML special characters) and I agree that \& feels perfectly readable, if not normal (regex escape sequences).

ljharb commented 9 months ago

@oliverfoster this can’t be parked for later; it has to be decided before the feature ships and likely can never be changed in the future.

Spec complexity will likely be about the same with either approach; a line or two of grammar vs a line or two to do the hex escape.

DJ-Laser commented 9 months ago

I feel like polyfilling for older browsers is more important, and there can always be a function to translate hex codes into backslash-escaped characters.
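A rough sketch of such a translation function follows. The name `hexToBackslash` is hypothetical, and the output is meant only for reading: it is not guaranteed to be a valid pattern in Unicode mode, since engines may reject identity escapes like `\&` there.

```javascript
// Hypothetical display helper: rewrite \xHH escapes of ASCII punctuators as
// backslash escapes so a generated source is easier to read.
function hexToBackslash(source) {
  // Consume one whole escape at a time, so an escaped backslash followed by
  // the literal text "x26" is not mistaken for a hex escape.
  return source.replace(/\\(?:x([0-9a-fA-F]{2})|[\s\S])/g, (match, hex) => {
    if (hex === undefined) return match; // some other escape: keep as-is
    const ch = String.fromCharCode(parseInt(hex, 16));
    // Only ASCII punctuators are rewritten; \x41 ("A") stays a hex escape.
    return /[!-\/:-@\[-`{-~]/.test(ch) ? "\\" + ch : match;
  });
}

console.log(hexToBackslash("a\\x2eb\\x26c")); // a\.b\&c
```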

ljharb commented 9 months ago

We don't generally make changes to proposals solely due to polyfillability.

ljharb commented 7 months ago

Rough consensus was to make this change; I'll do that, and then come back in a future meeting to seek stage 2.7.

bakkot commented 5 months ago

Couple comments:

uppercase or lowercase?
some whitespace is not ascii and so needs \u rather than \x
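The second point can be sketched as follows: `\xHH` only has room for two hex digits, so any code unit above U+00FF needs the four-digit `\uHHHH` form. The helper name `escapeCodeUnit` is hypothetical, and the lowercase output is an arbitrary choice given that casing is the open question here.

```javascript
// Hypothetical helper: escape a single UTF-16 code unit, using \xHH when two
// hex digits are enough and \uHHHH otherwise.
function escapeCodeUnit(ch) {
  const code = ch.charCodeAt(0);
  const hex = code.toString(16); // lowercase digits
  return code <= 0xff
    ? "\\x" + hex.padStart(2, "0")
    : "\\u" + hex.padStart(4, "0");
}

console.log(escapeCodeUnit(" "));      // \x20   (ASCII space)
console.log(escapeCodeUnit("\u2028")); // \u2028 (LINE SEPARATOR, not ASCII)
```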

ljharb commented 5 months ago

Filed #67. Currently goes with lowercase.

michaelficarra commented 5 months ago

The Encode AO (currently used by encodeURI and encodeURIComponent) uses uppercase.

Let hex be the String representation of octet, formatted as an uppercase hexadecimal number.

ljharb commented 5 months ago

True, but the base64 proposal uses lowercase, as does Number.prototype.toString.
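The two precedents can be compared directly using existing built-ins. The variable names are only for illustration. The last line shows that the choice is cosmetic, since the RegExp grammar accepts hex digits in either case.

```javascript
// encodeURIComponent percent-encodes with uppercase hex digits.
const uppercaseExample = encodeURIComponent("/"); // "%2F"

// Number.prototype.toString(16) produces lowercase hex digits.
const lowercaseExample = (0x2f).toString(16); // "2f"

// Both spellings of the escape match the same character.
const bothMatch = new RegExp("^\\x2f\\x2F$", "u").test("//"); // true

console.log(uppercaseExample, lowercaseExample, bothMatch);
```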

tc39 / proposal-regex-escaping