tc39 / proposal-regex-escaping

Proposal for investigating RegExp escaping for the ECMAScript standard
http://tc39.es/proposal-regex-escaping/
Creative Commons Zero v1.0 Universal

change escaping to hex escape sequences #65

Closed michaelficarra closed 5 months ago

michaelficarra commented 9 months ago

There's no need to add the complexity of single-character identity escapes for every ASCII punctuator. I would prefer escaping using hex escape sequences instead, as discussed in #58. The only argument given against this is that you'd have to copy-paste any RegExp constructed using this function into a RegExp explainer to understand it, but let's be honest, you were going to have to do that anyway. @sophiebits also points out that by not modifying the grammar, we allow this feature to be polyfilled in older browsers.
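A minimal sketch of the polyfillability point, assuming an engine whose Unicode-mode grammar has not been extended with identity escapes for punctuators such as `&`. The helper name `compiles` is hypothetical and exists only for this illustration.

```javascript
// Hypothetical helper: does this pattern source compile with these flags?
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const hexForm = "\\x26";      // pattern text: \x26
const identityForm = "\\&";   // pattern text: \&

// The hex escape is already valid in Unicode mode and matches "&".
const hexWorks = compiles(hexForm, "u") && new RegExp(hexForm, "u").test("&");

// Without the u flag, \& is a legacy identity escape that matches "&".
const identityWorksInLegacyMode =
  compiles(identityForm, "") && new RegExp(identityForm).test("&");

// In Unicode mode, \& is rejected unless the grammar is changed.
const identityWorksInUnicodeMode = compiles(identityForm, "u");
```

Because the hex form needs no grammar change, a library could emit it today and have it accepted by existing engines in both modes.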

bakkot commented 9 months ago

What's the argument for doing this, other than the polyfilling thing?

michaelficarra commented 9 months ago

Less RegExp grammar complexity. While I still assert that nobody should be reading the output of RegExp.escape, these grammar additions apply to all RegExps, which will mean I will have to read (or at least be on the lookout for) escaped ASCII punctuators in any RegExp context. I don't want them if they serve no purpose other than to make it harder for me to mentally parse a RegExp.

bakkot commented 9 months ago

I'd prefer to encounter \& rather than \x26. At least I have some hope of figuring out what the first one means (i.e., &, the same as how \- means -, etc).

ljharb commented 9 months ago

I agree; I would expect developers are quite comfortable with a backslash being a noop for the character, whereas hex escapes would be wildly unfamiliar.

oliverfoster commented 9 months ago

As a lay person, if I may, I've got some questions.

Punctuator escaping

a) As hex

b) As human readable characters

Potential additional complexity

It sounds to me like a one- or two-line change, with a lookup table or equivalent for the current punctuators. Is that a fair assessment? Or is it considerably more complex to produce one over the other?

Other questions

  1. What do other languages do?
  2. What do existing JS implementations do?
  3. What would be the impact of transitioning between styles at a later date?
  4. Does one route prevent or facilitate the other?
  5. Is either route essential?
  6. Could the polyfill produce hex and the standard produce human readable and would the hex be directly equivalent?

Preference

I'm in favour of whichever is simpler. I'd be happy if anything that impedes the progress of .escape is parked for a later date. I don't think hex escaping is wildly unfamiliar (encodeURI, HTML special characters) and I agree that \& feels perfectly readable, if not normal (regex escape sequences).

ljharb commented 9 months ago

@oliverfoster this can’t be parked for later; it has to be decided before the feature ships and likely can never be changed in the future.

Spec complexity will likely be about the same with either approach; a line or two of grammar vs a line or two to do the hex escape.
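To make "a line or two to do the hex escape" concrete, here is a rough sketch of the hex-escape approach. This is an illustration only and not the proposal's spec text: the function name `escapeWithHex` and the two character sets are assumptions chosen for the example, and the sketch ignores cases such as leading alphanumerics, whitespace, and lone surrogates.

```javascript
// Hypothetical sets for this sketch; the proposal defines its own lists.
const SYNTAX_CHARACTERS = new Set("^$\\.*+?()[]{}|/");
const OTHER_PUNCTUATORS = new Set(",-=<>#&!%:;@~'`\"");

function escapeWithHex(input) {
  let out = "";
  for (const ch of input) {
    if (SYNTAX_CHARACTERS.has(ch)) {
      // Syntax characters already have valid identity escapes.
      out += "\\" + ch;
    } else if (OTHER_PUNCTUATORS.has(ch)) {
      // Other ASCII punctuators become two-digit hex escapes (lowercase here).
      out += "\\x" + ch.charCodeAt(0).toString(16).padStart(2, "0");
    } else {
      out += ch;
    }
  }
  return out;
}

const sample = "a-b&c.d";
const escapedSample = escapeWithHex(sample);
const roundTrips = new RegExp("^" + escapedSample + "$", "u").test(sample);
```

The identity-escape alternative would replace the hex branch with `out += "\\" + ch`, which is why the implementation cost is about the same; the difference lies in whether the RegExp grammar must accept the result.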

DJ-Laser commented 9 months ago

I feel like polyfilling for older browsers is more important, and there can always be a function to translate hex codes into backslash-escaped characters.
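A sketch of what such a translation function might look like, intended for reading escaped output rather than for producing a pattern to compile. The name `hexToBackslash` and the punctuator set are hypothetical. The function walks escapes one at a time so that an escaped backslash followed by a literal `x26` is left alone.

```javascript
// Hypothetical punctuator set for this sketch.
const READABLE_PUNCTUATORS = new Set(",-=<>#&!%:;@~'`\"");

function hexToBackslash(source) {
  // Match every escape: either \xHH (captured) or a backslash plus any one character.
  return source.replace(/\\(?:x([0-9a-fA-F]{2})|[\s\S])/g, (match, hex) => {
    if (hex === undefined) return match;            // not a hex escape, keep as-is
    const ch = String.fromCharCode(parseInt(hex, 16));
    // Only rewrite punctuators; \x61 must not turn into \a.
    return READABLE_PUNCTUATORS.has(ch) ? "\\" + ch : match;
  });
}

const readable = hexToBackslash("a\\x2db\\x26c");
```

Note that the rewritten text may not be a valid pattern in Unicode mode, since that mode rejects identity escapes for characters such as `&`.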

ljharb commented 9 months ago

We don't generally make changes to proposals solely due to polyfillability.

ljharb commented 7 months ago

Rough consensus was to make this change; I'll do that, and then come back in a future meeting to seek stage 2.7.

bakkot commented 5 months ago

Couple comments:

ljharb commented 5 months ago

Filed #67. Currently goes with lowercase.

michaelficarra commented 5 months ago

The Encode AO (currently used by encodeURI and encodeURIComponent) uses uppercase.

Let hex be the String representation of octet, formatted as an uppercase hexadecimal number.

ljharb commented 5 months ago

True, but the base64 proposal uses lowercase, as does Number.prototype.toString.
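The two existing conventions cited here can be observed directly, and either casing is accepted by the RegExp grammar. A small sketch using only built-in functions:

```javascript
// encodeURIComponent (via the Encode AO) emits uppercase hex digits.
const uriEncoded = encodeURIComponent("/");

// Number.prototype.toString(16) emits lowercase hex digits.
const numberHex = (0x2f).toString(16);

// A RegExp hex escape matches the same character in either case.
const lowerMatches = new RegExp("\\x2f", "u").test("/");
const upperMatches = new RegExp("\\x2F", "u").test("/");
```

The choice therefore affects only the exact string the escaping function returns, not what the resulting pattern matches.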