sindresorhus / eslint-plugin-unicorn

More than 100 powerful ESLint rules
MIT License

Rule proposal: `prefer-regexp-code-point-escape` #989

Open sindresorhus opened 3 years ago

sindresorhus commented 3 years ago

Unicode code point escapes are new in ES6. They can represent any code point up to U+10FFFF, whereas \uXXXX escapes stop at U+FFFF and \xXX escapes at U+00FF. It's better to always use them for consistency, even when they're not required.

This overlaps with the no-hex-escapes rule. Not sure whether we should deprecate that one. It's plausible that someone wants to prevent Hex escapes, but not prefer code point escapes. Opinions welcome.

Inspired by https://github.com/eslint/eslint/issues/12488.

Fail

const foo = '\123'; // Octal (legacy; a SyntaxError in strict mode)
const foo = '\cA'; // Control escape sequence (only has this meaning in a regex)
const foo = '\x7A'; // Hex
const foo = '\u2661'; // Unicode escape sequence
const foo = '\uD83D\uDCA9'; // Unicode surrogate pair

Pass

const foo = '\u{7A}';
const foo = '\u{1F4A9}';
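
To illustrate why the rewrite is safe, here is a small sketch showing that the older escape forms and the code point form denote the same strings. The helper name `surrogatePairToCodePoint` is made up for this example and is not part of the proposal.

```javascript
// Each comparison is true: only the escape syntax differs, not the string.
const hexMatches = '\x7A' === '\u{7A}'; // "z"
const unicodeMatches = '\u2661' === '\u{2661}'; // "♡"
const surrogatesMatch = '\uD83D\uDCA9' === '\u{1F4A9}'; // "💩"

// Hypothetical helper: a surrogate pair encodes one code point above U+FFFF.
function surrogatePairToCodePoint(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const codePoint = surrogatePairToCodePoint(0xD83D, 0xDCA9); // 0x1F4A9
```

The last line is the arithmetic an autofix would presumably need in order to collapse a surrogate pair into a single `\u{…}` escape.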

fisker commented 3 years ago

I was going to propose this the other day. I was thinking we could merge no-hex-escapes into the new rule.

And I prefer const foo = '\u007A'; over const foo = '\u{7A}';

fisker commented 3 years ago

About the name, we already have better-regex, let's use better-string?

papb commented 3 years ago

Nice idea, I would like this rule. I think it makes sense to deprecate no-hex-escapes. I don't like the better-string name though; I think it's too vague. What about better-string-escapes?

sindresorhus commented 3 years ago

And I prefer const foo = '\u007A'; over const foo = '\u{7A}';

Did you see my arguments for why \u{7A} is better? It's shorter and it lets you use the same syntax always.

It also makes escapes stand out more because of the braces, which makes strings with a lot of escapes more readable.

sindresorhus commented 3 years ago

About the name, we already have better-regex, let's use better-string?

I forgot to mention, it should apply to regexes too. I think it's better to have an explicit name for exactly what it does.

sindresorhus commented 3 years ago

This overlaps with the no-hex-escapes rule. Not sure whether we should deprecate that one. It's plausible that someone wants to prevent Hex escapes, but not prefer code point escapes. Opinions welcome.

I think this rule should also handle Hex escapes.

sindresorhus commented 3 years ago

This is now accepted.

yvele commented 3 months ago

And I prefer const foo = '\u007A'; over const foo = '\u{7A}';

Did you see my arguments for why \u{7A} is better? It's shorter and it lets you use the same syntax always.

It also makes escapes stand out more because of the braces, which makes strings with a lot of escapes more readable.

@fisker @sindresorhus

I don't really agree, because Unicode code points are conventionally written as U+0000, with at least 4 hex digits.

[...] notated according to the standard as U+0000–U+10FFFF

Look at the notation used in various RFCs and Wikipedia articles.

🚨 I would also enforce the \u{…} wrapper, as "\u{007A}A" is MUCH more readable than "\u007AA"!

Fail

const foo = "\u7A"; // Already a SyntaxError in a string literal: \u needs four hex digits or braces
const foo = "\u007A";

Pass

const foo = "\u{007A}";
const foo = "\u{10FFFD}";
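
As a rough sketch of the formatting this variant would produce, the helper below pads the hex value to at least four digits and wraps it in braces. `toCodePointEscape` is a hypothetical name used only for illustration, and the uppercase hex is an assumption about the preferred style.

```javascript
// Hypothetical helper: format a code point as escape source text,
// zero-padded to at least four uppercase hex digits.
function toCodePointEscape(codePoint) {
  const hex = codePoint.toString(16).toUpperCase().padStart(4, '0');
  return `\\u{${hex}}`;
}

const short = toCodePointEscape(0x7A); // the text \u{007A}
const long = toCodePointEscape(0x10FFFD); // the text \u{10FFFD}

// Both spellings below are the string "zA"; the braces only make the
// boundary between the escape and the following "A" visible.
const sameString = '\u007AA' === '\u{007A}A';
```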

Edit: Also for consistency with RegExp notation that requires 4 digits:

const regex = /\u007A/; // ✅
const regex = /\u7A/;   // ❌

fisker commented 3 months ago

Sure, PR welcome.

yvele commented 3 months ago

Sure, PR welcome.

Hmm, I think I'll give it a try.

About the name, we already have better-regex, let's use better-string?

I forgot to mention, it should apply to regexes too. I think it's better to have an explicit name for exactly what it does.

As for the rule name, what about prefer-string-unicode-wrapper?

Regexes only support the Unicode wrapper with the u or v flag:

/\u0061/.test("a");    // true
/\u{0061}/.test("a");  // false!
/\u{0061}/u.test("a"); // true
/\u{0061}/v.test("a"); // true

Note also that

/\u{61}/u.test("a"); // true
/\u{61}/v.test("a"); // true
/\u61/.test("a");    // false!
/\u61/u.test("a");   // Uncaught SyntaxError: Invalid regular expression: /\u61/u: Invalid Unicode escape
/\u61/v.test("a");   // Uncaught SyntaxError: Invalid regular expression: /\u61/v: Invalid Unicode escape
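
A short sketch of why the flagless forms return false rather than throwing: without the u or v flag, `\u` not followed by four hex digits is read as a literal "u", and `{0061}` is read as a quantifier. The variable names are only for this example.

```javascript
// Without the u/v flag, /\u{0061}/ means "the letter u, exactly 61 times".
const braceWithoutFlag = /\u{0061}/;
const matchesA = braceWithoutFlag.test('a'); // false
const matchesSixtyOneUs = braceWithoutFlag.test('u'.repeat(61)); // true
const matchesSixtyUs = braceWithoutFlag.test('u'.repeat(60)); // false

// Likewise, /\u61/ without a flag matches the literal text "u61".
const matchesLiteralU61 = /\u61/.test('u61'); // true
```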

See also the require-unicode-regexp rule (https://eslint.org/docs/latest/rules/require-unicode-regexp), which enforces the use of either the u or v flag on regexes.

If we extend the rule to RegExp we also have to make sure at least u flag is used during autofix

/\u0061/ --> /\u{0061}/u
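
A minimal sketch of what such an autofix could look like, working on a regex's source text and flags. This is not the plugin's implementation: `wrapUnicodeEscapes` is a hypothetical function, and it deliberately ignores surrogate pairs (`\uD83D\uDCA9` would need to be merged into one `\u{1F4A9}`) and the other ways the u flag changes how a pattern is parsed.

```javascript
// Hypothetical sketch: wrap \uXXXX escapes in braces and add the u flag if needed.
function wrapUnicodeEscapes(pattern, flags = '') {
  // Skip over pairs of backslashes (an escaped backslash) so that the
  // literal text "\\u0061" is not mistaken for an escape.
  const wrapped = pattern.replace(
    /(?<!\\)((?:\\\\)*)\\u([0-9A-Fa-f]{4})/g,
    '$1\\u{$2}'
  );
  const changed = wrapped !== pattern;
  const needsFlag = changed && !/[uv]/.test(flags);
  return { pattern: wrapped, flags: needsFlag ? flags + 'u' : flags };
}

const fixed = wrapUnicodeEscapes('\\u0061'); // source text \u0061, no flags
const stillMatches = new RegExp(fixed.pattern, fixed.flags).test('a');
```

Here `fixed.pattern` is the text `\u{0061}` and `fixed.flags` is `u`, so the rewritten regex still matches "a". A pattern that already has the v flag keeps its flags unchanged.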

@sindresorhus should we only focus on string? Or make a single Unicode wrapper enforcement rule for both strings and regexes? That will add complexity to the rule.

For both strings and regexes: prefer-unicode-wrapper ?

sindresorhus commented 3 months ago

If we extend the rule to RegExp we also have to make sure at least u flag is used during autofix

👍

should we only focus on string? Or make a single Unicode wrapper enforcement rule for both strings and regexes? That will make the rule more complex to code.

Both

For both strings and regexes: prefer-unicode-wrapper ?

Maybe prefer-unicode-code-point-escapes? To be explicit. https://mathiasbynens.be/notes/javascript-escapes#unicode-code-point