mysticatea / regexpp

The regular expression parser for ECMAScript.
MIT License

`\c` is parsed incorrectly #11

Closed RunDevelopment closed 4 years ago

RunDevelopment commented 4 years ago

RegExpp: v3.1.0
NodeJS: v13.12.0


The following code:

const { RegExpParser } = require("regexpp");

const parser = new RegExpParser();
const ast = parser.parsePattern(/[\c]/.source);
console.log(JSON.stringify(ast, (key, value) => key === "parent" ? null : value, 4));

will output the following:

```json
{
    "type": "Pattern",
    "parent": null,
    "start": 0,
    "end": 4,
    "raw": "[\\c]",
    "alternatives": [
        {
            "type": "Alternative",
            "parent": null,
            "start": 0,
            "end": 4,
            "raw": "[\\c]",
            "elements": [
                {
                    "type": "CharacterClass",
                    "parent": null,
                    "start": 0,
                    "end": 4,
                    "raw": "[\\c]",
                    "negate": false,
                    "elements": [
                        {
                            "type": "Character",
                            "parent": null,
                            "start": 1,
                            "end": 2,
                            "raw": "\\",
                            "value": 92
                        },
                        {
                            "type": "Character",
                            "parent": null,
                            "start": 2,
                            "end": 3,
                            "raw": "c",
                            "value": 99
                        }
                    ]
                }
            ]
        }
    ]
}
```

As you can see, \c is parsed as a backslash character and the character c. This happens both inside and outside of character classes. Instead, it should be parsed as a single character c.

mysticatea commented 4 years ago

Thank you for your report.

However, this is the intended behavior according to the spec.

A lone \c (one with no control letter after it) is not an escape sequence, because Annex B excludes c from the identity escapes: https://tc39.es/ecma262/#prod-annexB-SourceCharacterIdentityEscape

You can see this behavior:

~\dev\sandbox\foo [master]> node
Welcome to Node.js v12.12.0.
Type ".help" for more information.
> /^\c$/.test("\\c")
true
> /^[\c]$/.test("c")
true
> /^[\c]$/.test("\\")
true
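
The session above can also be written as a small script. This is only a sketch for illustration: the variable names (`lone`, `control`, `unicodeError`) are made up here, and it assumes a runtime that implements the Annex B regular expression grammar, as Node.js does. It adds two contrasting cases: `\c` followed by a letter, which is a real control escape, and the `u` flag, under which the Annex B leniency does not apply.

```javascript
// Lone \c (no control letter follows): in non-unicode mode the backslash
// and the "c" are both matched literally.
const lone = /^\c$/;
console.log(lone.test("\\c")); // true
console.log(lone.test("c"));   // false

// \c followed by an ASCII letter is a control escape: \cJ is U+000A (line feed).
const control = /^\cJ$/;
console.log(control.test("\n")); // true

// With the u flag the Annex B fallback is disabled, so a lone \c is rejected.
// The pattern is built from a string so the error surfaces at runtime.
let unicodeError = null;
try {
    new RegExp("\\c", "u");
} catch (e) {
    unicodeError = e;
}
console.log(unicodeError instanceof SyntaxError); // true
```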

RunDevelopment commented 4 years ago

Thank you for the response.

I assumed that it behaves like \x and \u, but that's not the case. Very interesting!
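
For reference, the difference from \x and \u can be checked directly. The snippet below is a minimal sketch, assuming non-unicode mode in a runtime that implements Annex B; the `results` object and its key names are just for illustration. An incomplete \x or \u falls back to an identity escape and matches the plain letter, whereas a lone \c keeps its backslash.

```javascript
// Incomplete \x and \u (no hex digits follow) act as identity escapes in
// non-unicode mode, so they match a plain "x" / "u". A lone \c does not
// follow that rule: it matches a backslash followed by "c".
const results = {
    x: /^\x$/.test("x"),            // true
    xBackslash: /^\x$/.test("\\x"), // false
    u: /^\u$/.test("u"),            // true
    c: /^\c$/.test("c"),            // false
    cBackslash: /^\c$/.test("\\c"), // true
};
console.log(results);
```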