jviereck / regjsparser

Parsing the JavaScript's RegExp in JavaScript.
http://www.julianviereck.de/regjsparser/
BSD 2-Clause "Simplified" License

Fix regression with using closing brackets inside of class range in non-unicode mode #102

Closed jviereck closed 4 years ago

jviereck commented 4 years ago

This is intended to fix #101, see also #100.

Note: I wrote the code to fix the bug, but I was not able to figure out what the correct behavior is according to the spec. If someone has a pointer to the relevant chapter in the spec, that would be great.

jviereck commented 4 years ago

?r @nicolo-ribaudo @pygy

nicolo-ribaudo commented 4 years ago

Note: I wrote the code to fix the bug, but I was not able to figure out what the correct behavior is according to the spec. If someone has a pointer to the relevant chapter in the spec, that would be great.

I'm reading the spec, but I have never read the regexp section before :sweat_smile:

cc @mathiasbynens Maybe you know why?

nicolo-ribaudo commented 4 years ago

Ok I know (for https://github.com/jviereck/regjsparser/pull/100#issuecomment-596133571).

Normally /}/ is invalid regardless of the u flag, because PatternCharacter excludes SyntaxCharacter (and thus }).

However, we have Annex B. :grin: When u is not present, a regex term can be an ExtendedAtom, which can be an ExtendedPatternCharacter. ExtendedPatternCharacter is defined as SourceCharacter but not one of ^$\.*+?()[|, so it doesn't exclude ], { or }.

Also, regardless of Annex B, the only character disallowed inside a CharacterClass is ] (https://www.ecma-international.org/ecma-262/10.0/index.html#prod-ClassAtomNoDash)
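
To make the three cases above concrete, here is a minimal sketch that asks the host engine's own RegExp constructor which patterns it accepts. It assumes a spec-compliant engine that implements Annex B (e.g. Node.js or a browser); `compiles` is a hypothetical helper written for this illustration and is not part of regjsparser.

```javascript
// Hypothetical helper: true if `source` compiles with `flags`,
// false if the engine rejects it with a SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected
  }
}

const results = {
  // Without u, Annex B's ExtendedPatternCharacter admits } and ].
  braceNoUnicode: compiles("}", ""),
  bracketNoUnicode: compiles("]", ""),
  // With u, Annex B does not apply: } and ] are SyntaxCharacters.
  braceUnicode: compiles("}", "u"),
  bracketUnicode: compiles("]", "u"),
  // Inside a CharacterClass, } is an ordinary ClassAtom even with u.
  braceInClassUnicode: compiles("[}]", "u"),
  // A complete class followed by a stray ] depends on the flag again.
  strayBracketAfterClassNoUnicode: compiles("[0-]]", ""),
  strayBracketAfterClassUnicode: compiles("[0-]]", "u"),
};

console.log(results);
```

A parser that wants to match engine behavior should accept exactly the patterns for which this prints true.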

jviereck commented 4 years ago

Good catch, not sure about the example. Will check later.

On Sat, Mar 7, 2020, 16:25 Nicolò Ribaudo notifications@github.com wrote:

@nicolo-ribaudo commented on this pull request.

In parser.js https://github.com/jviereck/regjsparser/pull/102#discussion_r389313233:

       bail("unescaped or unmatched closing brace");
     }
-    if (_char === ']') {
+    if (!insideClass && _char === ']') {

Doesn't this change prevent this regex from throwing? /[0-]]/u


jviereck commented 4 years ago

Pushed a new commit. This one parses an ExtendedAtom if !hasUnicodeFlag. This also makes it possible to remove the special logic in createCharacter.

It should fix both #100 and #101 with a more generic implementation.
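
For readers following along, the idea can be sketched roughly as below. This is not the code from the commit: `isPatternCharacter` and the two constants are hypothetical names, and only `hasUnicodeFlag` comes from the description above. The sketch covers just the single-character productions; braced quantifiers such as {1,2} would still need separate handling.

```javascript
// SyntaxCharacter from the main grammar: excluded from PatternCharacter.
const SYNTAX_CHARACTERS = "^$\\.*+?()[]{}|";
// Characters excluded from Annex B's ExtendedPatternCharacter.
// Note that ], { and } are absent from this list.
const EXTENDED_EXCLUDED = "^$\\.*+?()[|";

// Hypothetical helper: may `ch` stand alone as a literal pattern character?
function isPatternCharacter(ch, hasUnicodeFlag) {
  const excluded = hasUnicodeFlag ? SYNTAX_CHARACTERS : EXTENDED_EXCLUDED;
  return !excluded.includes(ch);
}

console.log(isPatternCharacter("}", false)); // Annex B applies
console.log(isPatternCharacter("}", true));  // strict grammar applies
```

Selecting the excluded set from the flag in one place is what removes the need for per-character special cases elsewhere.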

?r @nicolo-ribaudo @pygy