jviereck / regjsparser

Parsing the JavaScript's RegExp in JavaScript.
http://www.julianviereck.de/regjsparser/
BSD 2-Clause "Simplified" License
77 stars 20 forks source link

Detect lone or unescaped brackets and braces when the unicode flag is set #100

Closed pygy closed 4 years ago

pygy commented 4 years ago

Hi @jviereck

This fixes https://github.com/mathiasbynens/regexpu-core/issues/36 and thus https://github.com/babel/babel/issues/11138

Edit: force push for code styling reasons.

pygy commented 4 years ago

Actually, wait... This will cause a Regexpocalypse that will not be easy to fix once this is live (because a codemod would rely on Babel, which would catch the update automatically because the way both Babel and regexpu-core rely on semver).

nicolo-ribaudo commented 4 years ago

I don't think that the situation is as tragic as you picture it, for a few reasons:

  1. This is an edge case.
  2. This is a normal bug: if someone relies on the buggy behavior, the burden of avoiding breakages is on them. Otherwise, every bugfix should be considered a breaking change.
  3. This package already had some bugfixes which caused more errors, like https://github.com/jviereck/regjsparser/pull/91, and no one complained
  4. When people will stop transpiling /u (hopefully soon, since it's supported by every browser version with > 1.5% market share), their code will be broken regardless of this PR.
pygy commented 4 years ago

Let's roll with it then :-)

jviereck commented 4 years ago

Thanks @pygy for providing this!

I agree with the arguments outlined by @nicolo-ribaudo and therefore accepted this PR.

jviereck commented 4 years ago

Published new release to npm as v0.6.3 - https://www.npmjs.com/package/regjsparser/v/0.6.3.

Thanks once again @pygy !

jens-duttke commented 4 years ago

Please see issue #101 . This change introduced a bug.

jviereck commented 4 years ago

Thanks for reporting this. I most likely have some time over the weekend to look into this.

@nicolo-ribaudo, @pygy, @jens-duttke Do you think it's better in the meantime to make a new release with the change removed for now?

nicolo-ribaudo commented 4 years ago

I think that it's ok to wait a few days. If you end up not having time to fix it please ping me!

jviereck commented 4 years ago

Digging into this, when reading the JS RegExp gramma, I am not sure why /}/u is supposed to be rejected. That is, I think this should parse as:

...
Atom
PatternCharacter
SourceCharacter

But SourceCharacter has no quantification on the unicode u flag. See the documentation here:

https://www.ecma-international.org/ecma-262/10.0/index.html#prod-PatternCharacter

Can someone give me a pointer what I am missing here?

Once it's clear to me why /}/u is supposed to be rejected, I hope to be able to fix the regression in #101 .