ridiculousfish / regress

REGex in Rust with EcmaScript Syntax
Apache License 2.0

parse "unbalanced" right square bracket as a literal character #50

Closed selfisekai closed 1 year ago

selfisekai commented 1 year ago

I ran into a related issue in boa_engine: https://github.com/boa-dev/boa/issues/2325

I have no idea how to read the ECMAScript documentation, but the behavior this change introduces matches the regex engines in V8 and SpiderMonkey.

while this commit doesn't seem to fix it in Boa, I believe this change is needed anyway.

ridiculousfish commented 1 year ago

This is yet another difference between non-Unicode (default) and Unicode (u flag) regular expressions. That is, in JS /]/ is valid but /]/u is not.
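As a rough illustration of that difference in a JS engine (not regress itself), the sketch below compiles the same pattern with and without the u flag. The helper name `compiles` is made up for this example.

```typescript
// Hypothetical helper: true if the pattern compiles under the given flags,
// false if the engine rejects it with a SyntaxError.
function compiles(pattern: string, flags: string = ""): boolean {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) {
      return false;
    }
    throw e; // anything else is unexpected, so let it propagate
  }
}

// Non-Unicode (default) mode: a lone "]" is treated as a literal character.
const legacyCompiles: boolean = compiles("]");

// Unicode mode: a lone "]" is a syntax error.
const unicodeCompiles: boolean = compiles("]", "u");

// In non-Unicode mode the pattern matches a literal right square bracket.
const matchesLiteral: boolean = new RegExp("]").test("a]b");

console.log({ legacyCompiles, unicodeCompiles, matchesLiteral });
```

In Unicode mode the bracket has to be escaped (\]) to be accepted.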

Regress only implements Unicode regexes (for now), because the non-Unicode semantics only make sense in UCS-2. We've thought about implementing non-Unicode semantics; the annoying part is their treatment of surrogate pairs. That said, we could start implementing non-Unicode behavior behind a flag, or we could just add this and live with the slight divergence from the spec.
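To make the surrogate-pair point concrete, here is a small TypeScript sketch of how a JS engine treats a character outside the Basic Multilingual Plane in each mode. It illustrates the JS semantics being discussed, not regress's behavior, and the variable names are only for this example.

```typescript
// U+1F600 is stored as a surrogate pair: two UTF-16 code units.
const emoji: string = "\uD83D\uDE00";
const codeUnits: number = emoji.length;

// Non-Unicode mode: "." matches a single code unit, so one "." cannot
// span the whole surrogate pair.
const legacyDotMatches: boolean = new RegExp("^.$").test(emoji);

// Unicode mode: "." matches a whole code point, so the pair counts once.
const unicodeDotMatches: boolean = new RegExp("^.$", "u").test(emoji);

// Non-Unicode mode needs two dots to cover the same character.
const legacyTwoDotsMatch: boolean = new RegExp("^..$").test(emoji);

console.log({ codeUnits, legacyDotMatches, unicodeDotMatches, legacyTwoDotsMatch });
```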

Knowing that, how do you want to proceed?

selfisekai commented 1 year ago

hm, this is gonna stop breaking my code when YouTube deploys a new JS player version, which I'm guesstimating will be Monday in US hours, so I'm fine with an actual non-Unicode implementation

ridiculousfish commented 1 year ago

Ok, if you want to add a flag to get started with non-Unicode syntax I'd be happy to merge this.

selfisekai commented 1 year ago

done now, sorry it took so long

ridiculousfish commented 1 year ago

Hah, I'd forgotten we already had such a flag. Some day we'll have to deal with the real incompatibilities but for now this is fine. Thank you!

selfisekai commented 1 year ago

unrelated - can I also get a hacktoberfest-accepted label? 👉👈

ridiculousfish commented 1 year ago

Sure, did I do that right?

selfisekai commented 1 year ago

yes, thank you!