ridiculousfish / regress

REGex in Rust with EcmaScript Syntax
Apache License 2.0

Fix surrogates parsing on regex #60

Closed: jedel1043 closed this 1 year ago

jedel1043 commented 1 year ago

A simple fix to ensure that unpaired surrogates are handled correctly instead of being silently ignored. I also modified the range check for surrogates: we only need to check whether the first surrogate is a high (leading) surrogate. If it is a low (trailing) surrogate, it is already unpaired, so we can return immediately.
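A minimal sketch of the pairing rule described above, in Rust. This is not regress's actual parser code. It assumes the `\uXXXX` escapes have already been parsed into `u16` code units, and `decode_code_units`, `is_high_surrogate`, and `is_low_surrogate` are hypothetical names chosen for illustration. Only a high surrogate can start a pair; a low surrogate seen first is rejected right away, and a high surrogate not followed by a low one is rejected too.

```rust
/// Returns true for a high (leading) surrogate, 0xD800..=0xDBFF.
fn is_high_surrogate(u: u16) -> bool {
    (0xD800..=0xDBFF).contains(&u)
}

/// Returns true for a low (trailing) surrogate, 0xDC00..=0xDFFF.
fn is_low_surrogate(u: u16) -> bool {
    (0xDC00..=0xDFFF).contains(&u)
}

/// Hypothetical helper: decode already-parsed `\uXXXX` code units into a
/// String, rejecting any unpaired surrogate instead of silently skipping it.
/// On error, returns the offending code unit.
fn decode_code_units(units: &[u16]) -> Result<String, u16> {
    let mut out = String::new();
    let mut i = 0;
    while i < units.len() {
        let first = units[i];
        if is_high_surrogate(first) {
            // A high surrogate must be immediately followed by a low one.
            match units.get(i + 1) {
                Some(&second) if is_low_surrogate(second) => {
                    let cp = 0x10000
                        + ((u32::from(first) - 0xD800) << 10)
                        + (u32::from(second) - 0xDC00);
                    // cp is in 0x10000..=0x10FFFF, so it is always a valid char.
                    out.push(char::from_u32(cp).expect("valid supplementary code point"));
                    i += 2;
                }
                // Missing or non-low follower: the high surrogate is unpaired.
                _ => return Err(first),
            }
        } else if is_low_surrogate(first) {
            // A low surrogate in first position is already unpaired: return early.
            return Err(first);
        } else {
            // Any other BMP code unit is a scalar value on its own.
            out.push(char::from_u32(u32::from(first)).expect("non-surrogate BMP value"));
            i += 1;
        }
    }
    Ok(out)
}
```

With this check, `[0xD800, 0xDBFF]` (two high surrogates) and `[0xDC00, 0xDFFF]` (two low surrogates) both return `Err`, while the valid pair `[0xD83D, 0xDE00]` decodes to U+1F600.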

jedel1043 commented 1 year ago

> Can you please share an example regexp which did the wrong thing before?

Both new tests are examples of regular expressions that failed before this change, but simpler examples are any sequence of two unpaired surrogates:

```rust
use regress::Regex;

Regex::new(r"(?:[\uD800\uDBFF])").unwrap_err();
Regex::new(r"(?:[\uDC00\uDFFF])").unwrap_err();
```

ridiculousfish commented 1 year ago

Makes sense, thank you! Merged.