ridiculousfish / regress

REGex in Rust with EcmaScript Syntax
Apache License 2.0

Implement unicode flag dependent case folding #81

Closed: raskad closed this 9 months ago

raskad commented 9 months ago

Depends on #77. This PR only contains 2674de3980cb6bd4beb9825f5d926e94ceac2572.

Currently, in non-ASCII mode, we implement case-insensitive matching by case folding. The spec, however, distinguishes between Unicode mode and non-Unicode mode: in non-Unicode mode, the toUppercase algorithm is used instead of case folding (see https://tc39.es/ecma262/#sec-runtime-semantics-canonicalize-ch).

This PR adds the relevant Unicode table for toUppercase and adjusts the folding and unfolding functions accordingly. I also added a test from test262 that checks this behaviour.
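
For illustration, the non-Unicode branch of the Canonicalize operation linked above could be sketched roughly as follows. This is not the code from this PR: the function names `canonicalize_non_unicode` and `eq_ignore_case_non_unicode` are hypothetical, and the sketch leans on the standard library's `char::to_uppercase` rather than a generated table, on the assumption that it agrees with the Unicode toUppercase mapping the spec refers to. The Unicode-mode branch (simple case folding) is left out because the standard library does not provide it.

```rust
/// Sketch of Canonicalize for non-Unicode, case-insensitive matching.
/// Hypothetical helper, not part of regress.
fn canonicalize_non_unicode(ch: char) -> char {
    // Full uppercase mapping; this may expand to several characters
    // (for example 'ß' becomes "SS").
    let mut upper = ch.to_uppercase();
    let (first, second) = (upper.next(), upper.next());

    // The spec only accepts a result that is exactly one UTF-16 code unit;
    // otherwise the character is returned unchanged.
    let cu = match (first, second) {
        (Some(c), None) if c.len_utf16() == 1 => c,
        _ => return ch,
    };

    // A non-ASCII character must never canonicalize to an ASCII one.
    if (ch as u32) >= 128 && (cu as u32) < 128 {
        return ch;
    }
    cu
}

/// Two characters match case-insensitively (non-Unicode mode) when their
/// canonical forms are equal. Hypothetical helper for illustration.
fn eq_ignore_case_non_unicode(a: char, b: char) -> bool {
    canonicalize_non_unicode(a) == canonicalize_non_unicode(b)
}
```

Under these assumptions, `'\u{017F}'` (LATIN SMALL LETTER LONG S) does not match `'s'` in non-Unicode mode, because its uppercase form `'S'` is ASCII and the character is therefore left unchanged. With case folding, as used in Unicode mode, the two would fold to the same character.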