ridiculousfish / regress

REGex in Rust with EcmaScript Syntax
Apache License 2.0

Implement unicode script extensions with `ucd-parse` #72

Closed raskad closed 12 months ago

raskad commented 1 year ago

This PR adds Unicode script extension tables and functions alongside the script tables that we already have. This was one of the features still missing for full Unicode escape support.
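To illustrate the kind of lookup a script extension table enables, here is a minimal sketch. The names `ScxRange`, `SCX_TABLE`, `script_extensions` and `has_script_extension` are hypothetical and are not the identifiers used in this PR. The table rows are a small illustrative excerpt, not a verified copy of the Unicode data.

```rust
use std::cmp::Ordering;

/// Hypothetical table entry: an inclusive code point range and the scripts
/// listed for it in the Script_Extensions property.
struct ScxRange {
    first: u32,
    last: u32,
    scripts: &'static [&'static str],
}

// Illustrative excerpt only. Entries must be sorted by `first` and must not
// overlap, because the lookup below uses binary search.
static SCX_TABLE: &[ScxRange] = &[
    ScxRange { first: 0x3031, last: 0x3035, scripts: &["Hira", "Kana"] },
    ScxRange { first: 0x30FC, last: 0x30FC, scripts: &["Hira", "Kana"] },
];

/// Returns the scripts listed for `c` in the sketch table.
/// A real implementation would fall back to the code point's Script value
/// when there is no explicit entry; this sketch returns an empty slice.
fn script_extensions(c: char) -> &'static [&'static str] {
    let cp = c as u32;
    let found = SCX_TABLE.binary_search_by(|r| {
        if r.last < cp {
            Ordering::Less
        } else if r.first > cp {
            Ordering::Greater
        } else {
            Ordering::Equal
        }
    });
    match found {
        Ok(i) => SCX_TABLE[i].scripts,
        Err(_) => &[],
    }
}

/// True if `script` is among the script extensions recorded for `c`.
fn has_script_extension(c: char, script: &str) -> bool {
    script_extensions(c).iter().any(|s| *s == script)
}
```

The point of the extension table is that one code point can map to several scripts, which is why each entry here carries a slice of scripts rather than a single one.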

In addition, I have refactored the script table generation to use the ucd-parse crate, so we no longer have to parse the Unicode table files ourselves. We take the output from ucd-parse and convert the code points into our representation. This also removes the static scripts list, which makes it easier to regenerate the script tables for new Unicode versions.
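As a rough sketch of the conversion step described above: parsed rows can be grouped by script and coalesced into sorted ranges. The `Row` struct is a hypothetical stand-in for parser output, not ucd-parse's actual types, and `build_script_tables` is likewise an assumed name, not the generator code in this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one parsed row: an inclusive code point range
/// and the script name attached to it.
struct Row {
    first: u32,
    last: u32,
    script: String,
}

/// Groups rows by script, then sorts each script's ranges and merges the
/// ones that overlap or touch. The set of scripts comes from the data, so
/// no hand-maintained list of script names is needed.
fn build_script_tables(rows: &[Row]) -> BTreeMap<String, Vec<(u32, u32)>> {
    let mut by_script: BTreeMap<String, Vec<(u32, u32)>> = BTreeMap::new();
    for row in rows {
        by_script
            .entry(row.script.clone())
            .or_default()
            .push((row.first, row.last));
    }

    for ranges in by_script.values_mut() {
        ranges.sort_unstable();
        let mut merged: Vec<(u32, u32)> = Vec::with_capacity(ranges.len());
        for &(first, last) in ranges.iter() {
            if let Some(prev) = merged.last_mut() {
                // Extend the previous range if this one overlaps or is adjacent.
                if first <= prev.1.saturating_add(1) {
                    prev.1 = prev.1.max(last);
                    continue;
                }
            }
            merged.push((first, last));
        }
        *ranges = merged;
    }
    by_script
}
```

Because the map keys are derived from the parsed rows, a script introduced in a new Unicode version would show up automatically when the tables are regenerated.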

If this approach looks good, I would refactor the rest of the Unicode table generation to use ucd-parse as well, @ridiculousfish.