Closed RunDevelopment closed 6 months ago
Latest commit: f2ff74524a269532d7f83d9f9b66a8ace9de4edc
Fixes #720
This PR adds a new rule that lets users enforce the naming of Unicode properties. It has 3 main features:

1. Remove `gc=`/`General_Category=` keys, e.g. `\p{gc=L}` -> `\p{L}`. These prefixes are unnecessary, because the values of the `General_Category` property can be accessed without the key.
2. Enforce long or short names for the keys `General_Category`/`gc`, `Script`/`sc`, and `Script_Extensions`/`scx`.
3. Enforce long or short names for property values, e.g. `\p{L}` -> `\p{Letter}` and `\p{Hex}` -> `\p{Hex_Digit}`.

All of these features can be individually configured and turned off by the user. The `regexp/unicode-property` rule is not included in our `recommended` config, because this rule only enforces a specific style.

Default configuration
The default configuration is the following:
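As a rough sketch, the default described in this section could be written out as an ESLint rule entry like the one below. The option names (`generalCategory`, `key`, `property` and its `binary`, `generalCategory`, `script` fields) are assumptions based on the rule's documentation and should be checked against it; the variable names are made up for illustration.

```typescript
// Sketch only: option names are assumptions, verify them against the rule docs.
const unicodePropertyOptions = {
  // (1) remove gc= / General_Category= keys
  generalCategory: "never",
  // do not enforce long or short key names (sc vs. Script, scx vs. Script_Extensions)
  key: "ignore",
  property: {
    // leave binary property names (e.g. Hex vs. Hex_Digit) alone
    binary: "ignore",
    // leave General_Category values (e.g. L vs. Letter) alone
    generalCategory: "ignore",
    // (2) enforce long names for Script / Script_Extensions values
    script: "long",
  },
};

// Hypothetical config object showing where the options would go.
const eslintConfigSketch = {
  rules: {
    "regexp/unicode-property": ["error", unicodePropertyOptions],
  },
};

console.log(JSON.stringify(eslintConfigSketch, null, 2));
```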
This means that, by default, the rule will (1) remove `General_Category`/`gc` keys (e.g. `\p{gc=L}` -> `\p{L}`) and (2) enforce long names for values of the `Script` and `Script_Extensions` properties (e.g. `\p{sc=Kana}` -> `\p{sc=Katakana}`).

I chose a minimal configuration because I didn't want the rule to generate a lot of errors for people trying to adopt it. I think the 2 effects work well in any code base, no matter what style it usually prefers. (1) simply removes an unnecessary prefix to "simplify" the regex, and (2) prevents the use of the (IMO) horrible aliases for scripts.
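To illustrate that these rewrites are purely stylistic, here is a small sketch that compares the "before" and "after" forms on a handful of sample characters. The helper names and the sample list are made up for this example. It assumes a runtime with Unicode property escape support (ES2018+). The regexes are built with `new RegExp` so the snippet does not depend on the compiler's regex literal checks.

```typescript
// Build an anchored, Unicode-aware regex from a property escape source.
const re = (source: string): RegExp => new RegExp("^" + source + "$", "u");

// Each pair: [form the rule would report, form it would suggest instead].
const pairs: Array<[RegExp, RegExp]> = [
  [re("\\p{gc=L}"), re("\\p{L}")],
  [re("\\p{L}"), re("\\p{Letter}")],
  [re("\\p{Hex}"), re("\\p{Hex_Digit}")],
  [re("\\p{sc=Kana}"), re("\\p{sc=Katakana}")],
];

// Arbitrary sample characters: letters, digits, kana, punctuation.
const samples = ["a", "Z", "g", "7", "f", "カ", "あ", "é", "-"];

// Two regexes agree if they accept and reject exactly the same samples.
function sameMatches(a: RegExp, b: RegExp, inputs: string[]): boolean {
  return inputs.every((s) => a.test(s) === b.test(s));
}

const allEquivalent = pairs.every(([before, after]) =>
  sameMatches(before, after, samples),
);

// Logs whether every pair agreed on every sample.
console.log(allEquivalent);
```

Since both forms in each pair are aliases of the same Unicode property, they should agree on any input, not just these samples.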
Unicode data
Since I needed the data for the mapping between aliases to implement this rule, I had to choose between taking a dependency (e.g. `@unicode/unicode-15.0.0`) and including the relevant data in the source files of this project.

I chose against adding a dependency, because it was easy enough to get the data I needed and because most of `@unicode/unicode-15.0.0` would be dead weight to us.

However, the data I included is used through an API (the `AliasMap` class), so we can easily switch to using a dependency without needing to change the `regexp/unicode-property` rule.
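For readers unfamiliar with the idea, the following is a minimal sketch of what an alias-mapping API of this kind could look like. It is not the `AliasMap` class from this PR: the class name `AliasMapSketch`, its methods, and the inline data are all hypothetical. The point is only that callers ask for a long or short name and never see where the data comes from.

```typescript
// Hypothetical two-way lookup between short and long alias names.
class AliasMapSketch {
  private readonly shortToLong = new Map<string, string>();
  private readonly longToShort = new Map<string, string>();

  // entries: [short name, long name] pairs; the data source is swappable.
  constructor(entries: ReadonlyArray<readonly [string, string]>) {
    for (const [short, long] of entries) {
      this.shortToLong.set(short, long);
      this.longToShort.set(long, short);
    }
  }

  // Long form of a name, or undefined if the name is unknown.
  toLong(name: string): string | undefined {
    if (this.longToShort.has(name)) return name; // already long
    return this.shortToLong.get(name);
  }

  // Short form of a name, or undefined if the name is unknown.
  toShort(name: string): string | undefined {
    if (this.shortToLong.has(name)) return name; // already short
    return this.longToShort.get(name);
  }
}

// A few script aliases as inline sample data.
const scriptAliases = new AliasMapSketch([
  ["Kana", "Katakana"],
  ["Hira", "Hiragana"],
  ["Latn", "Latin"],
]);

console.log(scriptAliases.toLong("Kana"));
console.log(scriptAliases.toShort("Latin"));
```

With a design like this, replacing the inline pairs with data loaded from a package would only touch the constructor call, not the code that calls `toLong` or `toShort`.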