The Unicode Character Database (UCD) assigns each Unicode character one of six values as its default East Asian Width property: Ambiguous, Fullwidth, Halfwidth, Narrow, Wide, or Neutral (= Not East Asian). For any given operation, these six default values resolve into only two, narrow and wide, depending on context.
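As a rough illustration of that resolution step, here is a minimal Python sketch (not the Roc implementation this issue asks for). The names `EAW`, `resolve_width`, and `east_asian_context` are hypothetical. It assumes the usual reading of UAX #11: Wide and Fullwidth resolve to wide, Halfwidth, Narrow, and Neutral resolve to narrow, and Ambiguous resolves to wide only in an East Asian context.

```python
from enum import Enum


class EAW(Enum):
    """The six default East Asian Width values, keyed by their data-file abbreviations."""
    AMBIGUOUS = "A"
    FULLWIDTH = "F"
    HALFWIDTH = "H"
    NARROW = "Na"
    NEUTRAL = "N"
    WIDE = "W"


def resolve_width(prop: EAW, east_asian_context: bool = False) -> int:
    """Resolve a six-valued property to a column count: 2 (wide) or 1 (narrow).

    Assumption: Ambiguous is treated as narrow unless the caller says the
    text is being laid out in an East Asian context.
    """
    if prop in (EAW.WIDE, EAW.FULLWIDTH):
        return 2
    if prop is EAW.AMBIGUOUS and east_asian_context:
        return 2
    return 1
```

The `east_asian_context` flag is the "depending on context" part: the same Ambiguous character counts as one column by default and two columns when the flag is set.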
The package already contains a few generators that follow this pattern, so this should be easy to implement and makes a good first issue.
Add the `EastAsianWidth.txt` data file to `unicode/package/data`, then write an `InternalEAWGen.roc` file, which can be almost a copy-paste of `InternalGBPGen.roc`, that parses the data file and generates a Roc file mapping code points `CP` to an East Asian Width property `EAW : [Ambiguous, Fullwidth, Halfwidth, Narrow, Neutral, Wide]`. Then implement a corresponding helper that uses this mapping to walk through a `List U8` or a `Str` and sum the widths.
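The overall flow (parse the data file, look up each code point, sum the widths) can be sketched in Python as follows. This is only an illustration of the logic under stated assumptions, not the Roc generator or helper itself. `SAMPLE`, `parse_ranges`, `lookup`, and `display_width` are hypothetical names, and `SAMPLE` is a short excerpt in the data file's `range;property # comment` format rather than the full file. The sketch defaults unlisted code points to Neutral and ignores zero-width characters, both of which are simplifications.

```python
import bisect

# Illustrative excerpt in the EastAsianWidth.txt line format (not the full file).
SAMPLE = """\
# code point or range ; property  # comment
0020;Na
0041..005A;Na
0061..007A;Na
00A1;A
1100..115F;W
3000;F
4E00..9FFF;W
FF61;H
"""


def parse_ranges(text):
    """Parse data-file text into a sorted list of (start, end, property) tuples."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cps, prop = (field.strip() for field in data.split(";"))
        first, _, last = cps.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        ranges.append((start, end, prop))
    ranges.sort()
    return ranges


def lookup(ranges, cp):
    """Return the property abbreviation for a code point ("N" if unlisted)."""
    # Find the last range whose start is <= cp, then check that cp is inside it.
    i = bisect.bisect_right(ranges, (cp, 0x110000, "")) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return "N"  # simplifying assumption: everything unlisted is Neutral


def display_width(ranges, text, east_asian_context=False):
    """Sum per-character widths: 2 for wide, 1 for narrow."""
    total = 0
    for ch in text:
        prop = lookup(ranges, ord(ch))
        if prop in ("W", "F") or (prop == "A" and east_asian_context):
            total += 2
        else:
            total += 1
    return total
```

With the sample above, `display_width(parse_ranges(SAMPLE), "日本")` evaluates to 4, since both characters fall in the `4E00..9FFF` Wide range. A Roc helper operating on a `List U8` would first need to decode the UTF-8 bytes into code points, a step that Python's `str` iteration hides here.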