roc-lang / unicode

Universal Permissive License v1.0

Implement Visual Width #6

Closed: lukewilliamboswell closed this issue 4 months ago

lukewilliamboswell commented 6 months ago

The Unicode Character Database (UCD) assigns each Unicode character a default width property with one of six values: Ambiguous, Fullwidth, Halfwidth, Narrow, Wide, or Neutral (= Not East Asian). For any given operation, these six default property values resolve into only two property values, narrow and wide, depending on context.
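To make the six-to-two resolution concrete, here is a minimal sketch in Python (chosen only for illustration; the package itself is written in Roc). The names `EAW` and `resolve_columns` are hypothetical, and mapping wide to 2 columns and narrow to 1 is an assumption based on common terminal convention. Treating Ambiguous as wide only in an East Asian context follows the usual reading of UAX #11.

```python
from enum import Enum


class EAW(Enum):
    """The six East Asian Width property values, keyed by their short names."""
    AMBIGUOUS = "A"
    FULLWIDTH = "F"
    HALFWIDTH = "H"
    NARROW = "Na"
    NEUTRAL = "N"
    WIDE = "W"


def resolve_columns(prop: EAW, east_asian_context: bool = False) -> int:
    """Resolve a property value to a column count (assumed: wide = 2, narrow = 1)."""
    if prop in (EAW.WIDE, EAW.FULLWIDTH):
        return 2
    # Ambiguous characters are the only ones whose resolution depends on context.
    if prop is EAW.AMBIGUOUS and east_asian_context:
        return 2
    # Halfwidth, Narrow, Neutral, and Ambiguous outside an East Asian context.
    return 1
```

The `east_asian_context` flag is the only place where "depending on context" shows up. The other five values always resolve the same way.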

zulip discussion

We already have a few examples that do this in our package, so this should be easy to implement as a good first issue.

Add the EastAsianWidth.txt data file to unicode/package/data. Then write an InternalEAWGen.roc file, almost a copy-paste of InternalGBPGen.roc, that parses the data file and generates a Roc file mapping CodePoints (CP) to an East Asian Width property, EAW : [Ambiguous, Fullwidth, Halfwidth, Narrow, Neutral, Wide]. Finally, implement a corresponding helper that uses this mapping to walk through a List U8 or a Str and sum the widths.
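As a rough, language-neutral sketch of the parse-then-sum pipeline described above, the Python below parses lines in the `start..end ; property # comment` style of EastAsianWidth.txt, looks up a code point by binary search, and sums the widths. It is not the Roc generator. `SAMPLE`, `parse_ranges`, `lookup`, and `visual_width` are hypothetical names, and the sample is an abbreviated stand-in for the real file (the ASCII range is collapsed into one line for brevity). Defaulting unlisted code points to Neutral and counting every narrow character as 1 column, including control and combining characters, are simplifying assumptions.

```python
from bisect import bisect_right

# Abbreviated stand-in for EastAsianWidth.txt; the real file has many more lines.
SAMPLE = """\
# Sample lines in the style of EastAsianWidth.txt
0020..007E     ; Na # printable ASCII (collapsed into one range for brevity)
00A1           ; A  # INVERTED EXCLAMATION MARK
4E00..9FFF     ; W  # CJK UNIFIED IDEOGRAPH-4E00..9FFF
FF21..FF3A     ; F  # FULLWIDTH LATIN CAPITAL LETTER A..Z
"""


def parse_ranges(text):
    """Return a sorted list of (start, end, property) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cps, prop = (field.strip() for field in line.split(";"))
        start, _, end = cps.partition("..")  # a single code point has no ".."
        ranges.append((int(start, 16), int(end or start, 16), prop))
    ranges.sort()
    return ranges


def lookup(ranges, cp):
    """Find the property for a code point; unlisted code points default to N (assumption)."""
    i = bisect_right(ranges, (cp, 0x110000, "")) - 1  # last range starting at or before cp
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return "N"


def visual_width(ranges, text, ambiguous_wide=False):
    """Sum column widths over the code points of a string (wide = 2, narrow = 1)."""
    wide = {"W", "F"} | ({"A"} if ambiguous_wide else set())
    return sum(2 if lookup(ranges, ord(ch)) in wide else 1 for ch in text)


RANGES = parse_ranges(SAMPLE)
```

With this sample table, `visual_width(RANGES, "Roc")` gives 3 and `visual_width(RANGES, "中文")` gives 4. A helper over raw UTF-8 bytes (the `List U8` case) would need to decode to code points first, which Python's `str` iteration already does here.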

lukewilliamboswell commented 4 months ago

Implemented in #8