unicode-rs / unicode-segmentation

Grapheme Cluster and Word boundaries according to UAX#29 rules
https://unicode-rs.github.io/unicode-segmentation

Update to Unicode 11 #43

Closed Manishearth closed 4 years ago

Manishearth commented 6 years ago

Unicode 11 comes out in June, and we should update to it.

There are a bunch of changes to the grapheme and word boundary rules involved here. Most of the complex emoji rules have been replaced with rules that use \p{Extended_Pictographic}. That is not a disjoint grapheme category; rather, it is an additional property that a code point can have on top of its category. This crate may require some refactoring to handle that.
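To illustrate the distinction, here is a minimal sketch, not this crate's actual code. It assumes a toy, hand-picked set of code points instead of the generated tables, and the names `GraphemeCat`, `grapheme_cat`, `is_extended_pictographic` and `gb11_forbids_break` are hypothetical. It shows the Unicode 11 rule GB11 (`\p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}`) needing both a category lookup and a separate property lookup for the same character.

```rust
/// Toy subset of the disjoint Grapheme_Cluster_Break categories.
/// (Hypothetical type for illustration; the real tables have many more.)
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum GraphemeCat {
    Any,
    Extend,
    Zwj,
}

/// Toy category lookup covering only a few hand-picked code points.
fn grapheme_cat(c: char) -> GraphemeCat {
    match c {
        '\u{200D}' => GraphemeCat::Zwj,
        // VARIATION SELECTOR-16 and the emoji skin tone modifiers
        '\u{FE0F}' | '\u{1F3FB}'..='\u{1F3FF}' => GraphemeCat::Extend,
        _ => GraphemeCat::Any,
    }
}

/// Separate boolean property, independent of the category above.
/// Toy subset: HEAVY BLACK HEART plus boy, girl, man, woman.
fn is_extended_pictographic(c: char) -> bool {
    matches!(c, '\u{2764}' | '\u{1F466}'..='\u{1F469}')
}

/// Returns true if GB11 forbids a break immediately before `chars[i]`,
/// i.e. the text before position `i` ends in
/// `Extended_Pictographic Extend* ZWJ` and `chars[i]` is itself
/// Extended_Pictographic.
fn gb11_forbids_break(chars: &[char], i: usize) -> bool {
    if i == 0 || i >= chars.len() || !is_extended_pictographic(chars[i]) {
        return false;
    }
    if grapheme_cat(chars[i - 1]) != GraphemeCat::Zwj {
        return false;
    }
    // Walk backwards from the ZWJ over any run of Extend characters.
    let mut j = i - 1;
    while j > 0 {
        j -= 1;
        if grapheme_cat(chars[j]) == GraphemeCat::Extend {
            continue;
        }
        // First non-Extend character decides whether the rule matches.
        return is_extended_pictographic(chars[j]);
    }
    false
}
```

In this sketch U+1F468 (man) has category `Any` and is also Extended_Pictographic, which is why the property cannot simply be folded in as one more enum variant without deciding how it interacts with the existing categories.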

We may want to update to Unicode 10 before that; that should be a straightforward regen of the tables IIRC.

rth commented 5 years ago

> We may want to update to Unicode 10 before that; that should be a straightforward regen of the tables IIRC.

PR proposed in #56

Manishearth commented 5 years ago

Some more background on the emoji simplifications in 11: https://unicode.org/mail-arch/unicode-ml/y2018-m01/0000.html, http://www.unicode.org/review/pri355/

wezm commented 4 years ago

Can this be closed now that #72 is merged?