unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

`BasicEmoji` is not a property #5588

Open robertbastian opened 5 days ago

robertbastian commented 5 days ago

Currently, "Basic Emoji" is modeled as a property, but it isn't really one, because it is not a binary or enumerated classification of code points. Instead, it is defined directly as a set. Because properties induce sets, modeling it as a property works, but it isn't logically sound. "Basic Emoji" is also the only "property" that induces a UnicodeSetData instead of a CodePoint[Set/Map]Data.

Instead, let's call it what it is: an Emoji set. TR51 in fact defines 7 Emoji sets (https://unicode.org/reports/tr51/#Emoji_Sets), which we might want to support in the future.
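
To illustrate the distinction, here is a minimal sketch using only the standard library. The `CodePointSet` and `EmojiSet` types below are hypothetical stand-ins, not the actual ICU4X types, and the two sample entries are a tiny hand-picked subset rather than real data. A binary code point property answers a question about a single `char`, whereas Basic Emoji also contains multi-code-point sequences such as U+00A9 U+FE0F, so membership has to be asked of a string.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the set induced by a binary code point property.
struct CodePointSet {
    code_points: BTreeSet<char>,
}

impl CodePointSet {
    /// Membership is a question about one code point.
    fn contains(&self, c: char) -> bool {
        self.code_points.contains(&c)
    }
}

/// Hypothetical stand-in for an emoji set, whose members are strings.
struct EmojiSet {
    strings: BTreeSet<String>,
}

impl EmojiSet {
    /// Membership is a question about a whole string (one or more code points).
    fn contains_str(&self, s: &str) -> bool {
        self.strings.contains(s)
    }
}

/// Tiny illustrative subset of the `Emoji` binary property.
fn sample_emoji_property() -> CodePointSet {
    CodePointSet {
        // U+231A WATCH, U+00A9 COPYRIGHT SIGN
        code_points: BTreeSet::from(['\u{231A}', '\u{00A9}']),
    }
}

/// Tiny illustrative subset of the Basic Emoji set.
fn sample_basic_emoji() -> EmojiSet {
    EmojiSet {
        strings: BTreeSet::from([
            // A single code point with default emoji presentation.
            "\u{231A}".to_string(),
            // A two-code-point emoji presentation sequence.
            "\u{00A9}\u{FE0F}".to_string(),
        ]),
    }
}

fn main() {
    let emoji = sample_emoji_property();
    let basic = sample_basic_emoji();

    // The property classifies the lone code point U+00A9.
    println!("Emoji(U+00A9): {}", emoji.contains('\u{00A9}'));
    // The set contains the sequence, but not the lone code point.
    println!("Basic Emoji has U+00A9: {}", basic.contains_str("\u{00A9}"));
    println!(
        "Basic Emoji has U+00A9 U+FE0F: {}",
        basic.contains_str("\u{00A9}\u{FE0F}")
    );
}
```

Because some members are sequences, no `char -> bool` function can describe the set, which is why a string-capable container is needed here.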

robertbastian commented 5 days ago

https://www.unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt

sffc commented 5 days ago

If the 7 emoji sets are the only users of this type, I'm fine with calling it EmojiSetData.

sffc commented 4 days ago

Names for things (discussed with @sffc @Manishearth @robertbastian)

Manishearth commented 4 days ago

Noting that UTR 23 documents "properties of strings", so it is a property. But since BinaryStringProperty is ambiguous, EmojiSet seems fine.