Open chop-suey opened 1 week ago
It's been a while since I've worked on this, but the `emoji` and `text` fields are the source of truth, while `hexcode` is either unqualified, qualified, or the default variant, I think. It's the value parsed from the left column of these data files: https://github.com/milesj/emojibase/blob/master/packages/generator/src/parsers/parseData.ts#L38
But still, `emoji` and `text` do not always contain the correct sequence; see my example for "umbrella with rain drops".
I just realized there are also other representations of emoji in https://github.com/milesj/emojibase/blob/master/packages/data/meta/hexcodes.json. Is the `hexcode` in `data.raw.json` supposed to be used as the key to get the matching mapping in `hexcodes.json`?
The entry for "umbrella with rain drops" in `hexcodes.json` looks like this:
"2614": {
  "2614": 0,
  "2614-FE0F": 0,
  "2614-FE0E": 0
}
According to this, all the entries are fully qualified, but in https://www.unicode.org/Public/emoji/15.1/emoji-test.txt it looks like only `2614` should be treated as fully qualified.
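To check which sequences Unicode itself marks as fully qualified, something like the following sketch could be used. It assumes the `code points ; status # comment` line layout of `emoji-test.txt`; `parseEmojiTest` is a hypothetical helper, not part of emojibase, and the sample lines are abridged (comments shortened), so the real file should be consulted for the exact contents.

```typescript
// Hypothetical helper, not part of emojibase: maps each hexcode sequence
// in emoji-test.txt-style text to its qualification status.
function parseEmojiTest(text: string): Map<string, string> {
  const statuses = new Map<string, string>();
  for (const line of text.split("\n")) {
    // Everything after the first "#" is a comment; code points never contain "#".
    const data = line.split("#")[0].trim();
    if (data === "") continue; // blank line or comment-only line
    const parts = data.split(";").map((part) => part.trim());
    if (parts.length < 2 || parts[0] === "" || parts[1] === "") continue;
    // "1F574 FE0F" -> "1F574-FE0F", matching the hexcode notation used here.
    statuses.set(parts[0].split(/\s+/).join("-"), parts[1]);
  }
  return statuses;
}

// Abridged sample in the file's format (assumption: comments shortened).
const sample = [
  "# group: Travel & Places",
  "2614 ; fully-qualified # umbrella with rain drops",
  "1F574 FE0F ; fully-qualified # person in suit levitating",
  "1F574 ; unqualified # person in suit levitating",
].join("\n");

const statuses = parseEmojiTest(sample);
console.log(statuses.get("2614"), statuses.has("2614-FE0F"));
```

With a table like this, each key of a `hexcodes.json` entry can be looked up to see whether Unicode lists it at all and with which status.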
There seem to be some inconsistencies in the generated metadata (e.g. `packages/data/en/data.raw.json`). In some cases the `hexcode` is missing the variation selector 16 (`FE0F`) according to the Unicode data. Examples:
- `1F574-FE0F` according to Unicode
  - `hexcode` is `1F574`
  - `emoji` contains the sequence `1F574-FE0F`
- `2614` according to Unicode
  - `hexcode` is `2614`
  - `emoji` contains the sequence `2614-FE0F`
Like this, I never know which property is the source of truth. Am I missing something, or is this an error in the data?
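For anyone who wants to reproduce the comparison, here is a minimal sketch of how the mismatches can be found. `toHexcode` and `findMismatches` are hypothetical helpers written for this report, not emojibase APIs, and the `RawEmoji` shape only assumes the `hexcode` and `emoji` fields discussed here.

```typescript
// Assumed minimal shape of an entry in data.raw.json (only the fields used here).
interface RawEmoji {
  hexcode: string;
  emoji: string;
}

// Hypothetical helper: turns a string into its dash-separated, uppercase
// code point sequence, e.g. U+2614 U+FE0F -> "2614-FE0F".
function toHexcode(value: string): string {
  return Array.from(value) // iterates by code point, not by UTF-16 unit
    .map((ch) => ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"))
    .join("-");
}

// Lists every entry whose `hexcode` differs from the sequence in `emoji`.
function findMismatches(entries: RawEmoji[]): { hexcode: string; actual: string }[] {
  return entries
    .map((entry) => ({ hexcode: entry.hexcode, actual: toHexcode(entry.emoji) }))
    .filter((result) => result.hexcode !== result.actual);
}

// The two entries from the examples above, rebuilt from their code points.
const entries: RawEmoji[] = [
  { hexcode: "1F574", emoji: String.fromCodePoint(0x1f574, 0xfe0f) },
  { hexcode: "2614", emoji: String.fromCodePoint(0x2614, 0xfe0f) },
];

const mismatches = findMismatches(entries);
console.log(mismatches);
```

Both sample entries are reported, because in each case `emoji` carries a trailing `FE0F` that `hexcode` does not.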