StephenTangCook closed this 2 months ago
In one of my projects we have a script that creates a static mapping of all possible emoji variation shortnames from emoji-datasource (which gets its data from emoji-data, the same emoji library Slack uses). You're free to reuse the logic if it helps!
Essentially, we have these scripts:
"scripts": {
  "build": "tsup src/index.ts",
  "copy-emoji-data": "mkdir -p dist && cp node_modules/emoji-datasource/emoji.json dist/emoji.json",
  "create-emoji-mapping": "pnpm copy-emoji-data && tsup src/create-emoji-mapping.ts && ts-node dist/create-emoji-mapping.cjs"
},
And the create-emoji-mapping.ts script:
import * as emojiConvertor from 'emoji-js';
import * as fs from 'fs-extra';
import * as path from 'path';

type EmojiEntry = {
  name: string;
  unified: string;
  non_qualified: string | null;
  docomo: string | null;
  au: string | null;
  softbank: string | null;
  google: string | null;
  image: string;
  sheet_x: number;
  sheet_y: number;
  short_name: string;
  short_names: string[];
  text: string | null;
  texts: string[] | null;
  category: string;
  subcategory: string;
  sort_order: number;
  added_in: string;
  has_img_apple: boolean;
  has_img_google: boolean;
  has_img_twitter: boolean;
  has_img_facebook: boolean;
  skin_variations?: Record<string, SkinVariation>;
};

type SkinVariation = {
  unified: string;
  non_qualified: string | null;
  image: string;
  sheet_x: number;
  sheet_y: number;
  added_in: string;
  has_img_apple: boolean;
  has_img_google: boolean;
  has_img_twitter: boolean;
  has_img_facebook: boolean;
};

type EmojiData = EmojiEntry[];
type EmojiShortnameMapping = Record<string, string>;
console.log('Creating emoji short_name mapping...');
// Get the emoji source data
const emojiJsonPath = path.resolve('dist', 'emoji.json');
console.log(`Loading emoji data file at '${emojiJsonPath}'...`);
const emojiData: EmojiData = fs.readJsonSync(emojiJsonPath);
// Get the emoji converter
const emojiConverter = new emojiConvertor.EmojiConvertor();
emojiConverter.replace_mode = 'unified';
/**
 * Converts the string representation of unified code points to the actual characters
 * (e.g. "1F3FB-1F3FC" => "\u{1F3FB}\u{1F3FC}").
 * @param unified - the unified string representation (e.g. "1F3FB-1F3FC")
 * @returns the unified characters (e.g. "\u{1F3FB}\u{1F3FC}") as a string
 */
function unifiedStringToCodePoint(unified: string): string {
  return unified
    .split('-')
    .map((unifiedSegment) => String.fromCodePoint(parseInt(unifiedSegment, 16)))
    .join('');
}
/**
 * Converts a string of characters to a pretty escaped version (e.g. "\u{1F3FB}\u{1F3FC}").
 * @param characters - the characters to convert
 * @returns the pretty stringified version
 */
function codePointToPrettyString(characters: string): string {
  // Array.from iterates by code point, so astral characters such as emoji are not
  // split into two UTF-16 surrogate halves (which is what split('') would do)
  return Array.from(characters)
    .map((segment) => {
      const codePoint = segment.codePointAt(0);
      if (codePoint !== undefined) {
        return `\\u{${codePoint.toString(16).toUpperCase()}}`;
      }
      return segment;
    })
    .join('');
}
// Get the skin-tone variations (e.g. "1F3FE" => "skin-tone-5")
const skintoneVariationsMapping: Record<string, string> = {};
emojiData
  .filter((emojiEntry: EmojiEntry) => emojiEntry.subcategory === 'skin-tone')
  .forEach((skintoneEntry: EmojiEntry) => {
    skintoneVariationsMapping[skintoneEntry.unified] = skintoneEntry.short_name;
  });
// Iterate through the emoji data and create a mapping
const emojiShortnameMapping: EmojiShortnameMapping = {};
emojiData.forEach((emojiEntry: EmojiEntry) => {
  emojiEntry.short_names.forEach((shortname: string) => {
    // start with the default shortname (no modifier) and add variations as needed
    const emojiShortnameToUnifiedMap: Record<string, string> = {
      [shortname]: unifiedStringToCodePoint(emojiEntry.unified)
    };
    // add any skin-tone variations
    if (emojiEntry.skin_variations) {
      const skinVariations: string[] = Object.keys(emojiEntry.skin_variations);
      skinVariations.forEach((skinVariation: string) => {
        // NOTE: The skin-tone variation key can include multiple hyphen-separated values,
        // which we need to convert to a single shortname variation,
        // e.g. "1F3FB-1F3FC" => "skin-tone-2-3"
        const skinVariationShortname = skinVariation
          .split('-')
          .map((skinVariationUnifiedSegment, index) => {
            const segmentShortname =
              skintoneVariationsMapping[skinVariationUnifiedSegment]; // "1F3FB" => "skin-tone-2"
            // for every segment after the first, drop the prefix, e.g. "skin-tone-3" => "-3"
            return index > 0 ? segmentShortname.replace('skin-tone', '') : segmentShortname;
          })
          .join('');
        const shortnameVariant = `${shortname}::${skinVariationShortname}`;
        const variationInfo = emojiEntry.skin_variations?.[skinVariation];
        if (variationInfo?.unified) {
          emojiShortnameToUnifiedMap[shortnameVariant] =
            unifiedStringToCodePoint(variationInfo.unified);
        }
      });
    }
    // look up the emoji for each shortname variation
    Object.entries(emojiShortnameToUnifiedMap).forEach(([shortnameKey, emojiUnified]) => {
      try {
        const emoji = emojiConverter.replace_unified(emojiUnified);
        emojiShortnameMapping[shortnameKey] = emoji;
      } catch (e) {
        // TODO (known bug): this fails for certain skin-tone variations
        // see https://github.com/iamcal/js-emoji/issues/191
        console.error(
          `Error converting emoji with shortname '${shortnameKey}' and unified '${codePointToPrettyString(emojiUnified)}': ${e}`
        );
      }
    });
  });
});
// Save to output files
const outputFile = 'emoji-shortcode-mapping.json';
const outputFilePretty = 'emoji-shortcode-mapping_pretty.json';
fs.writeJsonSync(path.join('src', outputFile), emojiShortnameMapping);
fs.writeJsonSync(path.join('src', outputFilePretty), emojiShortnameMapping, {
  spaces: 2
});
console.log(
  `Emoji shortcode mapping (${Object.keys(emojiShortnameMapping).length} entries) saved to src/${outputFile}`
);
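To sanity-check the two conversion helpers in isolation, here is a standalone copy of them with a few calls. The function names match the script; the sample input (thumbs-up with the lightest skin tone) is only an illustration. Note that the pretty-printer iterates with Array.from rather than split(''), since split('') would break each emoji into UTF-16 surrogate halves.

```typescript
// Standalone copies of the two helpers from create-emoji-mapping.ts
function unifiedStringToCodePoint(unified: string): string {
  return unified
    .split('-')
    .map((segment) => String.fromCodePoint(parseInt(segment, 16)))
    .join('');
}

function codePointToPrettyString(characters: string): string {
  // Array.from walks the string by code point, so U+1F3FB stays one element
  // instead of becoming two surrogate halves (every element is non-empty)
  return Array.from(characters)
    .map((ch) => `\\u{${ch.codePointAt(0)!.toString(16).toUpperCase()}}`)
    .join('');
}

const thumbsUpTone2 = unifiedStringToCodePoint('1F44D-1F3FB');
console.log(thumbsUpTone2); // 👍🏻
console.log(codePointToPrettyString(thumbsUpTone2)); // \u{1F44D}\u{1F3FB}
console.log(thumbsUpTone2.length); // 4 (UTF-16 code units)
console.log(Array.from(thumbsUpTone2).length); // 2 (code points)
```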
Note: I just updated the script with some skin-tone variation bug fixes, in particular for compound emojis that take multiple variations (e.g. two_women_holding_hands::skin-tone-2-5: "👩🏻‍🤝‍👩🏾").
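The compound-emoji fix comes down to how a multi-segment skin_variations key is turned into a shortname suffix. Below is a minimal standalone sketch of just that step. It assumes a hard-coded table of the five Fitzpatrick modifiers (U+1F3FB to U+1F3FF) instead of reading it from emoji.json as the script does, and the helper names are hypothetical.

```typescript
// Hard-coded Fitzpatrick modifier code points and their Slack-style names
// (the script above builds this table from the skin-tone entries in emoji.json)
const SKIN_TONE_NAMES: Record<string, string> = {
  '1F3FB': 'skin-tone-2',
  '1F3FC': 'skin-tone-3',
  '1F3FD': 'skin-tone-4',
  '1F3FE': 'skin-tone-5',
  '1F3FF': 'skin-tone-6',
};

// Turns a skin_variations key such as "1F3FB-1F3FE" into "skin-tone-2-5":
// the first modifier keeps its full name, later ones contribute only "-N"
function skinVariationSuffix(variationKey: string): string {
  return variationKey
    .split('-')
    .map((segment, index) => {
      const name = SKIN_TONE_NAMES[segment.toUpperCase()];
      if (name === undefined) {
        throw new Error(`Unknown skin-tone modifier: ${segment}`);
      }
      return index > 0 ? name.replace('skin-tone', '') : name;
    })
    .join('');
}

// Builds the mapping key, e.g. "two_women_holding_hands::skin-tone-2-5"
function variantShortname(shortname: string, variationKey: string): string {
  return `${shortname}::${skinVariationSuffix(variationKey)}`;
}

console.log(variantShortname('two_women_holding_hands', '1F3FB-1F3FE'));
// two_women_holding_hands::skin-tone-2-5
console.log(variantShortname('+1', '1F3FD'));
// +1::skin-tone-4
```

Throwing on an unknown modifier (rather than producing "undefined" in the key) makes a data change in emoji.json fail loudly.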
Here are the output files if you just want to use them :) emoji-shortcode-mapping_pretty.json, emoji-shortcode-mapping.json
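If you only want to consume the JSON, a replacement pass over Slack message text might look like the sketch below. It assumes the keys look exactly as the script writes them ("+1", "+1::skin-tone-4", ...) and that shortnames only contain lowercase letters, digits, "_", "+" and "-". The regex and function name are hypothetical, and the two-entry sample mapping stands in for the real file.

```typescript
// Shape of emoji-shortcode-mapping.json: shortname (optionally with a
// "::skin-tone-N" suffix) -> emoji string
type EmojiShortnameMapping = Record<string, string>;

// Matches ":name:" optionally followed directly by ":skin-tone-N:" or ":skin-tone-N-M:"
const SHORTCODE_PATTERN = /:([a-z0-9_+-]+):(?::(skin-tone-[2-6](?:-[2-6])?):)?/g;

function replaceShortcodes(text: string, mapping: EmojiShortnameMapping): string {
  return text.replace(SHORTCODE_PATTERN, (match: string, name: string, tone?: string) => {
    // prefer the exact skin-tone variant if the mapping has it
    if (tone !== undefined && mapping[`${name}::${tone}`] !== undefined) {
      return mapping[`${name}::${tone}`];
    }
    // otherwise fall back to the default emoji (dropping an unsupported tone)
    if (mapping[name] !== undefined) {
      return mapping[name];
    }
    return match; // unknown shortcode: leave the text untouched
  });
}

const sampleMapping: EmojiShortnameMapping = {
  '+1': '\u{1F44D}',
  '+1::skin-tone-4': '\u{1F44D}\u{1F3FD}',
};
console.log(replaceShortcodes('Nice :+1::skin-tone-4: and :+1: :unknown:', sampleMapping));
// Nice 👍🏽 and 👍 :unknown:
```

Falling back to the default emoji is a design choice; returning the original text instead would make missing variants easier to spot.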
@StephenTangCook Thank you so much for the file, this is super helpful.
@StephenTangCook resolved in v0.3.8
Awesome! I swear I'll open-source the emoji list conversion as an npm package one day when I find the time! 😭
Emojis can have modifier sequences, most commonly used for skin-tone variations. There can be up to six color variations (with the first one assumed to be the default). Here's an example for thumbs-up:
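As an illustration, the six thumbs-up variants can be generated from the base code point U+1F44D plus the five Fitzpatrick skin-tone modifiers U+1F3FB to U+1F3FF. This sketch assumes Slack's ":+1:" shortname and the "::skin-tone-N" suffix convention used by the mapping script (skin-tone-2 through skin-tone-6); the constant and function names are hypothetical.

```typescript
// Base emoji: U+1F44D THUMBS UP SIGN
const THUMBS_UP = 0x1f44d;

// The five Fitzpatrick modifiers, named the way Slack names them
const SKIN_TONE_MODIFIERS: Array<[string, number]> = [
  ['skin-tone-2', 0x1f3fb],
  ['skin-tone-3', 0x1f3fc],
  ['skin-tone-4', 0x1f3fd],
  ['skin-tone-5', 0x1f3fe],
  ['skin-tone-6', 0x1f3ff],
];

// The default (unmodified) emoji comes first, then one variant per modifier
function thumbsUpVariants(): Array<[string, string]> {
  const variants: Array<[string, string]> = [[':+1:', String.fromCodePoint(THUMBS_UP)]];
  for (const [toneName, modifier] of SKIN_TONE_MODIFIERS) {
    variants.push([`:+1::${toneName}:`, String.fromCodePoint(THUMBS_UP, modifier)]);
  }
  return variants;
}

for (const [shortcode, emoji] of thumbsUpVariants()) {
  console.log(`${shortcode} ${emoji}`);
}
// :+1: 👍
// :+1::skin-tone-2: 👍🏻
// :+1::skin-tone-3: 👍🏼
// :+1::skin-tone-4: 👍🏽
// :+1::skin-tone-5: 👍🏾
// :+1::skin-tone-6: 👍🏿
```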
Slack supports changing your default skin tone for emojis, so an emoji with a full modifier sequence can appear in a message.
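Handling those messages means recognizing the full token, not just the base shortname. Below is a minimal parser sketch. It assumes the ":name::skin-tone-N:" form (with "skin-tone-N-M" for compound emoji) and Slack's numbering, in which skin-tone-2 to skin-tone-6 correspond to U+1F3FB to U+1F3FF. The interface, regex, and function names are hypothetical.

```typescript
// Result of parsing one Slack-style emoji token
interface ParsedEmojiToken {
  shortname: string; // e.g. "+1"
  skinTones: number[]; // e.g. [4], or [2, 5] for compound emoji; empty for the default
}

// A whole token: ":name:" with an optional ":skin-tone-N:" or ":skin-tone-N-M:" suffix
const TOKEN_PATTERN = /^:([a-z0-9_+-]+):(?::skin-tone-([2-6](?:-[2-6])?):)?$/;

function parseEmojiToken(token: string): ParsedEmojiToken | null {
  const match = TOKEN_PATTERN.exec(token);
  if (match === null) {
    return null;
  }
  const skinTones = match[2] === undefined ? [] : match[2].split('-').map(Number);
  return { shortname: match[1], skinTones };
}

// skin-tone-2 => U+1F3FB ... skin-tone-6 => U+1F3FF
function skinToneModifier(tone: number): string {
  if (!Number.isInteger(tone) || tone < 2 || tone > 6) {
    throw new RangeError(`Skin tone must be 2-6, got ${tone}`);
  }
  return String.fromCodePoint(0x1f3fb + (tone - 2));
}

console.log(JSON.stringify(parseEmojiToken(':+1::skin-tone-4:')));
// {"shortname":"+1","skinTones":[4]}
console.log(JSON.stringify(parseEmojiToken(':two_women_holding_hands::skin-tone-2-5:')));
// {"shortname":"two_women_holding_hands","skinTones":[2,5]}
console.log('\u{1F44D}' + skinToneModifier(4)); // 👍🏽
```

Resolving a compound token to an actual emoji still needs the per-emoji data (where the modifiers sit inside the ZWJ sequence), which is what the generated mapping provides.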