n8willis / opentype-shaping-documents

Documentation of OpenType shaping behavior

Emoji font implementation notes #150

Closed n8willis closed 1 year ago

n8willis commented 2 years ago

This issue is here to collect implementation details on emoji fonts.

Primarily of interest (initially) is which GSUB feature(s) they use to implement emoji sequences and which image format they use. I'm also noting whether they provide printable fallback glyphs for some of the invisible codepoints that interact with emoji variation and sequences.

I may later add some more detail on some specific sequence behaviors, since there are some ambiguities and differences-of-opinion out there in the wild, and/or notes about other implementation details. For starters I've just got some open-source emoji fonts.

| Font | Publisher | Image format | Sequence formation feature | ZWJ sequence feature | Visible presentation selector | Visible modifier |
|---|---|---|---|---|---|---|
| Source Emoji | Adobe | cff | ccmp | ccmp, salt | YES | YES |
| Blobmoji | C1710 | CBDT | ccmp | ccmp | no | YES |
| Twemoji | Twitter | SVG | liga | liga | no | YES |
| Noto Color Emoji | Google | CBDT | ccmp | ccmp | no | YES |
| Noto Color Emoji | Google | COLRv1 | ccmp | ccmp | no | YES |
| EmojiTwo Android | EmojiTwo | CBDT | ccmp | ccmp | no | YES |
| EmojiTwo Apple | EmojiTwo | sbix | morx | morx | no | YES |
| EmojiTwo SVG | EmojiTwo | SVG | ccmp | ccmp | no | YES |
| Openmoji | HfG Gmünd | SVG | liga | liga | no | YES |
| FirefoxEmoji | Mozilla | COLRv0 | rlig | rlig | no | no |
| Noto Emoji | Google | glyf | ccmp | ccmp | no | YES |
| Old Noto B&W Emoji | Google | glyf | ccmp | ccmp | no | no |
| JoyPixels | JoyPixels | CBDT | ccmp | ccmp | no | YES |
| Apple Color Emoji | Apple | sbix | morx | morx | no | YES |
| Samsung Color Emoji | Samsung | CBDT | ccmp | ccmp | no | YES |
| Segoe UI Emoji | Microsoft | COLRv0 | ccmp | ccmp | YES | YES |

Please feel free to add info on others, particularly if you have a font from a mobile or OS vendor that you can inspect but which isn't up on the web for general downloads. Or if there is one available for general download, feel free to point me to it.

n8willis commented 2 years ago

Also feel free to request additional data columns.

wezm commented 2 years ago

JoyPixels can be downloaded from their website. I use the TTF version, which uses CBDT for the bitmaps. Not sure how to check the other features; I can do them if you let me know what to look for. allsorts dump shows the following for the list of tables:

TTF
 - version: 0x00010000
 - num_tables: 13

CBDT (checksum: 0x0fb440b5, offset: 14428, length: 13240084)
CBLC (checksum: 0x345423fa, offset: 13254512, length: 14820)
GSUB (checksum: 0x8b0e095f, offset: 13269332, length: 45102)
OS/2 (checksum: 0x760467d7, offset: 344, length: 96)
cmap (checksum: 0xd6bc3bdb, offset: 10768, length: 2865)
head (checksum: 0x15dd3201, offset: 220, length: 54)
hhea (checksum: 0x11640dac, offset: 276, length: 36)
hmtx (checksum: 0xc9000000, offset: 440, length: 10328)
maxp (checksum: 0x0ec70039, offset: 312, length: 32)
 - num_glpyhs: 3715
name (checksum: 0x3fff730d, offset: 13636, length: 758)
post (checksum: 0xfb270084, offset: 14396, length: 32)
vhea (checksum: 0x0e5e04cc, offset: 13314436, length: 36)
vmtx (checksum: 0x13880000, offset: 13314472, length: 7436)

wezm commented 2 years ago

Apple Color Emoji on macOS (Big Sur 11.6) uses sbix:

TTF
 - version: 0x00010000
 - num_tables: 19

GDEF (checksum: 0x98cf98da, offset: 8292, length: 202)
GPOS (checksum: 0x0019000c, offset: 8080, length: 16)
OS/2 (checksum: 0x60ae5c22, offset: 8196, length: 96)
cmap (checksum: 0x03262b70, offset: 8496, length: 3400)
feat (checksum: 0x03eb031d, offset: 8128, length: 32)
glyf (checksum: 0xcab5d61e, offset: 74708, length: 97090)
head (checksum: 0x56b9b4c2, offset: 736, length: 54)
hhea (checksum: 0x061402a1, offset: 664, length: 36)
hmtx (checksum: 0x8ca0136e, offset: 25636, length: 7016)
loca (checksum: 0x9c0a3a62, offset: 11896, length: 6866)
maxp (checksum: 0x0d8000aa, offset: 8096, length: 32)
 - num_glpyhs: 3432
meta (checksum: 0x8d2fc217, offset: 5036, length: 3044)
morx (checksum: 0x7ef1fcbd, offset: 171800, length: 649944)
name (checksum: 0x7487a71c, offset: 1356, length: 720)
post (checksum: 0xaf5377e3, offset: 32652, length: 42054)
sbix (checksum: 0x0daf4cff, offset: 821744, length: 216077692)
trak (checksum: 0x019d00a9, offset: 848, length: 60)
vhea (checksum: 0x01f71130, offset: 8160, length: 36)
vmtx (checksum: 0x0640005a, offset: 18764, length: 6872)

n8willis commented 2 years ago

> Not sure how to check the other features; I can do them if you let me know what to look for. allsorts dump shows the following for the list of tables:

Good point! I should have added some instructions/advice on that to help volunteers.

Here are the main options I'm aware of for the basic info:

  1. FontTools / TTX

    • You can run ttx -l somefontfilename.ttf (or .otf or .ttc or .otc) to get a short list of the tables. The presence of SVG, CBDT, sbix, or COLR tells you that whichever one of those you see is the image format. If none of the above are there but glyf or CFF or CFF2 is there, then whichever of those three you find is the image format (and means it's a black-and-white emoji font, which you would probably know beforehand anyway). If there's more than one of SVG, CBDT, sbix, or COLR present in the same font file, I don't know what that would mean; it's probably red kryptonite for the vendor to do that anyway.... I guess post a comment.
    • You can run the layout-features.py script from FontTools/Snippets/ (layout-features.py somefontfilename.ttf) and it will print an indented list of the GSUB and GPOS features used. All that matters for the table here so far is what it gives you on the Feature: line. For a typical emoji font there's probably only one feature, but if there are several go ahead and list them.
  2. allsorts / allsorts-tools

    • As Wes showed above, you can use the dump tool from the allsorts-tools package (crate?) to run allsorts dump somefilename.ttf and get a list of tables plus other metadata; the tables are the first output. Same interpretation as above.
    • At the moment it sounds like there isn't a single-command option in allsorts to list GSUB/GPOS features. Correct me if I'm wrong.
  3. GUI font editors

    • You can also just open up the font file in a font editor and look at what it presents to you.
    • FontForge:
      • In FontForge, go to Element -> Font Info in the menu to open the font-info dialog box. It will show you the GSUB/GPOS lookups in the "Lookups" tab (left-hand side).
      • FontForge does not just show you a convenient list of all the tables. However when you open the font file, the "Warnings" dialog box will tell you if it finds SVG, CBDT, sbix, or COLR tables. Unfortunately, it will only actually open the font for editing/inspection if it finds a glyf, CFF, or CFF2 table (which a COLR font would have) or an SVG table. So you can't use it to inspect the features of the other formats.

(I'll add some instructions to this list for other editors if I figure them out or if others can contribute them)
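For anyone who would rather script the table check from option 1, here is a minimal sketch in plain Python with no dependencies. It reads the sfnt table directory (the same list that ttx -l and allsorts dump print) and applies the rule described above. The function names are made up for illustration, and it assumes a single-font .ttf or .otf file, not a .ttc/.otc collection or a WOFF file.

```python
import struct

# Table tags that indicate a color-image format, per the rule above.
# Tags are always four bytes, so SVG and CFF carry a trailing space.
COLOR_TABLES = ("SVG ", "CBDT", "sbix", "COLR")
# Outline tables; if only these are present the font is black-and-white.
OUTLINE_TABLES = ("glyf", "CFF ", "CFF2")


def read_table_tags(data: bytes) -> list[str]:
    """Return the table tags listed in an sfnt table directory."""
    # Offset table: sfntVersion (uint32), numTables (uint16), then three
    # uint16 binary-search fields that are skipped here.
    _version, num_tables = struct.unpack_from(">IH", data, 0)
    tags = []
    for i in range(num_tables):
        # Each 16-byte record holds: tag, checksum, offset, length.
        tag, _checksum, _offset, _length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i
        )
        tags.append(tag.decode("latin-1"))
    return tags


def image_formats(tags) -> list[str]:
    """Apply the rule above: color tables win, else fall back to outlines."""
    color = [t.strip() for t in COLOR_TABLES if t in tags]
    if color:
        return color
    return [t.strip() for t in OUTLINE_TABLES if t in tags]


# Usage (the filename is a placeholder):
# with open("somefontfilename.ttf", "rb") as f:
#     print(image_formats(read_table_tags(f.read(4096))))
```

If the returned list has more than one entry, the font carries several color formats at once, which is the case flagged above as worth a comment.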

For determining if there's a printable glyph for the selectors/modifiers:

  1. GUI font editors
    • You can open up the font in your editor and look at the slots for the Unicode codepoints for the presentation selectors (U+FE0E and U+FE0F) and the modifiers (U+1F3FB through U+1F3FF), if they exist (they might not).
  2. HarfBuzz
    • You can run the hb-view utility to output glyph contents for specific Unicode codepoints, but you might have to try a couple of options depending on the image format. Run hb-view --preserve-default-ignorables somefontfilename.ttf --unicodes=fe0e to start (for U+FE0E). You may also try adding the --font-funcs=ot and/or --shapers=ot flags to that command if it gives you trouble.
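As a scripted first pass before opening an editor or hb-view, the codepoints listed above can be looked up in the font's cmap. This is only a sketch: the helper names are hypothetical, load_cmap assumes the third-party FontTools package is installed, and a cmap entry only shows that the codepoint is mapped to a glyph, not that the glyph has anything visible in it. That second part still needs one of the two methods above.

```python
# Codepoints named in the instructions above.
PRESENTATION_SELECTORS = (0xFE0E, 0xFE0F)
SKIN_TONE_MODIFIERS = tuple(range(0x1F3FB, 0x1F3FF + 1))


def mapped_codepoints(cmap, codepoints):
    """Return {codepoint: glyph name} for the codepoints the cmap covers."""
    return {cp: cmap[cp] for cp in codepoints if cp in cmap}


def load_cmap(path):
    """Load the preferred Unicode cmap as a dict, using FontTools."""
    # Imported lazily so the pure-Python helper above works without it.
    from fontTools.ttLib import TTFont

    return TTFont(path).getBestCmap()


# Usage (the filename is a placeholder):
# cmap = load_cmap("somefontfilename.ttf")
# print(mapped_codepoints(cmap, PRESENTATION_SELECTORS))
# print(mapped_codepoints(cmap, SKIN_TONE_MODIFIERS))
```

An empty result for the selectors corresponds to a "no" in the "visible presentation selector" column only if the font also has no other way of drawing them, so treat it as a hint and confirm visually.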

wezm commented 2 years ago

Great thanks. Here's additional info for JoyPixels and Apple Color Emoji:

mikeday commented 2 years ago

Apple Color Emoji uses the morx table for ligatures and contextual substitutions.

n8willis commented 2 years ago

> Apple Color Emoji uses the morx table for ligatures and contextual substitutions.

Right; updating.

n8willis commented 2 years ago

> Apple Color Emoji on macOS (Big Sur 11.6) uses sbix:
>
> TTF
>  - version: 0x00010000
>  - num_tables: 19
>
> GDEF (checksum: 0x98cf98da, offset: 8292, length: 202)
> GPOS (checksum: 0x0019000c, offset: 8080, length: 16)

@wezm What is there to see in that GPOS table?

That's another thing I've been curious to document; various isolated comments say an emoji font can use GPOS to align things as desired, but it's certainly not common and some concrete examples would be interesting to examine....

kenmcd commented 2 years ago

> @wezm What is there to see in that GPOS table?

  <GPOS>
    <Version value="0x00010000"/>
    <ScriptList>
      <!-- ScriptCount=0 -->
    </ScriptList>
    <FeatureList>
      <!-- FeatureCount=0 -->
    </FeatureList>
    <LookupList>
      <!-- LookupCount=0 -->
    </LookupList>
  </GPOS>
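An empty table like this is consistent with the "length: 16" reported for GPOS in the table listing: a version 1.0 header is 10 bytes, and three lists that each hold only a zero count add 6 more. The sketch below rebuilds such a table and reads the feature tags back out with plain Python. The function name is hypothetical, and the byte layout (lists stored back to back right after the header) is an assumption, since the raw bytes of Apple's table were not posted here.

```python
import struct


def list_feature_tags(table: bytes) -> list[str]:
    """Return the feature tags found in a raw GSUB or GPOS table."""
    # Header: majorVersion, minorVersion, then Offset16 values for the
    # ScriptList, FeatureList, and LookupList.
    _major, _minor, _scripts, features, _lookups = struct.unpack_from(
        ">HHHHH", table, 0
    )
    (count,) = struct.unpack_from(">H", table, features)
    tags = []
    for i in range(count):
        # FeatureRecord: 4-byte tag followed by an Offset16 to the Feature table.
        tag, _offset = struct.unpack_from(">4sH", table, features + 2 + 6 * i)
        tags.append(tag.decode("latin-1"))
    return tags


# Assumed layout of an empty version 1.0 table: 10-byte header, then the
# three lists at offsets 10, 12, and 14, each just a zero count.
EMPTY_GPOS = struct.pack(">HHHHH", 1, 0, 10, 12, 14) + struct.pack(">HHH", 0, 0, 0)
print(len(EMPTY_GPOS), list_feature_tags(EMPTY_GPOS))  # prints: 16 []
```

The same function works on a GSUB table, where for most of the fonts in the table above it would be expected to report ccmp, liga, or rlig.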

n8willis commented 2 years ago

If anyone has access to Windows systems, the various Microsoft emoji fonts are currently a big missing piece here.

(Note: pointing to the GitHub repo of the Fluent Emoji artwork does not move the needle; that's just raw art and what's interesting is the final .ttf / .otf files)

khaledhosny commented 2 years ago

I can check later, but I remember that the Windows emoji fonts used GPOS (IIRC, they used the dist feature, and this didn't work with HarfBuzz because dist was enabled only for certain scripts).

kenmcd commented 2 years ago

Font: Segoe UI Emoji
Publisher: Microsoft
Image Format: COLRv0
Sequence formation feature: ccmp
ZWJ sequence feature: ccmp
Visible presentation selector: Yes (U+FE0E and U+FE0F exist)
Visible modifier: Yes (U+1F3FB through U+1F3FF exist)

OpenType features which affect emojis: dist, ccmp, mark, mkmk
It appears kern only affects the text characters.

kenmcd commented 2 years ago

There is also a COLRv1 version of Noto Color Emoji.

Font: Noto Color Emoji (COLRv1)
Publisher: Google
Image Format: COLRv1
Sequence formation feature: ccmp
ZWJ sequence feature: ccmp
Visible presentation selector: No (U+FE0E and U+FE0F do not exist)
Visible modifier: Yes (U+1F3FB through U+1F3FF exist)
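Since both versions use the same COLR table tag, a table listing alone does not separate COLRv0 from COLRv1. The COLR table starts with a 16-bit version field, so one possible check is to read those two bytes. This is a minimal sketch with a hypothetical function name; it expects the raw bytes of the COLR table, which can be sliced out of the file using the offset and length from the table listing.

```python
import struct


def colr_version(table: bytes) -> str:
    """Classify a raw COLR table by its leading uint16 version field."""
    (version,) = struct.unpack_from(">H", table, 0)
    return f"COLRv{version}"


# Usage (the filename, offset, and length are placeholders):
# with open("somefontfilename.ttf", "rb") as f:
#     f.seek(colr_offset)
#     print(colr_version(f.read(colr_length)))
```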

n8willis commented 2 years ago

> Font: Segoe UI Emoji
> Publisher: Microsoft
> Image Format: COLRv0
> Sequence formation feature: ccmp
> ZWJ sequence feature: ccmp
> Visible presentation selector: Yes (U+FE0E and U+FE0F exist)
> Visible modifier: Yes (U+1F3FB through U+1F3FF exist)
>
> OpenType features which affect emojis: dist, ccmp, mark, mkmk
> It appears kern only affects the text characters.

The rumors are true! That's excellent.... Any examples of what emoji behavior they're using these GPOS features for?

n8willis commented 2 years ago

Since this is getting long and seems to be taking useful form, I am going to move it to a "notes" file. For now, at the https://github.com/n8willis/opentype-shaping-documents/tree/n8willis-patch-2 branch. But expect that to merge in shortly.

n8willis commented 1 year ago

Since this data now lives in the repo in the notes/ subdirectory, I'm closing this issue. I'd still love to get more information from other emoji fonts, and I'm sure there are also other implementation details of interest (e.g., I've heard questions about the lookup types).

Feel free to ask additional questions in new issues, but adding more data — whether new rows of emoji records or new columns of relevant details — can be done via PR this way.