missing emoji substitutions

forresto commented 5 months ago

Expected Behavior

There are a handful of emoji substitutions that are not found, even after #688 landed.

❤️‍🩹 should render as a single glyph (index 1433).

Current Behavior

❤️‍🩹 renders as 3 glyphs ([ 169, 18, 1345 ]).

Possible Solution

I can make a PR with failing test cases, if that's helpful.

Steps to Reproduce (for bugs)

#️⃣ found sub [ 4, 22 ] 1520
*️⃣ found sub [ 5, 22 ] 1521
0️⃣ found sub [ 6, 22 ] 1531
1️⃣ found sub [ 7, 22 ] 1522
⛹️‍♀️ found sub [ 140, 18, 81 ] 140
⛹️‍♂️ found sub [ 140, 18, 82 ] 140
❤️‍🔥 found sub [ 169, 18, 794 ] 1432
❤️‍🩹 found sub [ 169, 18, 1345 ] 1433

I found these by manually searching the font's ccmp substitutions, like this:

  // Assumes `font` is an opentype.js Font (noto-emoji) and `emojiData` is a list of { unicode } entries.
  const substitutions = font.substitution.getFeature("ccmp");

  const opentypeOptions = {
    kerning: true,
    language: "dflt",
    features: [{ script: "DFLT", tags: ["ccmp", "liga"] }],
  };

  for (const emoji of emojiData) {
    const { unicode } = emoji;
    const glyphs = font.stringToGlyphs(unicode, opentypeOptions);
    let glyph;
    if (glyphs.length === 1) {
      // Shaping already produced a single glyph.
      glyph = glyphs[0];
    } else {
      // Shaping left several glyphs: look for a ccmp rule whose input matches them.
      const indexes = glyphs.map((g) => g.index);
      const sub = substitutions.find((substitution) => equals(substitution.sub, indexes));

      if (sub) {
        glyph = font.glyphs.get(sub.by);
        console.log(unicode, "found sub", indexes, sub.by);
      } else {
        console.log(unicode, "no ccmp sub", indexes);
      }
    }
  }

/** Custom equals function that can also check lists. */
function equals(a, b) {
  if (a === b) {
    return true;
  } else if (Array.isArray(a) && Array.isArray(b)) {
    if (a.length !== b.length) {
      return false;
    }
    for (let i = 0; i < a.length; i += 1) {
      if (!equals(a[i], b[i])) {
        return false;
      }
    }
    return true;
  } else {
    return false;
  }
}

Context

Using noto-emoji in our CAD app, https://cuttle.xyz

Your Environment

Version used: be0d4417a04d92d43178e075273048e926164abf
Font used: noto-emoji v47
Browser Name and version: Node

Connum commented 5 months ago

@TonyJR would you be available to have a look at this, as you implemented the ccmp feature?

TonyJR commented 5 months ago

Yes, I'm looking into it. I found the rule for "#️⃣ found sub [ 4, 22 ] 1520":

sub numbersign uni20E3 by keycap_hash;

It should be handled by a GSUB lookup type 4, format 1 (ligature substitution). I'll find out why it isn't applied.
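For reference, a GSUB lookup type 4 (ligature substitution) matches a run of glyph IDs starting at the current position and replaces the whole run with one glyph. The sketch below is a simplified, hypothetical model of that matching, not opentype.js's implementation: the ligature table is a plain Map keyed by the first glyph, and the IDs come from the "#️⃣ found sub [ 4, 22 ] 1520" log line above. Glyph 999 is a made-up placeholder for any extra glyph (for example, one left over from U+FE0F) sitting between the components.

```javascript
// Hypothetical, simplified model of GSUB lookup type 4 (ligature substitution).
// ligatures: Map<firstGlyphId, Array<{ components: number[], ligGlyph: number }>>
// where `components` are the glyph IDs that must follow the first glyph.
function applyLigatures(glyphIds, ligatures) {
  const out = [];
  let i = 0;
  while (i < glyphIds.length) {
    const candidates = ligatures.get(glyphIds[i]) || [];
    let matched = null;
    for (const lig of candidates) {
      // Every component must match the glyphs immediately after position i.
      const fits = lig.components.every((id, k) => glyphIds[i + 1 + k] === id);
      if (fits) {
        matched = lig;
        break; // ligatures in a set are listed in preference order
      }
    }
    if (matched) {
      out.push(matched.ligGlyph);
      i += 1 + matched.components.length;
    } else {
      out.push(glyphIds[i]);
      i += 1;
    }
  }
  return out;
}

// "sub numbersign uni20E3 by keycap_hash;" with the glyph IDs from the log above
// (4 = numbersign, 22 = uni20E3, 1520 = keycap_hash).
const ligatures = new Map([[4, [{ components: [22], ligGlyph: 1520 }]]]);

console.log(applyLigatures([4, 22], ligatures));      // [ 1520 ]
console.log(applyLigatures([4, 999, 22], ligatures)); // [ 4, 999, 22 ]
```

The second call shows the failure mode in miniature: a strict matcher does not fire if anything sits between the two components.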

forresto commented 5 months ago

Here are the sequences that should result in one glyph but return multiple:

```
[
  {"string":"#️⃣","indexes":[4,23],"expected":1548},
  {"string":"*️⃣","indexes":[5,23],"expected":1549},
  {"string":"0️⃣","indexes":[6,23],"expected":1559},
  {"string":"1️⃣","indexes":[7,23],"expected":1550},
  {"string":"2️⃣","indexes":[8,23],"expected":1551},
  {"string":"3️⃣","indexes":[9,23],"expected":1552},
  {"string":"4️⃣","indexes":[10,23],"expected":1553},
  {"string":"5️⃣","indexes":[11,23],"expected":1554},
  {"string":"6️⃣","indexes":[12,23],"expected":1555},
  {"string":"7️⃣","indexes":[13,23],"expected":1556},
  {"string":"8️⃣","indexes":[14,23],"expected":1557},
  {"string":"9️⃣","indexes":[15,23],"expected":1558},
  {"string":"🏋️‍♀️","indexes":[447,18,82],"expected":447},
  {"string":"🏋️‍♂️","indexes":[447,18,83],"expected":447},
  {"string":"🏌️‍♀️","indexes":[448,18,82],"expected":448},
  {"string":"🏌️‍♂️","indexes":[448,18,83],"expected":448},
  {"string":"🏳️‍🌈","indexes":[485,18,256],"expected":1871},
  {"string":"🏳️‍⚧️","indexes":[485,18,116],"expected":1872},
  {"string":"👁️‍🗨️","indexes":[566,18,886],"expected":1432},
  {"string":"👨‍❤️‍👨","indexes":[605,18,170,18,605],"expected":646},
  {"string":"👨‍❤️‍💋‍👨","indexes":[605,18,170,18,640,18,605],"expected":644},
  {"string":"👩‍❤️‍👨","indexes":[606,18,170,18,605],"expected":646},
  {"string":"👩‍❤️‍👩","indexes":[606,18,170,18,606],"expected":646},
  {"string":"👩‍❤️‍💋‍👨","indexes":[606,18,170,18,640,18,605],"expected":644},
  {"string":"👩‍❤️‍💋‍👩","indexes":[606,18,170,18,640,18,606],"expected":644},
  {"string":"🕵️‍♀️","indexes":[855,18,82],"expected":855},
  {"string":"🕵️‍♂️","indexes":[855,18,83],"expected":855},
  {"string":"⛹️‍♀️","indexes":[141,18,82],"expected":141},
  {"string":"⛹️‍♂️","indexes":[141,18,83],"expected":141},
  {"string":"❤️‍🔥","indexes":[170,18,795],"expected":1433},
  {"string":"❤️‍🩹","indexes":[170,18,1346],"expected":1434}
]
```

TonyJR commented 5 months ago

I found the reason! You passed in a “fully-qualified” emoji, which the font doesn't support.
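To make "fully-qualified" concrete: Unicode's emoji-test.txt lists #️⃣ as fully-qualified U+0023 U+FE0F U+20E3 and the unqualified form as U+0023 U+20E3, without the U+FE0F variation selector. The font rule quoted earlier (sub numbersign uni20E3 by keycap_hash;) only names the base and U+20E3. The small helper below (codePoints is an illustrative name, not part of opentype.js) prints the code points of the two sequences from this issue, built from escapes so the invisible characters are explicit.

```javascript
// Illustrative helper: list the code points of a string as "U+XXXX" labels.
// Array.from iterates by code point, so astral characters stay whole.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const keycapHash = "\u0023\uFE0F\u20E3";            // #️⃣ (fully-qualified)
const mendingHeart = "\u2764\uFE0F\u200D\u{1FA79}"; // ❤️‍🩹 (fully-qualified)

console.log(codePoints(keycapHash));   // [ 'U+0023', 'U+FE0F', 'U+20E3' ]
console.log(codePoints(mendingHeart)); // [ 'U+2764', 'U+FE0F', 'U+200D', 'U+1FA79' ]
```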

TonyJR commented 5 months ago

WTF! Figma draws it correctly. I'm going to find out why.

Connum commented 5 months ago

@TonyJR any progress on this?

TonyJR commented 5 months ago

Sorry, I've been a bit busy lately. \uFE00-\uFE0F are variation selectors, which should be handled at the cmap stage. I tested HarfBuzz and it skips these characters. I see two ways to fix the bug:

1. Process the cmap (including the variation selectors) before processing GSUB, then remove the selectors.
2. Skip the selectors when processing GSUB.

I prefer the first option. @Connum, are you familiar with the cmap table?
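A minimal sketch of the first option, under simplifying assumptions; it is not opentype.js code. The cmap is modeled as a plain Map from code point to glyph ID, variation sequences (the job of a cmap format 14 subtable) as a hypothetical Map keyed by "base,selector", and the names isVariationSelector and mapToGlyphs are made up for illustration. Selectors are consumed during character-to-glyph mapping, so GSUB only ever sees base glyphs.

```javascript
// Variation selectors: VS1-VS16 (U+FE00-U+FE0F) and VS17-VS256 (U+E0100-U+E01EF).
function isVariationSelector(cp) {
  return (cp >= 0xfe00 && cp <= 0xfe0f) || (cp >= 0xe0100 && cp <= 0xe01ef);
}

// Hypothetical "cmap first" step: turn text into glyph IDs, resolving each
// base + selector pair through `variationSequences` (a stand-in for a cmap
// format 14 subtable) and dropping the selector so GSUB never sees it.
//   cmap:               Map<codePoint, glyphId>
//   variationSequences: Map<"base,selector", glyphId> (decimal code points)
function mapToGlyphs(text, cmap, variationSequences) {
  const cps = Array.from(text, (ch) => ch.codePointAt(0));
  const glyphIds = [];
  for (let i = 0; i < cps.length; i++) {
    const cp = cps[i];
    if (isVariationSelector(cp)) continue; // stray selector with no base: drop it
    const next = cps[i + 1];
    if (next !== undefined && isVariationSelector(next)) {
      // Prefer a variant glyph for this sequence, else fall back to the default mapping.
      const variant = variationSequences.get(`${cp},${next}`);
      glyphIds.push(variant ?? cmap.get(cp) ?? 0);
      i++; // skip the selector we just consumed
    } else {
      glyphIds.push(cmap.get(cp) ?? 0); // 0 is .notdef
    }
  }
  return glyphIds;
}

// Glyph IDs taken from the "#️⃣ found sub [ 4, 22 ] 1520" log line
// (4 = numbersign, 22 = uni20E3); no variant is assumed for U+0023 U+FE0F.
const cmap = new Map([[0x23, 4], [0x20e3, 22]]);
console.log(mapToGlyphs("\u0023\uFE0F\u20E3", cmap, new Map())); // [ 4, 22 ]
```

With the selector removed at this stage, the glyph sequence handed to GSUB is [4, 22], exactly the input of the sub numbersign uni20E3 by keycap_hash; rule. The second option would instead leave the selector in the sequence and make the GSUB matcher skip it.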

Connum commented 5 months ago

I implemented special handling of variation selectors some time ago; maybe that's interfering? The order of processing should also be stated in the docs. As far as I remember, the cmap should be handled before any layout is applied.

TonyJR commented 5 months ago

Yes, you're right. I'm trying to pin down the order, but I prefer to refer directly to the HarfBuzz source code. I found that HarfBuzz actually merges the GSUB/GPOS processing. Perhaps we should follow it too, but that may be a big project...

opentypejs / opentype.js