typemytype / drawbot

http://www.drawbot.com

fontContainsCharacters() returns False for characters > U+FFFF even though the font actually maps them #524

Closed josh-hadley closed 1 year ago

josh-hadley commented 1 year ago

What the title says. I suspect the root cause is in CoreText.CTFontGetGlyphsForCharacters: either it does not support SMP (Supplementary Multilingual Plane) characters, or the call to it needs different treatment when the string passed to fontContainsCharacters() contains SMP characters.

Steps to repro:

  1. Install a font with known support for SMP characters, such as Noto Sans Deseret, which covers the range U+10400-U+1044F
  2. Check drawBot.fontContainsCharacters() for a character in that range and compare to the font's cmap (e.g. fontTools getBestCmap()), something like this:
    
    import drawBot as db
    from fontTools.ttLib import TTFont

    db.font('NotoSansDeseret-Regular', 48)
    fontpath = db.fontFilePath()
    ttfont = TTFont(fontpath)
    umap = ttfont['cmap'].getBestCmap()
    u = 0x10400
    c = chr(u)
    dbFontContains = db.fontContainsCharacters(c)
    inFontCmap = u in umap
    print(f'{dbFontContains=}, {inFontCmap=}')



  3. Observe that `dbFontContains=False` and `inFontCmap=True` (for U+10400; both are `True` for BMP characters present in the font such as U+0020 or U+00A0)

### Expected behavior:
`fontContainsCharacters()` should return `True` for any character that is mapped in the font's most comprehensive Unicode cmap subtable.
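
As a rough illustration of that expectation, here is a minimal sketch of a cmap-based check. The helper name `cmap_contains_characters` and the sample mapping are hypothetical; in practice the mapping would come from `ttfont['cmap'].getBestCmap()` as in the repro above.

```python
def cmap_contains_characters(best_cmap, characters):
    """Return True if every code point in `characters` is a key of `best_cmap`.

    `best_cmap` is assumed to be a dict mapping integer code points to glyph
    names, like the one returned by fontTools' getBestCmap().
    """
    # Iterating a Python str yields whole code points, so an SMP character
    # such as U+10400 is a single item here, not a surrogate pair.
    return all(ord(ch) in best_cmap for ch in characters)


# Hypothetical stand-in for a real font's cmap (glyph names are made up).
sample_cmap = {0x0020: "space", 0x10400: "u10400"}

print(cmap_contains_characters(sample_cmap, " "))
print(cmap_contains_characters(sample_cmap, chr(0x10400)))
print(cmap_contains_characters(sample_cmap, chr(0x10401)))
```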

justvanrossum commented 1 year ago

Can reproduce. The relevant code is here:

https://github.com/typemytype/drawbot/blob/100dbdfed987cd392ede42aba5744977d500f9d7/drawBot/context/baseContext.py#L1796-L1805

It looks pretty innocent.

I wonder if somehow PyObjC does something wrong when converting the characters argument.

justvanrossum commented 1 year ago

After trying a few things, I think it's a PyObjC bug with CTFontGetGlyphsForCharacters. I suspect it has to encode the string as UTF-16, but doesn't.

The function takes its characters as an array of UniChar (see https://developer.apple.com/documentation/coretext/1510813-ctfontgetglyphsforcharacters?language=objc), and UniChar is a 16-bit type.
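
To make the UTF-16 point concrete, here is a small pure-Python sketch (no CoreText or PyObjC involved) showing that a character above U+FFFF is one item in a Python string but two 16-bit code units in UTF-16. The helper name `utf16_code_units` is made up for illustration and is not part of DrawBot or PyObjC.

```python
def utf16_code_units(text):
    """Return the list of 16-bit UTF-16 code units that encode `text`."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


for ch in ("A", chr(0x10400)):
    units = utf16_code_units(ch)
    print(f"U+{ord(ch):04X}: len(str)={len(ch)}, "
          f"UTF-16 units={[hex(u) for u in units]}")
```

For U+10400 this yields the surrogate pair 0xD801, 0xDC00, so an API that works in UniChar units would presumably need a count of 2 for that one character, while `len()` on the Python string reports 1.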

josh-hadley commented 1 year ago

Thanks @justvanrossum and @typemytype for the quick fix!