googlefonts / ufo2ft

A bridge from UFOs to FontTools objects (and therefore, OTFs and TTFs).
MIT License

Split kerning by script, not by direction #636

Closed: simoncozens closed this 2 years ago

simoncozens commented 2 years ago

Currently we split kerning into lookups based on a single factor: horizontal direction. However, shaping engines perform script segmentation, so there will never be any cross-script kerning. By splitting the kerning into lookups based on the script of the glyphs involved, we can produce smaller lookups for large multi-script fonts, hopefully causing fewer overflows (and therefore faster compilation) and reducing file size by giving the binary compiler a better starting point for splitting lookups into subtables.
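The bucketing idea can be sketched in a few lines of Python. This is an illustration, not ufo2ft's actual code: the `SCRIPTS` mapping is a hand-made stand-in for real Script/Script_Extensions data, and the bucket names are invented for the example.

```python
# Illustrative sketch: bucket kern pairs into per-script lookups.
# SCRIPTS is a hand-made stand-in for Script/Script_Extensions data.
SCRIPTS = {
    "lam-ar": {"Arab"},
    "gba-nko": {"Nkoo"},
    "comma-ar": {"Arab", "Nkoo"},  # Arabic comma is used by both scripts
    "three": {"Zyyy"},             # Zyyy = Common
}

def split_kerning(pairs):
    """Partition (left, right, value) kern pairs into per-script buckets.

    A pair lands in every concrete script shared by both glyphs; pairs
    whose only shared script is Common go into a "Common" bucket.
    """
    lookups = {}
    for left, right, value in pairs:
        shared = SCRIPTS[left] & SCRIPTS[right]
        concrete = shared - {"Zyyy"}
        for script in concrete or {"Common"}:
            lookups.setdefault(script, []).append((left, right, value))
    return lookups

pairs = [
    ("comma-ar", "gba-nko", -120),
    ("comma-ar", "lam-ar", -30),
    ("gba-nko", "gba-nko", -20),
    ("lam-ar", "lam-ar", 50),
    ("three", "three", -50),
]
buckets = split_kerning(pairs)
```

Each bucket then becomes its own lookup, so the binary compiler never has to consider an Arabic pair and an N'Ko pair for the same subtable.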

There are a few failsafes; for example, glyphs without identifiable scripts (as well as purely common-script glyphs) go into a "Common" pot which is added to DFLT/dflt.

This may be easiest to review commit by commit; the changes are fairly small and self-contained apart from f56eaf6 which is the big rewrite.

behdad commented 2 years ago

There are a few failsafes, such as glyphs without identifiable scripts (as well as purely common-script glyphs) go into a "Common" pot which is added to DFLT/dflt.

They are added to all other lookups as well, right?

The main deficiency I see in this PR is lack of support for Script_Extensions data-file. That's not a huge deal though. If glyphs with script A and B are kerned and you insert that kern in both lookups for A and for B, that handles most of the cases already.

simoncozens commented 2 years ago

They are added to all other lookups as well, right?

No, but the common lookup is added to all script/language combinations.
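Roughly, the generated kern feature references the common lookup from every script/language system as well as from DFLT. A minimal sketch of that registration (illustrative only; ufo2ft builds features via feaLib, not string pasting, and the lookup names here are invented):

```python
# Sketch: the "Common" lookup is referenced from DFLT and from every
# concrete script/language system, so common-script kerning always applies.
def kern_feature(script_lookups, common="kern_Common"):
    lines = ["feature kern {"]
    # DFLT/dflt gets only the common lookup ...
    lines += ["    script DFLT;", "    language dflt;", f"    lookup {common};"]
    # ... while every concrete script gets its own lookup plus the common one.
    for script, lookup in sorted(script_lookups.items()):
        lines += [f"    script {script};", "    language dflt;",
                  f"    lookup {lookup};", f"    lookup {common};"]
    lines.append("} kern;")
    return "\n".join(lines)

fea = kern_feature({"arab": "kern_Arab", "nko": "kern_Nkoo"})
print(fea)
```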

The main deficiency I see in this PR is lack of support for Script_Extensions data-file.

I think it does support this: knownScriptsPerCodepoint looks up all scripts for a codepoint using Script_Extensions. We then partition the kern pair by script and evaluate all script combinations. So, for example, the Arabic comma is included in both Arabic and N'Ko. A kerning file like

    <key>comma-ar</key>
    <dict>
        <key>gba-nko</key>
        <integer>-120</integer>
        <key>lam-ar</key>
        <integer>-30</integer>
    </dict>
    <key>gba-nko</key>
    <dict>
        <key>gba-nko</key>
        <integer>-20</integer>
    </dict>
    <key>lam-ar</key>
    <dict>
        <key>lam-ar</key>
        <integer>50</integer>
    </dict>
    <key>three</key>
    <dict>
        <key>three</key>
        <integer>-50</integer>
    </dict>

becomes

lookup kern_Arab {
    lookupflag IgnoreMarks;
    pos comma-ar lam-ar <-30 0 -30 0>;
    pos lam-ar lam-ar <50 0 50 0>;
} kern_Arab;

lookup kern_Nkoo {
    lookupflag IgnoreMarks;
    pos comma-ar gba-nko <-120 0 -120 0>;
    pos gba-nko gba-nko <-20 0 -20 0>;
} kern_Nkoo;

lookup kern_Common {
    lookupflag IgnoreMarks;
    pos three three -50;
} kern_Common;

(Check out test_split_pair in the tests.)
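The per-pair split described above can be sketched as follows. This is an illustration in the spirit of test_split_pair, not the actual implementation; `GLYPH_SCRIPTS` stands in for what knownScriptsPerCodepoint derives from Script_Extensions.

```python
# Sketch of the per-pair script split (illustrative names; the real
# logic is exercised by ufo2ft's test_split_pair).
from itertools import product

# Stand-in for knownScriptsPerCodepoint / Script_Extensions data:
GLYPH_SCRIPTS = {
    "comma-ar": {"Arab", "Nkoo"},  # Arabic comma belongs to both scripts
    "gba-nko": {"Nkoo"},
    "lam-ar": {"Arab"},
}

def split_pair(left, right):
    """Return the scripts under which a kern pair can actually apply.

    Cross-script combinations are dropped: the shaper segments runs by
    script, so an Arab glyph never shapes in the same run as an Nkoo one.
    """
    return {a for a, b in product(GLYPH_SCRIPTS[left], GLYPH_SCRIPTS[right])
            if a == b}
```

So comma-ar next to gba-nko lands only in the N'Ko lookup, and comma-ar next to lam-ar only in the Arabic one, matching the feature code above.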

behdad commented 2 years ago

Thanks for the explanation. This is neat. LGTM!