Problem
When you choose a font to typeset some text, the very first question that interests you is: which fonts support the language(s) of my text? A font that doesn’t support the languages won’t be of any interest.
But what does it mean, exactly, that a font supports a given language? For Latin-script fonts, the task is reasonably easy and mostly comes down to: does the font have glyphs for all the Unicode codepoints used by the language? In reality, even this isn’t always trivial. To typeset text written in English, it’s not enough that the font has glyphs for the letters A-Z and a-z. It also needs digits and some punctuation. It probably also needs some accented letters, because you may want to write names such as Chloë or Brontë.
But it’s still a relatively easy task to check. The [Unicode CLDR]() project collects “exemplar characters” in several categories for each language. If the font contains glyphs for all these characters, you can say, “OK, this font supports this language”. The Rosetta Type Hyperglot project contains similar information, with some annotations.
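The coverage check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `missing_characters` and `load_cmap` are hypothetical, the sample character list is hand-written rather than taken from CLDR or Hyperglot, and `load_cmap` assumes the third-party fontTools library is installed.

```python
from typing import Iterable, List, Mapping


def missing_characters(exemplars: Iterable[str], cmap: Mapping[int, str]) -> List[str]:
    """Return the exemplar characters that have no glyph in the font's cmap.

    `exemplars` may contain multi-character entries (for example "ch");
    each character of such an entry is checked individually.
    """
    missing = {ch for item in exemplars for ch in item if ord(ch) not in cmap}
    return sorted(missing)


def load_cmap(font_path: str) -> Mapping[int, str]:
    """Read the codepoint-to-glyph-name mapping from a font file.

    Assumes fontTools is available; imported lazily so the pure check
    above can be used without it.
    """
    from fontTools.ttLib import TTFont

    return TTFont(font_path).getBestCmap()


# Hand-written sample, NOT real CLDR data: a few letters plus the accented
# letter needed for names such as Chloë or Brontë.
sample_exemplars = ["a", "b", "c", "ë"]
sample_cmap = {ord("a"): "a", ord("b"): "b", ord("c"): "c"}
print(missing_characters(sample_exemplars, sample_cmap))
```

With a real font you would pass `load_cmap("path/to/font.ttf")` instead of the sample mapping; an empty result means every listed character has a glyph.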
Rationale behind Shaperglot
But this approach does not work for scripts that need “shaping”, a process that maps the input Unicode codepoints of the text into a series of glyphs in a way that is not a 1:1 correspondence. For scripts like Arabic or Devanagari, it’s not enough to check if the font has default glyphs for all Unicode codepoints from some set. You also need to check if the font has some rules (features) that perform the shaping so that the final rendered text is orthographically correct.
Shaperglot lets you check Unicode coverage, but it also allows other tests. In particular, the idea is that:
1. you feed in a specially prepared text (a string of characters) and the font
2. you get the default series of glyphs
3. you run HarfBuzz and observe whether something changed (whether the final series of glyphs differs from the default series), and what specifically changed
The fact that a change happened indicates that there is some support for a language beyond just the Unicode codepoint coverage.
For example, if I take the default `i`, apply the `locl` feature with the script tag `latn` and the language tag `TRK`, and see that the output glyph (or series of glyphs) differs from the input, I can say with higher certainty “this font supports Turkish”.
Shaperglot will not (yet ;) ) use computer vision to judge the quality of the change, but it’s based on a very reasonable assumption: if I put in some letter and ask HarfBuzz to apply a certain feature, and the result is the same as the input, then the feature is not meaningfully implemented, hence there is a problem.
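The Turkish example can be sketched roughly as follows. This is not Shaperglot’s own code: the function names are hypothetical, the shaper assumes the third-party uharfbuzz bindings, and the “default” series is taken here to be the result of shaping without an explicit language (HarfBuzz then falls back to the process default language, so the baseline depends on the environment).

```python
from typing import Callable, List, Optional

# A shaper takes (text, script, language) and returns the shaped glyph IDs.
Shaper = Callable[[str, str, Optional[str]], List[int]]


def make_harfbuzz_shaper(font_path: str) -> Shaper:
    """Build a shaper backed by HarfBuzz (assumes uharfbuzz is installed)."""
    import uharfbuzz as hb

    font = hb.Font(hb.Face(hb.Blob.from_file_path(font_path)))

    def shape(text: str, script: str, language: Optional[str]) -> List[int]:
        buf = hb.Buffer()
        buf.add_str(text)
        buf.script = script  # ISO 15924 tag, e.g. "Latn"
        if language is not None:
            # BCP 47 tag, e.g. "tr"; HarfBuzz maps it to the OpenType tag TRK
            buf.language = language
        buf.guess_segment_properties()  # fills in whatever is still unset
        hb.shape(font, buf, {"locl": True})
        # After shaping, `codepoint` holds the glyph ID, not the Unicode value.
        return [info.codepoint for info in buf.glyph_infos]

    return shape


def language_changes_shaping(shape: Shaper, text: str, script: str, language: str) -> bool:
    """True if shaping with the language tag gives a different glyph series."""
    default_glyphs = shape(text, script, None)
    localized_glyphs = shape(text, script, language)
    return default_glyphs != localized_glyphs
```

A call such as `language_changes_shaping(make_harfbuzz_shaper("path/to/font.ttf"), "i", "Latn", "tr")` returning `False` would suggest, under the assumption stated above, that the font has no meaningful Turkish-specific behaviour for that input.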
The advantage of the Shaperglot approach is that the tests can be complex. Sometimes the meaningful change comes about only through a combination of certain features, not just one feature. Or there may be alternatives: some fonts may implement something via `liga`, while others implement the same thing via `ccmp` or `calt`. So the test may ask for all 3 features to be applied and check if something changed.
In the future, additional, more sophisticated tests can be implemented. Test-driven development can help produce better fonts, and can also help to get better information about language support.
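A test that accepts any of several alternative features might look like the sketch below. It is an illustration only, not Shaperglot’s actual test format: `features_change_output` and `fake_font` are hypothetical names, and `fake_font` is a stand-in for a real HarfBuzz-backed shaper so that the logic can be seen without a font file.

```python
from typing import Callable, Dict, List, Sequence

# A shaper takes (text, feature settings) and returns the shaped glyph names.
Shaper = Callable[[str, Dict[str, bool]], List[str]]


def features_change_output(shape: Shaper, text: str, features: Sequence[str]) -> bool:
    """Shape once with all listed features off and once with all of them on.

    The check passes if the two glyph series differ, whichever of the
    features is responsible for the difference.
    """
    baseline = shape(text, {tag: False for tag in features})
    shaped = shape(text, {tag: True for tag in features})
    return baseline != shaped


def fake_font(implemented_in: str) -> Shaper:
    """Stand-in for a font that forms an f_i ligature via a single feature."""

    def shape(text: str, features: Dict[str, bool]) -> List[str]:
        if text == "fi" and features.get(implemented_in, False):
            return ["f_i"]
        return list(text)

    return shape


ALTERNATIVES = ["liga", "ccmp", "calt"]
for tag in ALTERNATIVES + ["smcp"]:
    # The last fake font implements the substitution in a feature the test
    # does not ask for, so no change is observed.
    print(tag, features_change_output(fake_font(tag), "fi", ALTERNATIVES))
```

Turning the listed features off for the baseline matters because HarfBuzz applies `liga`, `ccmp` and `calt` by default, so shaping “with” and “without” them would otherwise give the same result.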
Shaperglot has example implementations of tests for some languages, but needs more data.