googlefonts / gftools

Misc tools for working with the Google Fonts library
Apache License 2.0

Better font subset detection #982

Open m4rc1e opened 2 months ago

m4rc1e commented 2 months ago

We currently detect script subsets such as Arabic by counting the glyphs mapped in the cmap table and then checking whether more than 50% of them are in a specific script subset's .nam file. Instead, what if we simply checked whether a font fulfills a gflanguages base charset? E.g., for Arabic, we'd need all the base characters in https://github.com/googlefonts/lang/blob/main/Lib/gflanguages/data/languages/ar_Arab.textproto#L44.
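To make the proposal concrete, here is a rough sketch of such a check, not gftools code. The helper names are hypothetical, the exemplar string is assumed to be space-separated (with optional braces around multi-character sequences), and the Arabic sample is a truncated illustration rather than the real ar_Arab data.

```python
def parse_base_chars(base: str) -> set[int]:
    """Flatten an exemplar string into a set of code points.

    Assumption: entries are separated by whitespace and multi-character
    sequences are wrapped in braces, which are simply dropped here.
    """
    return {ord(ch) for ch in base if not ch.isspace() and ch not in "{}"}


def fulfills_base_charset(font_codepoints, base: str) -> bool:
    """True only if every base character is mapped by the font."""
    return parse_base_chars(base) <= set(font_codepoints)


def missing_base_chars(font_codepoints, base: str) -> list[str]:
    """List the base characters the font lacks, in code point order."""
    missing = parse_base_chars(base) - set(font_codepoints)
    return [chr(cp) for cp in sorted(missing)]


# Illustrative sample only: alef, beh, teh, theh (NOT the full Arabic base set).
ARABIC_BASE_SAMPLE = "\u0627 \u0628 \u062A \u062B"

# In practice the code points would come from the font's cmap table.
full_font = {0x0041, 0x0627, 0x0628, 0x062A, 0x062B}
partial_font = {0x0041, 0x0627, 0x0628}

print(fulfills_base_charset(full_font, ARABIC_BASE_SAMPLE))
print(fulfills_base_charset(partial_font, ARABIC_BASE_SAMPLE))
print(missing_base_chars(partial_font, ARABIC_BASE_SAMPLE))
```

Unlike the 50% threshold, this is all-or-nothing: a font missing a single base character does not qualify, and the list of missing characters says why.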

simoncozens commented 2 months ago

That sounds a lot better, although we need to think it through:

I can see two approaches:

m4rc1e commented 2 months ago

Compute the supported languages first, and then declare support for all of their scripts.

I like this a lot. gflanguages also includes population count data. Perhaps instead of counting glyphs, counting how many people a font is able to cover would be a better measure.
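A minimal sketch of how the languages-first approach and the population idea could fit together. Everything here is an assumption for illustration: the `Lang` record, the function names, the language codes, and the population figures are made up and are not the gflanguages API or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lang:
    code: str        # hypothetical language id, e.g. "xx_Latn"
    script: str      # script tag, e.g. "Latn"
    population: int  # made-up speaker count
    base: str        # space-separated base exemplar characters


def codepoints(base: str) -> set[int]:
    """Code points of an exemplar string, ignoring whitespace and braces."""
    return {ord(ch) for ch in base if not ch.isspace() and ch not in "{}"}


def supported_languages(font_cps, langs):
    """Languages whose whole base charset is present in the font."""
    cps = set(font_cps)
    return [lang for lang in langs if codepoints(lang.base) <= cps]


def supported_scripts(font_cps, langs) -> set[str]:
    """Declare a script as soon as one of its languages is supported."""
    return {lang.script for lang in supported_languages(font_cps, langs)}


def population_coverage(font_cps, langs, script: str) -> float:
    """Share of a script's total population whose language is supported."""
    in_script = [lang for lang in langs if lang.script == script]
    total = sum(lang.population for lang in in_script)
    if total == 0:
        return 0.0
    covered = sum(l.population for l in supported_languages(font_cps, in_script))
    return covered / total


# Toy data with invented codes and populations.
LANGS = [
    Lang("xx_Latn", "Latn", 300, "a b c"),
    Lang("yy_Latn", "Latn", 100, "a b c d"),
    Lang("zz_Cyrl", "Cyrl", 50, "\u0430 \u0431"),
]
font = {ord(ch) for ch in "abc"}

print(supported_scripts(font, LANGS))
print(population_coverage(font, LANGS, "Latn"))
```

A threshold on the population share could then stand in for the current glyph-count threshold, though where to set it is an open question.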