googlefonts / shaperglot

Test font files for language support
Apache License 2.0

Shaperglot is reporting auxiliary glyphs as missing base glyphs #24

Closed: yanone closed this issue 7 months ago

yanone commented 1 year ago

A few days ago you said that shaperglot would ignore auxiliary glyphs, but this check reports them as missing with a FAIL and classifies them as base glyphs rather than auxiliary ones, even though gflanguages defines these glyphs as auxiliary:

yanone@macbookair glyphsets % shaperglot check ~/Desktop/GFLatinKernel-Regular.otf en_Latn
Font does not fully support language 'en_Latn'
 * FAIL: Some base glyphs were missing: à, á, â, ã, ä, å, æ, ç, è, é, ê, ë, ì, í, î, ï, ñ, ò, ó, ô, ö, ø, ù, ú, û, ü, ÿ, ā, ă, ē, ĕ, ī, ĭ, ō, ŏ, œ, ū, ŭ
 * FAIL: Some mark glyphs were missing: ◌̀, ◌́, ◌̂, ◌̃, ◌̈, ◌̧

The attached font is a dummy font created from the basic GF Latin Kernel character set.

GFLatinKernel-Regular.otf.zip

NeilSureshPatel commented 1 year ago

There is a logic to reporting missing auxiliary glyphs as failures. Maybe there is a better way to handle it, but the idea is that auxiliary glyphs are needed for loanwords and proper names. So yes, auxiliary glyphs are not strictly needed to support a given language, but from the end user's perspective, a font that lacks them doesn't really have everything they would need for real-life applications.

simoncozens commented 1 year ago

There is, but I see Yan's point too. Perhaps we need a concept of levels of coverage. (Urgh.)

moyogo commented 1 year ago

Failing on auxiliary glyphs is too subjective and unreliable. The auxiliary set is a mishmash of characters that have been put in arbitrarily.

Looking at English, its auxiliary set is: "á à ă â å ä ã ā æ ç é è ĕ ê ë ē í ì ĭ î ï ī ñ ó ò ŏ ô ö ø ō œ ú ù ŭ û ü ū ÿ". Why are some of these included at all, and why have others been excluded? What English word uses ă besides place names and people's names? If those should be included, then all Latin characters used in place names or people's names should be included as well. Some sort of arbitrary threshold could be used, but it will make no sense for some users depending on where they are. A British English speaker will encounter a different set of borrowed words, people's names, and place names than a North American or a South African one.

The whole point of having language exemplars is to be able to say whether a font supports a language. If borrowed words need to be supported, then the languages they come from need to be supported as well.

NeilSureshPatel commented 1 year ago

Agreed, this is definitely an interesting problem. To add to the complexity, the marks category is a mix of exemplar and auxiliary marks. Do we need to split these up? Maybe the simplest thing is to have just one additional coverage level for failures involving auxiliary bases and marks.
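A minimal Python sketch of the two-level idea, assuming a hypothetical `check_coverage` helper (this is not shaperglot's API) that receives the font's supported codepoints as a set of integers and the base and auxiliary exemplars as space-separated strings, the way gflanguages lists them. Missing base characters produce a FAIL, while missing auxiliary characters only produce a WARN and do not block support. In practice the codepoint set could come from fontTools' `TTFont(path).getBestCmap()`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CoverageReport:
    # Hypothetical report type: FAILs block language support, WARNs do not.
    fails: list[str] = field(default_factory=list)
    warns: list[str] = field(default_factory=list)

    @property
    def supported(self) -> bool:
        return not self.fails


def missing_from(font_codepoints: set[int], exemplars: str) -> list[str]:
    # Exemplar strings are space-separated; a cluster may span several
    # codepoints, so every codepoint in it must be present in the font.
    return [
        cluster
        for cluster in exemplars.split()
        if not all(ord(ch) in font_codepoints for ch in cluster)
    ]


def check_coverage(font_codepoints: set[int], base: str, auxiliary: str) -> CoverageReport:
    report = CoverageReport()
    missing_base = missing_from(font_codepoints, base)
    missing_aux = missing_from(font_codepoints, auxiliary)
    if missing_base:
        report.fails.append("Some base glyphs were missing: " + ", ".join(missing_base))
    if missing_aux:
        # Separate, lower coverage level: reported, but not a failure.
        report.warns.append("Some auxiliary glyphs were missing: " + ", ".join(missing_aux))
    return report
```

For example, a font covering only a-z would pass `check_coverage(font, "a b c", "à é")` with `supported` true and a single auxiliary warning, instead of failing outright as in the report above.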

moyogo commented 7 months ago

Note that for African languages, I have cleaned up the auxiliary exemplars as much as possible, keeping only what makes sense. The same needs to be done for the other language data as well.