googlefonts / glyphsets

GF Cyrillic Plus / Pro update #237

Open alexeiva opened 1 month ago

alexeiva commented 1 month ago

I suggest adding local variants to GF Cyrillic Plus.

List for Italic styles:

De-cy.loclBGR
Ef-cy.loclBGR
El-cy.loclBGR
Esdescender-cy.loclBSH
Esdescender-cy.loclCHU
Gestroke-cy.loclBSH
Ii-cy.loclBGR
Iigrave-cy.loclBGR
Iishort-cy.loclBGR
Zedescender-cy.loclBSH
be-cy.loclSRB
de-cy.loclBGR
de-cy.loclSRB
el-cy.loclBGR
esdescender-cy.loclBSH
esdescender-cy.loclCHU
ge-cy.loclBGR
ge-cy.loclSRB
gestroke-cy.loclBSH
gje-cy.loclMKD
ii-cy.loclBGR
ii-cy.loclSRB
iigrave-cy.loclBGR
iishort-cy.loclBGR
ka-cy.loclBGR
pe-cy.loclBGR
pe-cy.loclSRB
sha-cy.loclSRB
te-cy.loclBGR
te-cy.loclSRB
ve-cy.loclBGR
yu-cy.loclBGR
ze-cy.loclBGR
zedescender-cy.loclBSH
zhe-cy.loclBGR

List for Roman styles:

De-cy.loclBGR
Ef-cy.loclBGR
El-cy.loclBGR
Esdescender-cy.loclBSH
Esdescender-cy.loclCHU
Gestroke-cy.loclBSH
Ii-cy.loclBGR
Iigrave-cy.loclBGR
Iishort-cy.loclBGR
Zedescender-cy.loclBSH
be-cy.loclSRB
el-cy.loclBGR
esdescender-cy.loclBSH
esdescender-cy.loclCHU
gestroke-cy.loclBSH
ii-cy.loclBGR
iigrave-cy.loclBGR
iishort-cy.loclBGR
ka-cy.loclBGR
pe-cy.loclBGR
te-cy.loclBGR
ve-cy.loclBGR
yu-cy.loclBGR
ze-cy.loclBGR
zedescender-cy.loclBSH
zhe-cy.loclBGR

It's also possible to use just one list (the Italic one) for Roman styles too. In that case, these glyphs will be copies of the default shapes:

be-cy.loclSRB
de-cy.loclSRB
ge-cy.loclSRB
ii-cy.loclSRB
pe-cy.loclSRB
sha-cy.loclSRB
te-cy.loclSRB
gje-cy.loclMKD

alexeiva commented 1 month ago

These glyphs are already present in GF Cyrillic Plus, and should be removed from GF Cyrillic Pro:

Ghemiddlehook-cy
ghemiddlehook-cy

Also the spelling has changed in Glyphs 3:

Ghemiddlehook-cy -> Gemiddlehook-cy
ghemiddlehook-cy -> gemiddlehook-cy

alexeiva commented 1 month ago

https://github.com/googlefonts/glyphsets/blob/main/GLYPHSETS.md#gf-cyrillic-pro

For Headline typefaces (?), with language support more Non-Slavic languages. Additional characters in this set provide support for the following 18 languages: Abkhaz, Chukchi, Enets, Eskimo, Even, Evenki, Itelmen, Khanty, Kildin Sami, Koryak, Mansi, Nganasan, Nenets, Oroch, Orok, Sakha/Yakut, Tati, Yukaghir, Yupik Ulch

This text isn't accurate. GF Cyrillic Pro isn't for Headline typefaces. Sakha/Yakut is already supported in GF Cyrillic Plus, and should be removed from GF Cyrillic Pro.

vv-monsalve commented 1 month ago

Hi @alexeiva,

These glyphs are already present in GF Cyrillic Plus, and should be removed from GF Cyrillic Pro

Thank you for the suggested list. However, the Cyrillic Pro still needs to be redefined (please ignore it for the time being). Currently, only the Core and Plus have been revised and redefined, so please ensure the local variants listed above correspond to the latter.

alexeiva commented 1 month ago

The list I mentioned corresponds to GF Cyrillic Plus

vv-monsalve commented 1 month ago

The list I mentioned corresponds to GF Cyrillic Plus

Fantastic, ty!

yanone commented 1 month ago

A few remarks:

Currently, only the Core and Plus have been revised and redefined, so please ensure the local variants listed above correspond to the latter.

Since we're trying to automate as much as possible with the new glyphsets technical approach, I propose that we don't manually curate the list of local variants per glyphset, but rather have them all in one file and have the program choose them automatically based on whether their respective base glyphs are present in a glyphset. The Roman/Italic distinction is difficult to solve elegantly. Give me some time to think about the best solution for it.

These glyphs are already present in GF Cyrillic Plus, and should be removed from GF Cyrillic Pro

Since we started reworking glyphsets last year, we don't manually curate glyphsets anymore but only curate languages to include. As Viviana said, Cyrillic Pro has yet to be renewed.

This brings me to another topic that might be of interest here: So far we have no programmatically defined inheritance, such as Cyrillic Core being included in Plus, and (later) Plus being included in Pro. As a result, after the recent redefinition of Core and Plus, Plus is actually missing a few glyphs that are present in Core (because the same is true for the languages defined for each). If you install the current code base and run `glyphsets compare GF_Cyrillic_Core GF_Cyrillic_Plus`, you get this (see "GF_Cyrillic_Plus is missing 16 glyphs compared to GF_Cyrillic_Core" at the bottom):


 GF_Cyrillic_Core 

Total glyphs: 163

Letter (114 glyphs): 
`Ё Ђ Є І Ї Ј Љ Њ Ћ Ў Џ А Б В Г Д Е Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я а б в г д е ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я ё ђ є і ї ј љ њ ћ ў џ Ґ ґ Ғ ғ Җ җ Қ қ Ң ң Ү ү Ұ ұ Ҳ ҳ Ҷ ҷ Һ һ Ә ә Ӣ ӣ Ө ө Ӯ ӯ`

Mark, nonspacing (5 glyphs): 
`◌̀ ◌́ ◌̄ ◌̆ ◌̈`

Mark, spacing (1 glyphs): 
`ʼ`

Number (10 glyphs): 
`0 1 2 3 4 5 6 7 8 9`

Punctuation (28 glyphs): 
`! " # ' ( ) * , - . / : ; ? [ \ ] « » – — ‘ ’ ‚ “ ” „ …`

Symbol (5 glyphs): 
`% & + @ №`

 GF_Cyrillic_Plus 

Total glyphs: 190

GF_Cyrillic_Plus has 43 additional glyphs compared to GF_Cyrillic_Core:

Letter (42 glyphs): 
`Ѓ Ѕ Ќ ѓ ѕ ќ Ҕ ҕ Ҙ ҙ Ҝ ҝ Ҡ ҡ Ҥ ҥ Ҫ ҫ Ҹ ҹ Ӏ ӏ Ӑ ӑ Ӕ ӕ Ӗ ӗ Ӝ ӝ Ӟ ӟ Ӥ ӥ Ӧ ӧ Ӱ ӱ Ӳ ӳ Ӵ ӵ`

Mark, nonspacing (1 glyphs): 
`◌̋`

GF_Cyrillic_Plus is missing 16 glyphs compared to GF_Cyrillic_Core:

Letter (12 glyphs): 
`Ђ Ћ ђ ћ Ұ ұ Ҷ ҷ Ӣ ӣ Ӯ ӯ`

Mark, nonspacing (1 glyphs): 
`◌̄`

Mark, spacing (1 glyphs): 
`ʼ`

Punctuation (1 glyphs): 
`\`

Symbol (1 glyphs): 
`№`

So, if we want there to be inheritance, I can implement that. So far that's a manual process: the list of languages in Plus has to include the ones from Core. But Plus could be defined as "everything in Core, along with these additional languages: 1, 2, 3".

The same is true for Latin and all other scripts. Currently we’re not strictly defining inheritance but we could. (We could also exclude glyphsets from one another like we can currently exclude languages, see Latin African).
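To make the idea concrete, here is a rough sketch of how inheritance and exclusion between glyphsets could be resolved at the language level. Everything in it is an assumption for illustration: the `GlyphsetDef` structure, its `include`/`exclude` fields, and the placeholder language codes are hypothetical and do not reflect the current definition format.

```python
from dataclasses import dataclass, field


@dataclass
class GlyphsetDef:
    """Hypothetical definition: own languages plus inherited/excluded glyphsets."""
    name: str
    languages: set = field(default_factory=set)
    include: list = field(default_factory=list)  # glyphsets to inherit from
    exclude: list = field(default_factory=list)  # glyphsets to subtract


def resolve_languages(name, definitions, _seen=frozenset()):
    """Collect the languages of `name`, following include/exclude references."""
    if name in _seen:
        raise ValueError(f"circular reference involving {name}")
    seen = _seen | {name}
    definition = definitions[name]
    resolved = set(definition.languages)
    for parent in definition.include:
        resolved |= resolve_languages(parent, definitions, seen)
    for other in definition.exclude:
        resolved -= resolve_languages(other, definitions, seen)
    return resolved


# Placeholder language codes, standing in for real gflanguages codes.
definitions = {
    "GF_Cyrillic_Core": GlyphsetDef("GF_Cyrillic_Core", {"core_a", "core_b"}),
    "GF_Cyrillic_Plus": GlyphsetDef(
        "GF_Cyrillic_Plus", {"plus_1", "plus_2", "plus_3"},
        include=["GF_Cyrillic_Core"],
    ),
}

print(sorted(resolve_languages("GF_Cyrillic_Plus", definitions)))
```

Resolving at the language level rather than the glyph level keeps the "we only curate languages" principle intact: Plus can never miss a glyph from Core, because it always contains every Core language.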

Also the spelling has changed in Glyphs 3

The module uses the GlyphData.xml that comes with glyphsLib. Ultimately, what matters is the correct encoding, not the glyph names. If necessary, people can update the names in Glyphs.app. The perks of working in font engineering... In other words, I'm not going to address this one. The responsibility lies with glyphsLib.

yanone commented 1 month ago

What's the urgency level of this issue, as in the context?

vv-monsalve commented 1 month ago

Since we're trying to automate as much as possible with the new glyphsets technical approach, I propose that we don't manually curate the list of local variants per glyphset, but rather have them all in one file and have the program choose them automatically based on whether their respective base glyphs are present in a glyphset.

I was thinking of using this list to feed a data/definitions/per_glyphset/GF_Cyrillic_Plus.stub.glyphs file and re-run the build.sh. Would anything else or different be needed?

vv-monsalve commented 1 month ago

So, if we want there to be inheritance, I can implement that. So far that's a manual process: the list of languages in Plus has to include the ones from Core. But Plus could be defined as "everything in Core, along with these additional languages: 1, 2, 3".

This is indeed a matter of interest. I have some related concerns/questions from the subsetting process I've been following for PPS. Would you be able to have a meeting to discuss this?

What's the urgency level of this issue, as in the context?

Ideally, the last definitions should be made before onboarding PPS. But beyond that specific project, I'm interested in improving the system overall.

vv-monsalve commented 1 month ago

Hi @alexeiva, could you please inspect the Cyrillic Core and let us know whether any local variants are also needed for it? :)

alexeiva commented 4 weeks ago

For GF Cyrillic Core:

Italic style:

De-cy.loclBGR
Ef-cy.loclBGR
El-cy.loclBGR
Ii-cy.loclBGR
Iigrave-cy.loclBGR
Iishort-cy.loclBGR
be-cy.loclSRB
de-cy.loclBGR
de-cy.loclSRB
el-cy.loclBGR
ge-cy.loclBGR
ge-cy.loclSRB
gje-cy.loclMKD
ii-cy.loclBGR
ii-cy.loclSRB
iigrave-cy.loclBGR
iishort-cy.loclBGR
ka-cy.loclBGR
pe-cy.loclBGR
pe-cy.loclSRB
sha-cy.loclSRB
te-cy.loclBGR
te-cy.loclSRB
ve-cy.loclBGR
yu-cy.loclBGR
ze-cy.loclBGR
zhe-cy.loclBGR

Roman style:

De-cy.loclBGR
Ef-cy.loclBGR
El-cy.loclBGR
Ii-cy.loclBGR
Iigrave-cy.loclBGR
Iishort-cy.loclBGR
be-cy.loclSRB
el-cy.loclBGR
ii-cy.loclBGR
iigrave-cy.loclBGR
iishort-cy.loclBGR
ka-cy.loclBGR
pe-cy.loclBGR
te-cy.loclBGR
ve-cy.loclBGR
yu-cy.loclBGR
ze-cy.loclBGR
zhe-cy.loclBGR

yanone commented 3 weeks ago

Hi @vv-monsalve and @alexeiva,

I've added dynamically inserted glyphsets for Cyrillic that contain the localized variants relevant for each glyphset.

They are defined in full here and are dynamically purged depending on which encoded base characters end up in each glyphset. So if new localizations need to be added, add them to the definition files and the rest will be handled automatically.
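The purging step can be pictured roughly as follows. This is only a sketch under the assumption that a localized variant's base glyph is the part of its name before the first dot; `base_glyph_name` and `purge_localizations` are hypothetical names, not functions from the glyphsets code base.

```python
def base_glyph_name(variant_name):
    """Strip everything from the first dot: 'de-cy.loclBGR' -> 'de-cy'."""
    return variant_name.split(".", 1)[0]


def purge_localizations(all_variants, glyphset_names):
    """Keep only the variants whose base glyph is present in the glyphset."""
    present = set(glyphset_names)
    return [v for v in all_variants if base_glyph_name(v) in present]


# One complete definition list, shared by all Cyrillic glyphsets.
all_variants = [
    "De-cy.loclBGR",
    "be-cy.loclSRB",
    "esdescender-cy.loclBSH",
    "gestroke-cy.loclBSH",
]

# Illustrative subset of base glyph names in a smaller glyphset.
smaller_glyphset = {"De-cy", "be-cy"}

print(purge_localizations(all_variants, smaller_glyphset))
```

Variants such as `esdescender-cy.loclBSH` drop out on their own when their base glyph is not part of the glyphset, so nothing needs to be curated per glyphset.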

We need to pay attention to glyph name changes though, because these localizations are now defined by their GlyphData.xml glyph names. If any changes are made to the definitions or to glyphsLib (which contains GlyphData.xml), we need to pay extra attention that no localizations go missing because their base character glyph name changed. Thankfully, automatic dependency update PRs will include a complete re-render of the glyphsets, so missing Cyrillic localizations should be detectable when a new glyphsLib version comes down, but glyphsets maintainers (currently me) need to be mindful about that.

In a future version we may choose to define the localizations based on their base letters' Unicode code points rather than on GlyphData.xml glyph names, but because we’re already dependent on the glyph names for now, I'm keeping it that way.
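For comparison, a code-point-keyed definition might look something like the sketch below. The `LOCALIZATIONS_BY_CODEPOINT` table and the `glyph_names` lookup are hypothetical stand-ins (the real names would come from GlyphData.xml); the point is only that the definitions themselves no longer mention glyph names.

```python
# Hypothetical definitions keyed by base code point instead of glyph name.
LOCALIZATIONS_BY_CODEPOINT = {
    0x0431: ["loclSRB"],             # б
    0x0434: ["loclBGR", "loclSRB"],  # д
}


def localized_glyph_names(glyphset_codepoints, glyph_names):
    """Build '<name>.<suffix>' for every defined base present in the glyphset."""
    result = []
    for codepoint in sorted(glyphset_codepoints):
        name = glyph_names.get(codepoint)
        if name is None:
            continue  # no glyph name known for this code point
        for suffix in LOCALIZATIONS_BY_CODEPOINT.get(codepoint, []):
            result.append(f"{name}.{suffix}")
    return result


# Stand-in for the code point -> glyph name mapping of the current GlyphData.xml.
glyph_names = {0x0431: "be-cy", 0x0434: "de-cy"}

print(localized_glyph_names({0x0431, 0x0434}, glyph_names))
```

If a base glyph were renamed in a later GlyphData.xml, only the `glyph_names` lookup would change; the localization would still be found and would simply be emitted under the new name.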

They are currently rendered only into the .plist files, either CustomFilter_GF_Cyrillic.plist or the newly added CustomFilter_GF_All.plist

I used the top comment here in this issue for the definitions, ignoring the third section in that comment, which I didn't understand. Please check the above .plist files for accuracy and completeness.

alexeiva commented 2 weeks ago

@yanone

Thank you for your work. I am keen on checking the new set. About the 3rd comment, please change the text in the readme:

before

For Headline typefaces (?), with language support more Non-Slavic languages. Additional characters in this set provide support for the following 18 languages: Abkhaz, Chukchi, Enets, Eskimo, Even, Evenki, Itelmen, Khanty, Kildin Sami, Koryak, Mansi, Nganasan, Nenets, Oroch, Orok, Sakha/Yakut, Tati, Yukaghir, Yupik Ulch

after

Additional characters in this set provide support for the following 18 Non-Slavic Cyrillic languages: Abkhaz, Chukchi, Enets, Eskimo, Even, Evenki, Itelmen, Khanty, Kildin Sami, Koryak, Mansi, Nganasan, Nenets, Oroch, Orok, Sakha/Yakut, Tati, Yukaghir, Yupik Ulch

('Headline typefaces' are unrelated to these glyph sets)

yanone commented 2 weeks ago

Changed it. Please note that all previous changes are currently visible only in the repo files, as I'm still waiting for a signal to publish a new release.

Also, your comment was about Cyrillic Pro, which we haven't reworked yet. It's okay to change the description, but we really need to find the language codes for each of these languages. Viviana said that we can focus on Cyrillic Core and Plus for now, which we have, but if you or anyone else feels like pulling out the respective language codes for Pro from gflanguages, that would be super awesome.

alexeiva commented 2 weeks ago

if you or anyone else feels like pulling out the respective language codes for Pro from gflanguages

can you point me to this gflanguages list?

yanone commented 2 weeks ago

It's here, in its own repo (annoyingly just called lang, while the Python package is called gflanguages): https://github.com/googlefonts/lang/tree/main/Lib/gflanguages/data/languages

You would search for language names and then give me the codes (such as ab_Cyrl).

I don't know what to do in case some languages cannot be found in that database. Then it gets more complicated. Do we first define them in gflanguages or do we ignore them? I don’t have the answer to that.

At least it would be a good idea to note somewhere (such as in the description) that a certain language was previously included but can't be found through the new assembly system.

yanone commented 2 weeks ago

Also note that the spelling of some languages might differ slightly between the old description and gflanguages.
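Because of those spelling differences, an exact name lookup will miss some languages. One possible approach is a fuzzy match, sketched here with Python's standard `difflib`. The `known_languages` dict is a hypothetical stand-in for a code-to-name mapping extracted from the gflanguages data; the names in it are illustrative and may not match the database's actual spellings.

```python
import difflib


def find_language_codes(wanted_names, known_languages, cutoff=0.7):
    """Map each wanted name to the best-matching code, or None if nothing is close."""
    by_name = {name.lower(): code for code, name in known_languages.items()}
    results = {}
    for wanted in wanted_names:
        matches = difflib.get_close_matches(
            wanted.lower(), list(by_name), n=1, cutoff=cutoff
        )
        results[wanted] = by_name[matches[0]] if matches else None
    return results


# Hypothetical sample; real data would come from the gflanguages database.
known_languages = {
    "ab_Cyrl": "Abkhazian",
    "zz_Cyrl": "Placeholder language",
}

result = find_language_codes(["Abkhaz", "Orok"], known_languages)
for name, code in result.items():
    print(name, "->", code if code else "not found, needs manual follow-up")
```

Names that come back as `None` are the candidates for the "previously included but can't be found" note in the description. Fuzzy hits should still be reviewed by hand, since similarly spelled names can refer to different languages.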

jpt commented 2 days ago

However, the Cyrillic Pro still needs to be redefined (please ignore it for the time being)

Is there an issue where this is being discussed? I am hoping Khanty, which is already in gflanguages, will be included in Plus in the future, but I think Tje is not in Plus.

yanone commented 1 day ago

@jpt There is currently no other issue to discuss this. Khanty as defined in gflanguages has 14,000 speakers, which doesn't meet the intended purpose of GF_Cyrillic_Plus of covering languages with between 240K and 3M speakers, so I would say it should go into Pro if we ever redefine it.

jpt commented 1 day ago

@yanone Apologies, I meant Pro. The old Pro docs do mention Khanty support, but I think Tje wasn't encoded until Unicode 16, well after the glyphset was published.