notofonts / latin-greek-cyrillic

Noto Latin, Greek, Cyrillic
SIL Open Font License 1.1

Check if Noto Sans+Serif covers Twi #174

Closed brawer closed 1 year ago

brawer commented 8 years ago

On https://groups.google.com/forum/#!topic/noto-font/ARPRO9BlSb8, we got a report that Noto might not work with Twi. We do list Akan [ak] as a supported language on google.com/get/noto, but not Twi [tw]. Maybe we just need to add Twi to the list of supported languages. On the other hand, according to Omniglot, there are accented letters in Twi but not in Akan. While we’re at it, we should also check the accents needed for writing Ga.

moyogo commented 8 years ago

The Noto fonts support the Akan (macrolanguage of Twi and Fante) orthography. They have the required extended characters “Ɛ ɛ Ɔ ɔ” for the Unified Akan orthography. Diacritics that were used, or that may still be used, are also supported and positioned correctly on these letters.

They also support Ga orthography with the extended characters “Ɛ ɛ Ŋ ŋ Ɔ ɔ” and the combining diacritics.

However, in Noto Sans and Noto Serif, the n-shaped Eng.alt1 glyph (present in both fonts), which is the preferred form for many African languages using Ŋ, isn’t displayed by default when the ISO 639 code [gaa] or other codes are used. In Arimo, Tinos and Cousine the n-shaped Eng is the default; in those fonts the N-shaped Eng is instead expected for the Sami languages and some Australian languages (some of which may not have any OT language tag).
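The character requirements listed above can be turned into a quick coverage check. This is a minimal sketch, not part of the thread: the per-language sets are only the extended letters named here, plus a few common combining marks as a stand-in for the full diacritic requirements.

```python
# Quick coverage check for the extended characters listed above.
# Assumption: these sets are only the extras named in this thread,
# not a complete orthography description.
REQUIRED = {
    "Akan/Twi": set("ƐɛƆɔ"),
    "Ga": set("ƐɛŊŋƆɔ"),
}
# A few common combining marks, as a stand-in for the full diacritic set.
COMBINING_MARKS = set("\u0300\u0301\u0302\u0303")  # grave, acute, circumflex, tilde

def missing_chars(lang, cmap_codepoints):
    """Return the required characters absent from a font's cmap (a set of ints)."""
    needed = REQUIRED[lang] | COMBINING_MARKS
    return {c for c in needed if ord(c) not in cmap_codepoints}

# Example with a toy ASCII-only cmap, which lacks all the extended letters:
ascii_only = set(range(0x20, 0x7F))
print(sorted(missing_chars("Ga", ascii_only)))
```

The same check could be run against a real font by feeding in the codepoints from its cmap table.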

brawer commented 8 years ago

@moyogo, just to clarify, would you recommend that Noto uses a locl feature for these African languages? If so, for which OpenType language system tags specifically?

Regarding Australian languages that currently don’t have OpenType language tags, we can definitely propose their addition to the registry if you say it makes sense. Of course, this will only work with software that supports language tagging; and only if the authors have supplied the correct language tags.

andjc commented 8 years ago

@brawer if you want to use 'locl' there is a range of language systems that should be supported. But in reality, what exists in the OpenType tag tables is just the tip of the iceberg, so to speak. An alternative would be to add the alternative forms to a character variant feature.

Btw, if you do want to support African languages more accurately, Eng isn't the only variant glyph that should be present.

Andrew

brawer commented 8 years ago

@andjc, do you know more about this? If so, could you imagine writing some documentation, targeted at type designers? With Noto, Google is really trying quite hard to support all the world’s languages in good quality, but it’s incredibly difficult and time-consuming to find reliable documentation. And surely there are also other font designers/producers who would love to support African languages, but are not sure what exactly needs to be done. In case you happen to know more, could you describe (ideally with references) which language uses what letterforms? If you do, future fonts will probably work better than today’s. It doesn’t have to be complete or use fancy language; any reliable information (= with references) would be so much better than nothing. And your work might eventually help other font projects as well.

moyogo commented 8 years ago

Btw, if you do want to support African languages more accurately, Eng isn't the only variant glyph that should be present.

Some of those issues have been reported already: notofonts/latin-greek-cyrillic#180, notofonts/latin-greek-cyrillic#179, notofonts/latin-greek-cyrillic#178. The N-shaped vs. n-shaped distinction also applies to Ɲ. Ɓ, ɠ, Ɔ, ƥ, Ɽ, Ʋ can also have variants, but for a couple of these it’s not clear what is a variant and what is the preferred form. Vertically stacked diacritics, as a variant to the Vietnamese-style stacked diacritics, are also missing. There are probably others.

andjc commented 8 years ago

@brawer

Despite the rhetoric on the get noto web pages, most of the documentation and description of the Noto project I have seen in the past seems to indicate the original aim was to support all scripts, not all languages, which is a very different type of project.

I would need to go through my old files and see what I collected. SIL, as part of their font subsetting project, prepared sets of documents on glyphs for geographical/regional subsets of their LCG fonts. There were a few other projects, like the ANLoc and PanA12n projects, that gathered some data. I've collected my own data for orthographies in East Africa. But it is a time-consuming process, and the next few weeks are taken up with Unicode proposals for Cham script additions; finalising notes on Tai Aiton and Tai Phake (Myanmar script) requirements that aren't covered by UTN11; finalising Bamum and Bassa Vah fonts and keyboard layouts; and starting an inventory of variant glyphs in Mende Kikakui sources.

But I will see if I can start working on a document on African Latin script glyph requirements.

But how the glyphs should be made accessible in an OpenType font raises a few issues around how OpenType features are supported, especially in web browsers. The first issue is that (in my understanding) OpenType language system tags aren't meant to represent languages per se, but typographic systems that may be shared between a group of languages. But it appears developers are more inclined to see a direct link between tag and language.

The most common way of kicking in localised glyphs (via the 'locl' feature) is based on the language tag on an HTML element. While this makes life easy, the results can vary dramatically from browser to browser. Firefox is the most permissive in what language tags it will work with, especially when using Graphite rendering instead of OpenType. Chrome and Internet Explorer are more discerning than Firefox, which means that not all 'locl' features available in the various fonts I have experimented with will work in IE and Chrome. Additionally, their lack of (or poor) support for font-language-override complicates the issue.

This is even more apparent in cases where locl support has been added for tags that don't exist in the official registry, i.e. where there has been a need for a language tag but no appropriate tag exists.

It is also problematic for languages that don't have a tag, but share a typographic tradition with a language that does have a tag.

Currently, best practice for font development on the web would be to expose the localised forms made available via locl through other OpenType features too, e.g. stylistic sets or character variants. This would allow web developers and typesetters to tailor their typography when selection of a language isn't sufficient to get the job done.

Although with fonts like Padauk, considering the poor cross-browser support for some of the required CSS features, we've resorted to re-engineering the fonts, so that we have four fonts, one for each language system supported (including DFLT). This is a nightmare approach, but given the current constraints of web browsers, it is the only approach guaranteed to work cross-browser and cross-platform (even with the most recent version of each browser as the baseline).

A.

andjc commented 8 years ago

@brawer,

I have started work on a document for African languages, I will circulate a draft copy to a range of people I know working on African language issues, for their feedback. Is there anything in particular you would like to see in such a document?

jungshik commented 8 years ago

@andjc Thank you for your help.

Despite the rhetoric on the get noto web pages, most of the documentation and description of the Noto project I have seen in the past seems to indicate the original aim was to support all scripts, not all languages, which is a very different type of project.

I'm not sure what you have seen. IIRC, we (e.g. my Unicode conference presentation on Noto a few years ago) have made clear that we want to support all the scripts and languages of the world (that certainly includes a lot of African languages written in Latin script) as well as possible. If it has not been clear, we have to make it clear again.

Obviously, we're not there yet and your help would be very much appreciated.

@pychen1969

brawer commented 8 years ago

@andjc, thanks so much! Do you already have a draft of your document [about letterforms in African languages] that can be shared?

Is there anything in particular you would like to see in such a document?

Which localized letterform to use for what language—this information is incredibly hard to find. For example, the Wikipedia article about the Eng letter mentions that different letterforms should be used for African languages versus Sami, but it does not really present this information in a way that a font designer can quickly see what needs to be done.

Also, if you have information on which languages need which accents (or other combining marks), it would be really helpful too. If you make a list (even if it’s incomplete), I can make sure that Unicode CLDR contains them as exemplar characters.

Finally, do you know people who could help translating the Universal Declaration of Human Rights? When developing Noto, we often render the existing translations to make sure it looks reasonable; but many languages are missing; here’s how to help Unicode with this project.

The most common way of kicking in localised glyphs (via the 'locl' feature) is based on the language tag on an HTML element. While this makes life easy, the results can vary dramatically from browser to browser.

Agree that current web browsers have pretty broken support for localized letterforms, but I think it should be possible to fix this quite easily. Currently, the mapping from BCP47 (used by the HTML lang attribute, the HTTP Content-Language header, or the xml:lang attribute in XML) to OpenType language systems is not very clearly defined. I’m working on a draft spec to clarify this mapping, plus a conformance test so that browser vendors can test their implementation; see draft. Once the mapping is clear, I’ll make the needed changes to the Harfbuzz shaping library, which is called by Firefox and Chrome; so at least these two browsers should soon support locl features.
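As a sketch of the kind of mapping the draft spec would pin down, the lookup below uses a hand-written toy table (the real registry is far larger, and shaping libraries like HarfBuzz generate their mapping rather than maintain it by hand). The tag choices shown (SRB, MKD, VIT) are registered OT language system tags; the normalization rule is an assumption for illustration.

```python
# Toy sketch of a BCP-47 -> OpenType language system mapping.
# Assumption: a tiny hand-maintained table; a real implementation
# would cover the whole registry and handle script/region subtags
# more carefully than simply dropping them.
OT_FROM_BCP47 = {
    "sr": "SRB ",  # Serbian
    "mk": "MKD ",  # Macedonian
    "vi": "VIT ",  # Vietnamese
}

def ot_language_system(bcp47_tag):
    """Normalize a BCP-47 tag and look up an OT language system,
    falling back to DFLT when no mapping is known."""
    primary = bcp47_tag.lower().split("-")[0]  # drop script/region subtags
    return OT_FROM_BCP47.get(primary, "DFLT")

print(ot_language_system("sr-Latn-RS"))  # Serbian, whatever the script/region
print(ot_language_system("tw"))          # no entry in the toy table -> DFLT
```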

andjc commented 8 years ago

@brawer the example of Eng illustrates the problem with the locl feature. locl is useful but very difficult to fully and comprehensively implement. A better approach is to use character variants.

Both Chrome and Firefox do include locl via HTML lang tags, but this will only ever be a partial implementation; Chrome implementing font-language-override like Firefox has would be more useful.

There has been a recent thread on the CSS mailing list about font-language-override, and I would quote part of one of John Hudson's emails in this context:

"I'm going to be pedantic and insist that we use the full correct term 'language system' for the OTL tags — even though it is itself a misleading misnomer —, because it is important to note that these are not language tags, but a means of activating particular typographic display, which may or may not map to document language tagging; indeed, it might not map to a language at all, as in the case of OTL language system tags for IPA and Americanist phonetic transcription."

If you really want to map BCP-47 to OT language system tags, I suspect it will take you years to do. You would need to identify the orthography for most languages and identify which codepoints may need alternative glyphs. This information does not reside in any easily accessible form.

Alternatively, you could just map the existing OT language tags to BCP-47 tags... but a lot of that work is already available, in a naive way, in Firefox and Chrome.

And there are gotchas... font developers may add one language system but not another. Serbian and Macedonian are good examples: not all fonts will have both language systems, so those language tags need to map to their corresponding OT language system and, in its absence, to the other.
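The Serbian/Macedonian fallback described here can be sketched as a small selection routine. This is an illustrative sketch only: the related-systems table is an assumption, standing in for whatever typographic-tradition data a real implementation would use.

```python
# Sketch of the fallback described above: if a font lacks the language
# system for the requested language, try a related system that shares
# the same typographic tradition, and finally fall back to DFLT.
# Assumption: the RELATED table is a toy stand-in for real data.
RELATED = {"SRB ": ["MKD "], "MKD ": ["SRB "]}

def pick_language_system(wanted, systems_in_font):
    """Return the best available OT language system for `wanted`."""
    for candidate in [wanted] + RELATED.get(wanted, []):
        if candidate in systems_in_font:
            return candidate
    return "DFLT"

# A font with only the Macedonian system still gets the right italic forms
# for Serbian text:
print(pick_language_system("SRB ", {"MKD ", "DFLT"}))
```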

The Karen OT language system would map to approximately 20 language codes, but there are a number of different typographic traditions among them, so mapping the distinct Karen languages to the Karen OT feature is a very, very bad idea.

There are a handful of African OT language systems; mapping all the African languages to them would be a nightmare, not to mention the political issues.

Where it is obvious, locl should be added, as for Dinka, for which you would need six ISO 639-2/3 tags mapping to the one OT language system.

But glyphs should also be exposed in other ways, like character variants (cvNN), font-language-override, etc.

Currently there is limited use of aalt in Noto Sans, but this is not a good approach: if many more alternative glyphs are added, and aalt is the way those glyphs are exposed, it would quickly become very difficult on the web. aalt works fine in InDesign, exposing glyphs in a palette for an editor to choose from. But in HTML, it would require per-character markup for those characters you wish to use alternative glyphs for.

andjc commented 8 years ago

Hi @jungshik,

I have seen comments from Google that talk about scripts and languages... I have also seen comments that talk only about scripts.

And looking at the fonts, and their use of locl, to date Noto has focused on dominant typographic traditions and little has been done to enable minority languages or divergent typographic traditions.

For Noto Sans and Noto Serif, it's not just a question of African languages; some thought may need to be given to minority languages in SE Asia, especially ethnic minority languages.

You list the Noto Arabic fonts for Cambodia, but they may not be adequate for all Arabic orthographies in Cambodia. I am still trying to ascertain the requirements of a particular language there. Likewise, I need to confirm their adequacy for certain languages in Southern Vietnam.

Noto Sans Myanmar and Noto Sans Devanagari, last time I looked, needed more language systems added. There are probably lots more. I haven't looked at Thai minority language support, Ethiopic, or many others. But those are some to look at.

marekjez86 commented 7 years ago

PDFs with UDHR text for three flavors of Akan as supported by Noto {Sans, SansMono, Serif, SansDisplay, SerifDisplay} x {Regular, Italic}:

https://github.com/googlei18n/noto-fonts-alpha/tree/master/udhr-test/basic-width-weights/Akan_Akuapem
https://github.com/googlei18n/noto-fonts-alpha/tree/master/udhr-test/basic-width-weights/Akan_Asante
https://github.com/googlei18n/noto-fonts-alpha/tree/master/udhr-test/basic-width-weights/Akan_Fante

Could you file bugs if required?

simoncozens commented 1 year ago

I'm not sure where this issue is up to. From what I can gather, Noto Sans+Serif's codepoint coverage is fine for Twi, but there are questions about the appropriate form of the eng glyph... and then the topic started to drift.

@moyogo / @NeilSureshPatel, do you know more?

andjc commented 1 year ago

@simoncozens my understanding is that the fonts, as they were, supported Twi. The only extended Latin requirements were open-e (Ɛ/ɛ) and open-o (Ɔ/ɔ).

Although I can't remember if font subsetting on Google Fonts impacted Twi, and whether that was at the root of the report.

Most of the discussion is more related to the need for a strategy or technical approach to handling African languages, an area the Noto fonts are notoriously poor at.

simoncozens commented 1 year ago

Although I can't remember if font subsetting on Google Fonts impacted Twi, and whether that was at the root of the report.

That's quite likely. I'll take a look at that.

Most of the discussion is more related to the need for a strategy or technical approach to handling African languages, an area the Noto fonts are notoriously poor at.

Well, there's a reason I tagged Neil Patel! African language support for Noto is something we're working on.

simoncozens commented 1 year ago

I'll take a look at that.

Yeah, it looks like we subset out ɛ, ɑ, ɔ, and probably other stuff as well. This is a https://github.com/googlefonts/glyphsets issue, so I'll reopen over there and link here.
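A subsetting pipeline can guard against exactly this regression by checking, before shipping, that the codepoints each supported language needs survive the cut. This is a hedged sketch: the language data below is a toy stand-in for whatever exemplar data (e.g. CLDR exemplar characters) the real glyphsets pipeline uses, and the subset is a hypothetical one.

```python
# Sketch of a pre-ship guard against subsetting out language-required
# characters. Assumption: LANGUAGE_NEEDS is toy data standing in for
# real exemplar-character sets (e.g. from CLDR).
LANGUAGE_NEEDS = {
    "tw": set("ƐɛƆɔ"),       # Twi
    "gaa": set("ƐɛŊŋƆɔ"),    # Ga
}

def lost_by_subset(subset_codepoints, languages):
    """Return, per language, the required characters missing from the subset."""
    lost = {}
    for lang in languages:
        missing = {c for c in LANGUAGE_NEEDS[lang]
                   if ord(c) not in subset_codepoints}
        if missing:
            lost[lang] = missing
    return lost

# A hypothetical subset covering only Basic Latin through Latin Extended-A
# keeps Ŋ/ŋ but drops the open vowels, reproducing the kind of loss
# described above:
latin_basic = set(range(0x20, 0x180))
print(lost_by_subset(latin_basic, ["tw", "gaa"]))
```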