Open litherum opened 4 years ago
Yes, it does seem like a rather niche case (this font doesn't support language X, but if I tell the font that this is actually language Y it comes out better for language X).
I concur with dropping it (more on the undesirability than the single implementation).
Cc @jfkthame
(This issue is about both the property and the descriptor.)
Philosophically, there shouldn't be two places authors can specify language to get correct text shaping.
I don't think it's quite as simple as this. The use case for font-language-override arises because of a mismatch between different things referred to (rather loosely) as "language". In HTML, authors can tag content with the lang attribute, normally thought of as "language" although it can carry additional subtags such as script and region, so it's really a locale identifier.
When it comes to text shaping, however, the functionality in OpenType fonts is driven via tags that are often referred to as "language", but are more formally called "language system" tags. This is not at all the same thing.
Quoting from https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags (emphasis added):
Language system tags identify the language systems supported in a OpenType Layout font. What is meant by a “language system” in this context is a set of typographic conventions for how text in a given script should be presented. Such conventions may be associated with particular languages, with particular genres of usage, with different publications, and other such factors. For example, particular glyph variants for certain characters may be required for particular languages, or for phonetic transcription or mathematical notation.
The OpenType tag is about a set of typographic conventions, not directly about language (although it is often possible to infer a reasonable default mapping from one to the other).
In principle, a given set of conventions may be shared across multiple scenarios. For instance, two different languages (perhaps unrelated) may happen to follow the same conventions. Language system tags can be registered on a perceived-need basis, however; as a result, there is no guarantee that each tag represents a distinct and unique set of conventions. Tags can, however, be registered with the intent of representing conventions that apply to multiple languages. In such cases, the documented description for the tag should reflect that intent.
It should also be noted that there may be more than one set of typographic conventions that apply to a given language. Therefore, in several respects, language system tags do not correspond in a one-to-one manner with languages. Even so, many registered tags are intended to represent typographic conventions for a particular language. For cases in which a correlation exists between a tag and one or more languages, the language identities are documented here by reference to ISO 639-2 and ISO 639-3.
While many such correlations are documented, there is no claim to completeness, and given the complexity (and ever-evolving conventions) of human language and writing systems, it would be futile to expect it.
If information is available to an application declaring the language of text content, then the application may make use of that to select a default language system tag to be applied when displaying that text. It is preferable, however, to give users control over the choice of language system tag to be used. (Depending on the application scenario, such control may be given to content authors, to content readers, or to both.)
font-language-override exists precisely to give users control here, as recommended by the OpenType spec, recognizing that (a) it is impossible for a browser to correctly anticipate every mapping from language, as expressed in the HTML lang tag, to desired writing system conventions as expressed via OT language system; and (b) to require authors to artificially change the lang tag in order to access desired writing system conventions in a font would be actively harmful.
For example, the OT tag registry includes 5 different tags for Karen languages: BLK, KJP, KRN, KSW, PWO. An advanced Burmese font might support all 5 of these, with certain differences in glyphs and shaping behavior. However, there are many more than 5 languages and dialects within the Karen group, and in some cases writing conventions may not even be well-established or documented yet. An author should not have to mislabel content with the lang tag of one of the major Karen languages just to access their preferred rendering behavior. font-language-override allows content to be given an accurate lang tag, and separately allows the author to choose the desired rendering behavior when a font provides multiple options.
So I am opposed to dropping this. Yes, it's a niche use case, but it is a valid one; I strongly disagree with labelling it "undesirable".
I don't know enough to make a case for it either way, but we have this working, so there will be two shipping implementations at some point.
As a philosophical argument was raised, it's similar to the fallback requested for hyphenation in https://github.com/w3c/csswg-drafts/issues/5270. I think most of the arguments made here apply to that issue as well.
cc @r12a
I agree with @jfkthame. There must be plenty of minority or less-common languages which, labelled properly, would not trigger the shaping required from a font, whereas the OT tag could be used to indicate that "the rules appropriate to
Here's another example. The Scheherazade font has the ability to turn this:
into this, for Kurdish (exactly the same code points):
Kurdish can be labelled using ku, but that's actually a macrolanguage in BCP-47 which groups together ckb (central kurdish), kmr (northern kurdish), and sdh (southern kurdish). That transformation will be applied if you label your content as lang="ckb" or lang="kmr", but not if it's labelled lang="sdh". Also, if your content is (and it may well be) labelled as lang=ku, that transformation will not be applied, either. However, if you use font-language-override: kur you will get the transformation, whatever language tag you use. (Note that kur is not a BCP-47 language tag, btw.)
So, basically, font-language-override is not a duplicate selector for language, it's a selector for a particular set of glyph transformations in a font, which happen to be grouped and labelled along linguistic lines (though not necessarily with BCP language tags), which can be applied if the appropriate lang attribute for the content doesn't produce the desired effect.
hth
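The Kurdish example above can be read as a two-step lookup: a default mapping from the content language to an OpenType language system tag, with an author-supplied override taking precedence. The following is only a minimal sketch of that idea. The table `DEFAULT_LANGSYS` and the function `resolve_langsys` are hypothetical names, the table reproduces just the behaviour reported in the comment (ckb and kmr trigger the Kurdish rendering, sdh and ku do not), and uppercasing the override is an assumption of the sketch rather than something any browser is known to do.

```python
# Hypothetical default mapping from a lang primary subtag to an OpenType
# language system tag. It only mirrors the behaviour reported in the
# comment above; it is not taken from any real browser or shaping engine.
DEFAULT_LANGSYS = {
    "ckb": "KUR",
    "kmr": "KUR",
}


def resolve_langsys(lang, override="normal"):
    """Return the language system tag to shape with, or None for the font default."""
    if override != "normal":
        # An explicit override wins regardless of the content language.
        # Uppercasing is an assumption made for this sketch.
        return override.upper()
    # Look only at the primary subtag, e.g. "ckb" in "ckb-Arab".
    primary = lang.split("-")[0].lower()
    return DEFAULT_LANGSYS.get(primary)


print(resolve_langsys("ckb"))          # mapped by the default table
print(resolve_langsys("sdh"))          # no mapping, font default is used
print(resolve_langsys("sdh", "kur"))   # override selects the Kurdish rules
```

In this sketch the content keeps its accurate lang value in every case; only the tag handed to the shaper changes.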
Okay. How about this related question, then:
What would have to happen in order to make it acceptable to remove font-language-override? Adding more flexibility to BCP language tags? Something else?
I think we need a well-defined interoperable way to compute OpenType language system tags. Maybe in OpenType spec or in CLDR?
This is coming up again, due to https://github.com/WebKit/WebKit/pull/14837
A few new thoughts:
- lang values like ja-Latn
- lang isn't expressive enough, it seems like we should be going to either HTML or the IETF (who defines BCP47) to make it expressive enough
- font-language-override has a pretty bad fallback story; there is no guarantee the font that actually gets chosen to be used by the UA actually supports the value supplied. lang can and does affect font selection, though.
Looks like CLDR is adding support for OpenType language tags: https://unicode-org.atlassian.net/browse/CLDR-337 / https://github.com/MicrosoftDocs/typography-issues/issues/1030
Looks like CLDR is adding support for OpenType language tags: https://unicode-org.atlassian.net/browse/CLDR-337
The issue CLDR-337 appears to be closed as "out of scope", afaics.
Maybe it's the name of the property that is the stumbling-block here, to some extent. Would it be better if it were called font-typographic-convention? The initial value would be auto, meaning to use behavior implied by the content language (or any other available clues), but it would also allow an explicit choice of an OpenType "language system" tag (specified as a string).
A request for a non-auto typographic convention could be treated as an input to the font-matching algorithm, causing the UA to explicitly look for a font that supports the requested rendering. (This could of course equally well be done with font-language-override, though we haven't heard any call for this, afaik. The assumption has generally been that an author wanting this level of control would use it in conjunction with a specific webfont.)
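As a rough illustration of what "an input to the font-matching algorithm" could mean, here is a minimal sketch in which each candidate font advertises the language system tags it supports and the matcher prefers the first candidate that supports the requested tag. The `Font` class, the `pick_font` function, and the font names are all hypothetical; no UA is known to implement matching this way, and real font matching involves many more inputs.

```python
from dataclasses import dataclass


@dataclass
class Font:
    name: str
    # OpenType language system tags the font has rules for (hypothetical data).
    langsys: frozenset = frozenset()


def pick_font(candidates, requested=None):
    """Return (font, supported) for an ordered list of candidate fonts.

    supported is True only when the chosen font has rules for the
    requested language system tag.
    """
    if not candidates:
        raise ValueError("no candidate fonts")
    if requested is not None:
        for font in candidates:
            if requested in font.langsys:
                return font, True
    # Nothing supports the request (or none was made): fall back to the
    # first candidate, which renders with its default conventions.
    return candidates[0], False


fonts = [
    Font("FontA", frozenset({"KSW"})),
    Font("FontB", frozenset({"KSW", "PWO"})),
]
print(pick_font(fonts, "PWO"))
print(pick_font(fonts, "BLK"))
```

The second call shows the fallback concern raised earlier in the thread: when no candidate supports the requested tag, the request is silently not honoured.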
Would it be better if it were called font-typographic-convention?
I don't think so - typographic conventions are already applied by lang=.
font-language-override is only implemented by one engine, and has been at risk for a long time.
Philosophically, there shouldn't be two places authors can specify language to get correct text shaping.
We should remove font-language-override from the spec.