w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-text] Allow alias for language hyphenation #5270

Open sujato opened 4 years ago

sujato commented 4 years ago

The CSS spec provides for hyphenation of text, leaving the choice of language up to the UA:

https://www.w3.org/TR/css-text-4/#hyphenation

Currently Firefox offers the best support, but even it supports only a fairly small subset of the world's languages.

https://developer.mozilla.org/en-US/docs/Web/CSS/hyphens

The thing is, it is sometimes better to have imperfect hyphenation than none at all. No hyphenation can result in a broken UI and unreadable text, whereas imperfect hyphenation might work fine, or at worst be merely inelegant.

I work with texts in Pali and Sanskrit, which can have very long words formed by compounding. There is no browser support for hyphenation for these, nor is there likely to be. Surely these are not the only languages affected. Here is a typical example, rendered in Firefox:

Screenshot from 2020-06-30 09-26-16

It is possible to hack around this by activating hyphens and setting lang='la':

Screenshot from 2020-06-30 09-25-58

This is identical to the result that a proper Pali hyphenation would produce. Note that in traditional Indic orthography, there is no concept of a correct breakpoint; scribes merely wrote to the end of the line and continued on the next line. Thus the traditional practice would agree with the idea that sometimes any breakpoint is better than none.
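For reference, the workaround described above can be sketched roughly as follows. This is only a minimal illustration: it assumes the Pali text sits in an element that has been (incorrectly) marked up with lang="la", and that the UA ships a Latin hyphenation resource.

```css
/* Workaround only: the markup is assumed to carry lang="la"
   even though the text is actually Pali. */
:lang(la) {
  -webkit-hyphens: auto; /* prefixed form, needed by some older WebKit builds */
  hyphens: auto;         /* let the UA pick break points using its Latin rules */
}
```

The cost is that everything else keyed on the language tag (fonts, screen readers, search, translation) now sees the text as Latin.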

However, it's obviously not a good idea to deliberately set a false language. Hence my proposal:

Allow the CSS to declare a language alias for hyphenation.

So the text language is unaffected, and the HTML does not change. But the user can declare via CSS something like:

hyphenate-alias-languages: pli, la;

Meaning: "for the purpose of hyphenation, Latin and Pali may be substituted."

Such substitution would apply only if explicit support for the declared language is missing. So if lang='pli' is set in the HTML and a UA supports Pali hyphenation, that is used; if not, the UA looks for Latin hyphenation support instead.

faceless2 commented 4 years ago

That seems very reasonable to me, although I'd suggest a syntax more like:

:lang(pli) { hyphenate-language-fallback: la; }

I would have gone with hyphenate-language-override to align with font-language-override, but @sujato states the intention is to provide an equivalence only if support for the intended language is missing. CSS uses the term "fallback" for this type of concept in css-counter-style-3. It should probably accept a comma-separated list - I don't think the presence of a hyphenation dictionary for Latin can be guaranteed!
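A sketch of how the comma-separated form might read. Everything here is hypothetical: hyphenate-language-fallback exists only as a proposal in this thread, and the second list entry (it) is just an illustrative assumption, not a recommendation for Pali.

```css
/* Hypothetical syntax: hyphenate-language-fallback is not part of any
   specification or browser at the time of this thread. */
:lang(pli) {
  hyphens: auto;
  /* Consulted only if the UA has no Pali hyphenation resource.
     Entries would be tried left to right: Latin first, then Italian
     (the Italian entry is an assumption added for illustration). */
  hyphenate-language-fallback: la, it;
}
```

Under this reading, the element's language stays pli for every other purpose; only the choice of hyphenation resource changes.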

sujato commented 4 years ago

Oh yes, that looks much better, thanks!

And yes, a comma-separated list would be ideal.

AmeliaBR commented 4 years ago

Labelling this to get feedback from internationalization experts, but I agree that this sounds like a good proposal. We definitely don't want authors to hack around with incorrect language tags just to get hyphenation!

fantasai commented 4 years ago

Wouldn't it make more sense to build this information into CLDR and have the aliasing built into the browser?

sujato commented 4 years ago

@fantasai I'm not really sure how all this works, but my concern would be that this should be left up to the site designer. It's hard to say that X language hyphenation will be an adequate fallback for Y language in all cases; whereas it is, I think, possible to say that it will work in this case. As with most typographic refinements, there are pluses and minuses, and the site designer would need to weigh them up.

xfq commented 4 years ago

@sujato One way to solve this issue is to come up with a reasonable default set of aliases (if possible) and add the aliases to the user agent style sheet (or just add them to css-text as the default hyphenation behavior), and the author can override them in their own style sheet.
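Roughly, that could look like the sketch below. It reuses the hypothetical hyphenate-language-fallback property proposed earlier in this thread; neither the UA default nor the author value shown here is implemented anywhere, and the specific language lists are assumptions for illustration.

```css
/* Hypothetical UA style sheet: a default alias shipped by the browser. */
:lang(pli) {
  hyphenate-language-fallback: la;
}

/* Author style sheet: an author declaration beats the UA default through
   the normal cascade, so a site can substitute its own preferred list. */
:lang(pli) {
  hyphenate-language-fallback: it, la;
}
```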

sujato commented 4 years ago

Sure, that might work, so long as it is possible to override the defaults. Personally I prefer to let people opt in, but I will leave that to the experts!