w3c / iip

Documenting gaps and requirements for support of Indic languages on the Web and in eBooks.
https://w3c.github.io/iip/

Letter-spacing & first-letter selection must keep yo-phola with preceding independent vowels #67

Open r12a opened 4 years ago

r12a commented 4 years ago

This issue is applicable to Bengali and Assamese.

The issues "Letter-spacing splits conjuncts" and "Conjuncts are not selected as a single unit when styling initials" describe how conjuncts should not be split by letter-spacing or by first-letter selection. See those issues for more details.

This topic builds on that for some specific cases in Bengali.

There are two cases in Bengali where hasant (virama) is preceded by an independent vowel, rather than a consonant. These are:

[Screenshots: the two sequences, অ্যা and এ্যা.]

(In both cases this produces the sound æ, which is used in non-native words such as 'application' and 'administration'.)

This combination should not be split either, even though it doesn't fit the typical CvC structure of a conjunct (where 'v' is the virama).
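For reference, here is a minimal Python sketch that lists the code points behind the two sequences and checks that each one begins with an independent vowel followed by hasant, rather than with a consonant. The helper names and the simple U+0985..U+0994 range check are illustrative assumptions, not something taken from a spec or from the tests in this issue.

```python
import unicodedata

VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasant)

# The two sequences discussed above, written as escapes to avoid ambiguity.
A_YAPHALA_AA = "\u0985\u09CD\u09AF\u09BE"  # অ্যা
E_YAPHALA_AA = "\u098F\u09CD\u09AF\u09BE"  # এ্যা


def describe(text):
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def is_independent_vowel(ch):
    # Assumption: a plain range check over the main run of Bengali
    # independent vowels (U+0985..U+0994); it ignores U+09E0 and U+09E1.
    return "\u0985" <= ch <= "\u0994"


def starts_with_vowel_virama(text):
    """True if text begins with independent vowel + hasant (so not CvC)."""
    return (
        len(text) >= 2
        and is_independent_vowel(text[0])
        and text[1] == VIRAMA
    )


for seq in (A_YAPHALA_AA, E_YAPHALA_AA):
    print(seq, starts_with_vowel_virama(seq))
    for line in describe(seq):
        print("   ", line)
```

An ordinary conjunct such as ক্ষ (U+0995 U+09CD U+09B7) returns `False` from `starts_with_vowel_virama`, because its first code point is a consonant.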

Specs: css-text-3. The CSS specs use the concept of 'typographic character unit' rather than grapheme cluster, explaining that cases like those just described go beyond the scope of the grapheme cluster concept and that implementations should provide appropriate support. The spec doesn't detail the support needed for each language.
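To make the idea of a unit that is larger than a grapheme cluster more concrete, here is a rough Python sketch of a segmenter that keeps hasant-joined sequences together, whether the first letter is a consonant or an independent vowel. It is not the algorithm prescribed by css-text-3 or used by any browser, and `typographic_units` is a hypothetical name. As a simplifying assumption, it joins across every hasant, regardless of whether the font actually forms a conjunct.

```python
import unicodedata

VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasant)
MARK_CATEGORIES = ("Mn", "Mc", "Me")  # combining marks attach to the previous unit


def typographic_units(text):
    """Split text into units that must not be separated by letter-spacing.

    Simplified rule (an assumption, for illustration only): a character
    stays in the current unit if it is a combining mark, or if the unit
    so far ends with hasant.
    """
    units = []
    for ch in text:
        is_mark = unicodedata.category(ch) in MARK_CATEGORIES
        if units and (is_mark or units[-1].endswith(VIRAMA)):
            units[-1] += ch
        else:
            units.append(ch)
    return units


# 'অ্যাপ': independent vowel + hasant + ya + aa, then a plain consonant.
print(typographic_units("\u0985\u09CD\u09AF\u09BE\u09AA"))
```

Under this rule the vowel-initial sequence and the usual CvC conjunct are handled by the same code path, which is the behaviour this issue asks for.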

Tests & results: Both tests below were run with these pre-installed fonts:

Windows: Shonar Bangla, Arial Unicode MS, Nirmala UI, Vrinda
Mac: Bangla MN, Bangla Sangam MN, Kohinoor Bangla, Tiro Bangla, Baloo Da
Also tested with Noto Sans Bengali and Noto Serif Bengali on the Mac.

Interactive test, Bengali অ্যা and এ্যা (æ) are selected as a single grapheme by ::first-letter.

Note that Blink and WebKit do handle the more usual CvC conjunct arrangement (see this test).

Interactive test, Bengali অ্যা and এ্যা (æ) are treated as a single grapheme for letter-spacing.

Gecko, Blink, and WebKit all fail to treat the sequence as a single grapheme, even though Blink and WebKit do handle the more usual CvC conjunct arrangement (see this test).

Browser bug reports: Gecko, Blink, WebKit

Priority: Keeping such sequences together is a pretty basic requirement. That said, first-letter selection and letter-spacing are not essential for content authoring, although Bengali content authors should have the same access to these styling features as authors writing in Western languages. Content authors could work around the first-letter problem by adding markup (though that's not ideal), but for letter-spacing there is no real alternative, and adding spaces between letters ruins the semantics. The priority was set to advanced.
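As an illustration of the markup workaround for the first-letter case, the following Python sketch wraps the initial sequence of a string in a `span` so that it can be styled directly instead of relying on `::first-letter`. The function names and the `initial` class name are hypothetical, and the segmentation rule (combining marks and anything following a hasant stay with the first letter) is a simplifying assumption that ignores leading punctuation.

```python
import html
import unicodedata

VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasant)


def first_unit_length(text):
    """Number of code points in the first unit of text.

    Assumption: a character belongs to the first unit if it is a
    combining mark or if it directly follows a hasant.
    """
    end = 0
    for i, ch in enumerate(text):
        is_mark = unicodedata.category(ch) in ("Mn", "Mc", "Me")
        joined = i > 0 and text[i - 1] == VIRAMA
        if i > 0 and not (is_mark or joined):
            break
        end = i + 1
    return end


def wrap_initial(text, class_name="initial"):
    """Return HTML with the first unit wrapped in <span class="...">."""
    n = first_unit_length(text)
    if n == 0:
        return ""
    cls = html.escape(class_name, quote=True)
    return f'<span class="{cls}">{html.escape(text[:n])}</span>{html.escape(text[n:])}'


print(wrap_initial("\u0985\u09CD\u09AF\u09BE\u09AA"))
```

A stylesheet could then target `.initial` rather than `::first-letter`. Nothing comparable is available for letter-spacing, which is why that case has no real workaround.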

r12a commented 4 years ago

The first comment in this issue contains text that will automatically appear in one or more gap-analysis documents as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

Relevant gap analysis documents include: _Bengali_