w3c / iip

Documenting gaps and requirements for support of Indic languages on the Web and in eBooks.
https://w3c.github.io/iip/

Grapheme clusters fail to represent syllabic conjuncts in north Indian scripts #87

Open r12a opened 4 years ago

r12a commented 4 years ago

The Unicode concept of 'grapheme cluster' currently fails to represent syllabic conjuncts (plus vowel signs, etc.) in scripts like Devanagari. This means that editing operations, line-breaking algorithms, vertical text layout, and similar features are likely to break text at the wrong point.

Indic Layout Requirements provides a grammar for Indian orthographic syllable boundaries that works for Devanagari. CSS uses the concept of 'typographic character unit', rather than grapheme cluster, in its specs, explaining that these cases are beyond the scope of the grapheme cluster concept and that implementations should provide appropriate support. In addition, a modification to the concept of grapheme cluster is currently in development at the Unicode Consortium, and it is likely to resolve the problem for a script like Devanagari.
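
To make the mismatch concrete, here is a minimal Python sketch. It is an illustration only, not the UAX #29 algorithm and not the Indic Layout Requirements grammar. The function names, the consonant ranges, and the simple "join across a virama" rule are all assumptions made for this example. The first function approximates grapheme clusters with no conjunct-joining rule (a base character plus any following combining marks); the second glues a cluster onto the previous one when that one ends in a Devanagari virama and the new one starts with a consonant.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
JOINERS = ("\u200C", "\u200D")  # ZWNJ, ZWJ


def is_extender(ch):
    """Assumption: combining marks and joiners never start a cluster."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me") or ch in JOINERS


def is_devanagari_consonant(ch):
    """Assumption: rough consonant ranges, not taken from any spec."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def approx_grapheme_clusters(text):
    """Simplified clusters: a base character plus trailing marks."""
    clusters = []
    for ch in text:
        if clusters and is_extender(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def approx_orthographic_syllables(text):
    """Join clusters across a virama that is followed by a consonant."""
    syllables = []
    for cluster in approx_grapheme_clusters(text):
        if (
            syllables
            and syllables[-1].endswith(VIRAMA)
            and is_devanagari_consonant(cluster[0])
        ):
            syllables[-1] += cluster
        else:
            syllables.append(cluster)
    return syllables


word = "\u0939\u093F\u0928\u094D\u0926\u0940"  # हिन्दी
print(approx_grapheme_clusters(word))       # 3 units: हि, न्, दी
print(approx_orthographic_syllables(word))  # 2 units: हि, न्दी
```

With the cluster-based segmentation, a cursor movement or line break can land between न् and दी, in the middle of the conjunct. The syllable-based version keeps न्दी together. The sketch deliberately ignores cases such as a virama followed by ZWJ, nukta ordering, and scripts other than Devanagari, which a real implementation would need to handle.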

See requirements at: Indic Layout Requirements

Specs

CSS uses the concept of 'typographic character unit', rather than grapheme cluster, in its specs with the explanation that these cases are beyond the scope of the grapheme cluster concept and that implementations should provide appropriate support.

Tests

r12a commented 4 years ago

The first comment in this issue contains text that will automatically appear in the Devanagari gap-analysis document as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.