w3c / iip

Documenting gaps and requirements for support of Indic languages on the Web and in eBooks.
https://w3c.github.io/iip/

Break anywhere fails on conjuncts #78

Open r12a opened 4 years ago

r12a commented 4 years ago

line-break:anywhere causes lines to break inside words. It should break lines on grapheme cluster boundaries for all consonant clusters apart from the three special conjuncts. Chrome doesn't support the anywhere value. Firefox and Safari behave as expected.

The exceptions are the sequences க்ஷ k͓ʂ, and ஶ்ரீ ʃ͓ɾī / ஸ்ரீ s͓ɾī (which are synonyms). These sequences should not be broken during line breaking. Correct line breaking of these conjunct-forming sequences is not supported by default by Unicode grapheme clusters (which split each of them in two), and requires the application of tailored rules.

Test: line-break:anywhere should not break shri or ksha conjuncts.

Firefox is ok for shri and for ksha without a vowel-sign, but in ரிக்ஷா it leaves க் on the previous line, still shaped as half of a conjunct.

Safari is ok for shri in HTML, but leaves ஸ் behind in a textarea. For ksha, in a textarea it leaves க் behind; in HTML it initially moves the whole word to the next line, then puts ரி back at the end of the line as you decrease the window width.

Similar results are produced for word-break: break-all, except that Chrome supports this property and value. Chrome wraps ஸ்ரீந as a single unit and ரிக்ஷா as a unit.

The impact of this is advanced, although it would be good to fix it.
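To illustrate the kind of tailoring described above, here is a minimal Python sketch. It is not how any browser implements line breaking, and it does not implement the full Unicode segmentation algorithm: simple_clusters only approximates default grapheme clusters by attaching combining marks to the preceding base character, which is enough for the Tamil examples in this issue. The function names and the merge rule are assumptions made for this illustration. tailored_clusters then joins the two halves of ksha and shri so that no break opportunity falls inside them.

```python
import unicodedata

VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA (pulli)
KA, SSA = "\u0B95", "\u0BB7"
SHA, SA = "\u0BB6", "\u0BB8"
RA_II = "\u0BB0\u0BC0"  # RA + VOWEL SIGN II


def simple_clusters(text):
    """Approximate default grapheme clusters: a base character
    followed by any combining marks (categories Mn, Mc, Me)."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def is_special_conjunct(left, right):
    """True if two adjacent clusters form ksha or shri (hypothetical rule)."""
    # ksha: KA + virama, then a cluster starting with SSA (any vowel sign).
    if left == KA + VIRAMA and right.startswith(SSA):
        return True
    # shri: SHA or SA + virama, then exactly RA + II.
    if left in (SHA + VIRAMA, SA + VIRAMA) and right == RA_II:
        return True
    return False


def tailored_clusters(text):
    """Merge default clusters so the special conjuncts stay unbroken."""
    merged = []
    for cluster in simple_clusters(text):
        if merged and is_special_conjunct(merged[-1], cluster):
            merged[-1] += cluster
        else:
            merged.append(cluster)
    return merged
```

Under these assumptions, tailored_clusters("ரிக்ஷா") yields two units, ரி and க்ஷா, whereas the untailored split leaves க் as a separate cluster, which matches the breakage reported above. Other consonant clusters, such as க்க, are still split after the pulli.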

r12a commented 4 years ago

The first comment in this issue contains text that will automatically appear in the Tamil gap-analysis document as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.