w3c / sealreq

Southeast Asian layout task force

Some browsers fail at word-based line breaking #34

Open r12a opened 4 years ago

r12a commented 4 years ago

This issue is applicable to Lao, but similar issues may apply to Thai, Myanmar, Khmer, etc.

Because Lao doesn't use spaces between words, but does wrap text at word boundaries, inter-word spaces can't be used for line-breaking. Zero-width spaces (U+200B) are occasionally inserted in the text, but this is not common; content authors tend to rely on the browser applying heuristics and dictionaries to locate word boundaries and insert break opportunities.
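To make the dictionary approach concrete, here is a minimal sketch of greedy longest-match segmentation that inserts U+200B between the words it finds. It is only an illustration: the `WORDS` set is a hypothetical two-entry toy list, the function names are made up for this example, and real browsers rely on far larger lexical resources and more robust algorithms.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, an explicit break opportunity

# Toy word list for illustration only (hypothetical, not a real lexicon).
WORDS = {
    "\u0e9e\u0eb2\u0eaa\u0eb2",  # ພາສາ "language"
    "\u0ea5\u0eb2\u0ea7",        # ລາວ "Lao"
}


def segment(text, words=WORDS):
    """Greedy longest-match split; runs of unknown characters stay together."""
    longest = max(map(len, words), default=1)
    chunks = []
    unknown = ""
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        match = next(
            (text[i:i + n]
             for n in range(min(longest, len(text) - i), 0, -1)
             if text[i:i + n] in words),
            None,
        )
        if match is None:
            # No dictionary word starts here: extend the unknown run.
            unknown += text[i]
            i += 1
            continue
        if unknown:
            chunks.append(unknown)
            unknown = ""
        chunks.append(match)
        i += len(match)
    if unknown:
        chunks.append(unknown)
    return chunks


def add_break_opportunities(text, words=WORDS):
    """Join the segments with U+200B so any renderer can wrap between them."""
    return ZWSP.join(segment(text, words))


sample = "\u0e9e\u0eb2\u0eaa\u0eb2\u0ea5\u0eb2\u0ea7"  # ພາສາລາວ
marked = add_break_opportunities(sample)
```

Unknown characters are deliberately grouped rather than split one by one, so a word missing from the list yields no break opportunity inside it. That is the same failure mode a weak dictionary produces in a browser.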

The CSS spec says the following, but the spec doesn't provide any specific rules or properties for Lao line-breaking.

> Scripts such as Thai, Lao, and Khmer, however, do not use spaces or punctuation to separate words. Although the zero width space (U+200B) can be used as an explicit word delimiter in these scripts, this practice is not common. As a result, a lexical resource is needed to correctly identify soft wrap opportunities in such texts.

More:

Tests & results:

Also, the increasing use of loan words often breaks the algorithms used for defining syllable boundaries, as does the still common preference for using the two-character combination for "HL" instead of the subscript L, and words that use high-class sonorants other than HM, HN, HL. (@jmdurdin)

- interactive test, [Lao text wraps the h in an initial cluster with the rest of the syllable](https://github.com/w3c/line_paragraph_tests/issues/58)
- interactive test, [Lao breaks line around words rather than syllables](https://github.com/w3c/line_paragraph_tests/issues/59)

The success in dealing with the latter two tests depends on the quality of the dictionary. For example, none of the dictionaries used appear to recognise the word for dog when written as ຫມາ - they only recognise it with the initial ligature, as in ໝາ.
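As a sketch of one way a dictionary lookup could tolerate both spellings, the snippet below folds the single-character ligatures into their two-character equivalents before comparing. This is an assumption about a possible mitigation, not a description of what any browser does; `LIGATURES`, `spelling_key` and `in_dictionary` are hypothetical names, and the mapping covers only HM and HN (the subscript-L form of HL would need separate handling).

```python
# Assumed folding table: single-character ligature -> two-character spelling.
LIGATURES = {
    "\u0edd": "\u0eab\u0ea1",  # ໝ (LAO HO MO) -> ຫ + ມ
    "\u0edc": "\u0eab\u0e99",  # ໜ (LAO HO NO) -> ຫ + ນ
}


def spelling_key(word):
    """Return a comparison key in which ligatures are written out in full."""
    return "".join(LIGATURES.get(ch, ch) for ch in word)


def in_dictionary(word, dictionary):
    """Look the word up after folding both the word and the entries."""
    keys = {spelling_key(entry) for entry in dictionary}
    return spelling_key(word) in keys


DOG_LIGATURE = "\u0edd\u0eb2"        # ໝາ
DOG_TWO_CHAR = "\u0eab\u0ea1\u0eb2"  # ຫມາ

# A dictionary that only lists the ligature spelling, as described above.
toy_dictionary = {DOG_LIGATURE}

plain_hit = DOG_TWO_CHAR in toy_dictionary                  # exact match only
folded_hit = in_dictionary(DOG_TWO_CHAR, toy_dictionary)    # spelling-tolerant
```

With exact matching the two-character spelling is missed, while the folded lookup finds it. A real implementation would also have to decide whether folding is linguistically safe for every entry.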

r12a commented 4 years ago

_The first comment in this issue contains text that will automatically appear in the Lao gap-analysis document as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point._