w3c / sealreq

Southeast Asian layout task force

Text wrapping fails for Javanese & other SE Asian scripts #40

Open r12a opened 4 years ago

r12a commented 4 years ago

This issue is applicable to text written in the following scripts: Balinese, Batak, Brahmi, (Eastern) Cham, Dives Akuru, Grantha, Gurung Khema, Javanese, Kawi, Makasar, and Tulu Tigalari.

Javanese orthography does not separate words with spaces. Javanese is also one of a small number of scripts where the initial consonant of a word may be subjoined below the final consonant of the preceding word. Because these stacked consonants cannot be split, segmentation for line-breaking, etc. uses the orthographic syllable as its unit, where an orthographic syllable means a character or stack of characters with all associated combining marks.

Unlike Thai, which uses dictionary lookup to wrap word-by-word, the basic break points in Javanese can be calculated using a grammar for syllables. (There are likely to be additional considerations to check related to punctuation, digits, etc.)
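To make the idea of a syllable grammar concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Unicode Line Breaking Algorithm: it assumes Javanese consonants occupy U+A98F to U+A9B2, that U+A9C0 (pangkon) stacks a following consonant, and that every character with a Unicode "Mark" general category attaches to the preceding base. The function names are made up for this example, and punctuation, digits and pre-base characters are not handled.

```python
import unicodedata
from typing import List

PANGKON = "\uA9C0"  # JAVANESE PANGKON: assumed here to stack the next consonant


def is_consonant(ch: str) -> bool:
    # Assumption: Javanese consonants sit in U+A98F..U+A9B2.
    return "\uA98F" <= ch <= "\uA9B2"


def is_mark(ch: str) -> bool:
    # Combining marks (vowel signs, medials, pangkon, ...) have category M*.
    return unicodedata.category(ch).startswith("M")


def orthographic_syllables(text: str) -> List[str]:
    """Split text into units that must not be broken apart (simplified)."""
    units: List[str] = []
    for ch in text:
        stacked = bool(units) and units[-1].endswith(PANGKON) and is_consonant(ch)
        if units and (is_mark(ch) or stacked):
            units[-1] += ch   # mark or subjoined consonant: stays with the unit
        else:
            units.append(ch)  # anything else starts a new unit
    return units


# consonant, consonant + pangkon + consonant (a stack), consonant
sample = "\uA9B2\uA98F\uA9C0\uA9B1\uA9AB"
print(orthographic_syllables(sample))
```

In this simplified model a line may break between any two adjacent units; a real implementation would follow the Unicode line-breaking rules rather than these hand-picked ranges.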

See this discussion for examples.

It is possible to fudge things, using CSS properties, so that the text wraps, but the resulting line breaks are not always correct. It is also possible to make the breaking happen by inserting ZWSP at appropriate places, but we cannot expect Javanese users to do that accurately and consistently.
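As a rough illustration of the ZWSP workaround done by script rather than by hand, the Python sketch below inserts U+200B between adjacent Javanese syllables. The character ranges in the pattern are assumptions taken from the Javanese code chart (letters U+A984 to U+A9B2, consonants U+A98F to U+A9B2, signs and vowel marks U+A980 to U+A983 and U+A9B3 to U+A9BF, pangkon U+A9C0), `add_zwsp` is a hypothetical helper name, and the pattern is a simplification that ignores punctuation and digits.

```python
import re

ZWSP = "\u200B"  # ZERO WIDTH SPACE: an explicit line-break opportunity

# Assumed ranges from the Javanese block; a simplification, not UAX #14.
BASE = "[\uA984-\uA9B2]"                  # independent vowels and consonants
CONSONANT = "[\uA98F-\uA9B2]"
MARKS = "[\uA980-\uA983\uA9B3-\uA9BF]*"   # signs, vowel signs, medials
PANGKON = "\uA9C0"

# base + marks, then any number of pangkon-stacked consonants (each with its
# own marks), then an optional trailing visible pangkon.
SYLLABLE = re.compile(
    BASE + MARKS + "(?:" + PANGKON + CONSONANT + MARKS + ")*" + PANGKON + "?"
)
BASE_RE = re.compile(BASE)


def add_zwsp(text: str) -> str:
    """Insert ZWSP after each syllable that is directly followed by a letter."""
    def repl(match: re.Match) -> str:
        next_char = match.string[match.end():match.end() + 1]
        needs_break = bool(BASE_RE.fullmatch(next_char))
        return match.group(0) + (ZWSP if needs_break else "")
    return SYLLABLE.sub(repl, text)


sample = "\uA9B2\uA98F\uA9C0\uA9B1\uA9AB"
print([hex(ord(c)) for c in add_zwsp(sample)])
```

Because a ZWSP is only added when the next character is a Javanese letter, running the function a second time on its own output adds nothing further.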

More:

The GAP

Gecko and Blink don't wrap at all in rendered HTML. In a textarea element, however, lines ARE broken at orthographic syllable boundaries. WebKit also fails in HTML. In a textarea it is inconsistent: sometimes it wraps sequences of characters, sometimes orthographic syllables, sometimes longer sequences, and sometimes it wraps a longer sequence and then pulls some characters back from the 2nd line as the width continues to shrink.

Priority

This section is marked as broken because browsers do not apply the necessary algorithm to insert line-break opportunities at orthographic syllable boundaries for Javanese text in normal HTML. As a result, a line of text tends to run off the right edge of the window.

Tests & results

Interactive test, Text should wrap to the next line at the line end

Interactive test, Text should wrap to the next line at ZWSP

All browser engines tested break lines as expected if ZWSP is inserted between all breakable syllables.

Action taken

Line breaking at orthographic syllable boundaries for Javanese and several other scripts was specified in the Unicode Line Breaking Algorithm in Unicode 15.1 based on L2/22-080R2. ICU has been updated and has started to roll out to browsers and platforms. (From Norbert Lindenberg)

Bug report: Gecko

Outcomes

Blink and WebKit browsers now break lines as expected.

Gecko browsers still fail to wrap in rendered HTML, but do wrap in a textarea control.

r12a commented 4 years ago

_The first comment in this issue contains text that will automatically appear in the Javanese gap-analysis document as a topic with the same title as this issue. Any edits made to that comment will be immediately available in the editor's draft of the document. Proposals for changes or discussion of the content can be made in comments below this point._

NorbertLindenberg commented 8 months ago

Line breaking at orthographic syllable boundaries for Javanese and several other scripts was specified in the Unicode Line Breaking Algorithm in Unicode 15.1 based on L2/22-080R2. ICU has been updated and has started to roll out to browsers and platforms. The test now passes in Safari on iOS 17.4 and on macOS 14.4.

r12a commented 2 months ago

Text in first comment updated and bug report raised for Gecko.