r12a opened this issue 4 years ago.
_The first comment in this issue contains text that will automatically appear in the Javanese gap-analysis document as a topic with the same title as this issue. Any edits made to that comment will be immediately available in the editor's draft of the document. Proposals for changes or discussion of the content can be made in comments below this point._
Line breaking at orthographic syllable boundaries for Javanese and several other scripts was specified in the Unicode Line Breaking Algorithm in Unicode 15.1 based on L2/22-080R2. ICU has been updated and has started to roll out to browsers and platforms. The test now passes in Safari on iOS 17.4 and on macOS 14.4.
Text in first comment updated and bug report raised for Gecko.
This issue is applicable to text written in the following scripts: Balinese, Batak, Brahmi, (Eastern) Cham, Dives Akuru, Grantha, Gurung Khema, Javanese, Kawi, Makasar, and Tulu Tigalari.
Words are not separated by spaces in Javanese orthography. Javanese is also one of a small number of scripts in which the initial consonant of a word may be subjoined below the final consonant of the preceding word. Because these stacked consonants cannot be split, segmentation for line breaking and similar operations uses the orthographic syllable as its unit. Here an orthographic syllable means a character, or a stack of characters, together with all its associated combining marks.
Unlike Thai, where word-by-word wrapping relies on dictionary lookup, Javanese break points can be calculated using a grammar for syllables. (There are likely to be additional considerations to check related to punctuation, digits, etc.)
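To illustrate what a grammar for syllables might look like, here is a minimal Python sketch. It is a simplification and not the Unicode line-breaking rules. The code-point ranges it uses for letters (U+A984 to U+A9B2), combining signs (U+A980 to U+A983 and U+A9B3 to U+A9BF) and PANGKON (U+A9C0) are assumptions based on the Javanese block. It always treats a letter that follows PANGKON as stacked, and it ignores ZWNJ/ZWJ, punctuation and digits. The names `SYLLABLE` and `orthographic_syllables` are hypothetical.

```python
import re

# Simplified character classes for the Javanese block (assumption: these
# ranges approximate the real properties; this is not the UAX #14 data).
LETTER = "[\uA984-\uA9B2]"             # consonants and independent vowels
MARK = "[\uA980-\uA983\uA9B3-\uA9BF]"  # signs, nukta, vowel and medial signs
PANGKON = "\uA9C0"                     # JAVANESE PANGKON (virama)

# syllable := LETTER MARK* (PANGKON LETTER MARK*)* PANGKON?
# A consonant after PANGKON is subjoined (stacked), so it stays in the
# same orthographic syllable as the consonant it hangs below.
SYLLABLE = re.compile(
    LETTER + MARK + "*(?:" + PANGKON + LETTER + MARK + "*)*" + PANGKON + "?"
)


def orthographic_syllables(text: str) -> list[str]:
    """Split text into simplified orthographic syllables.

    Characters the grammar does not cover (spaces, punctuation, digits,
    stray marks) are returned as single-character items.
    """
    items = []
    pos = 0
    while pos < len(text):
        m = SYLLABLE.match(text, pos)
        if m:  # a syllable always starts with a LETTER, so it is never empty
            items.append(m.group())
            pos = m.end()
        else:
            items.append(text[pos])
            pos += 1
    return items


if __name__ == "__main__":
    # letter | letter + PANGKON + letter + vowel sign | letter
    sample = "\uA9B2\uA9A4\uA9C0\uA9A2\uA9B6\uA9AB"
    print([len(s) for s in orthographic_syllables(sample)])  # [1, 4, 1]
```

Under these assumptions, a break opportunity lies between any two adjacent syllables returned by the function. The stacked cluster in the middle of the sample stays whole even though it spans what may be a word boundary.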
See this discussion for examples.
It is possible to fudge things, using CSS properties, so that the text wraps, but the resulting line breaks are not always correct. It is also possible to make the breaking happen by inserting ZWSP at appropriate places, but we cannot expect Javanese users to do that accurately and consistently.
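Users need not insert the ZWSP by hand: a preprocessing step could add it wherever a simple syllable rule allows a break. A minimal Python sketch follows. It makes the same simplifying assumptions about code-point ranges for letters and signs, always treats a letter after PANGKON as stacked, and ignores punctuation and digits. The function `insert_zwsp` is hypothetical, and the result still depends on how the browser handles ZWSP.

```python
ZWSP = "\u200B"     # ZERO WIDTH SPACE
PANGKON = "\uA9C0"  # JAVANESE PANGKON (virama)


def is_letter(ch: str) -> bool:
    # Assumed range of Javanese consonants and independent vowels.
    return "\uA984" <= ch <= "\uA9B2"


def is_mark(ch: str) -> bool:
    # Assumed ranges of Javanese signs, nukta, vowel and medial signs.
    return "\uA980" <= ch <= "\uA983" or "\uA9B3" <= ch <= "\uA9BF"


def insert_zwsp(text: str) -> str:
    """Insert ZWSP before each letter that starts a new orthographic syllable."""
    out = []
    for i, ch in enumerate(text):
        if i > 0 and is_letter(ch):
            prev = text[i - 1]
            # Break only when the previous character ends a Javanese syllable.
            # A letter after PANGKON is stacked below the previous consonant,
            # so PANGKON is deliberately not a break trigger.
            if is_letter(prev) or is_mark(prev):
                out.append(ZWSP)
        out.append(ch)
    return "".join(out)
```

Because the rule looks only at the two characters on either side of each position, running it twice is harmless. An existing ZWSP is neither a letter nor a mark, so no second ZWSP is added next to it.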
The GAP
Gecko and Blink don't wrap at all in rendered HTML. In a textarea element, however, lines ARE broken at orthographic syllable boundaries. WebKit also fails in HTML. In a textarea it is inconsistent: sometimes it wraps sequences of characters, sometimes orthographic syllables, sometimes longer sequences, and sometimes it does the latter and then pulls back some characters from the second line as the width continues to shrink.
Priority
This section is marked as broken because browsers do not apply the necessary algorithm to insert line-break opportunities at orthographic syllable boundaries in Javanese text in normal HTML. As a result, a line of text tends to run off the right edge of the window.
Tests & results
Interactive test, Text should wrap to the next line at the line end
Interactive test, Text should wrap to the next line at ZWSP
All browser engines tested break lines as expected if ZWSP is inserted between all breakable syllables.
Action taken
Line breaking at orthographic syllable boundaries for Javanese and several other scripts was specified in the Unicode Line Breaking Algorithm in Unicode 15.1 based on L2/22-080R2. ICU has been updated and has started to roll out to browsers and platforms. (From Norbert Lindenberg)
Bug report: Gecko
Outcomes
Blink and WebKit browsers now break lines as expected.
Gecko browsers still fail to wrap in rendered HTML, but do wrap in a textarea control.