w3c / elreq

Ethiopic Layout Requirements

Does Ethiopic text also get wrapped by word? #116

Open r12a opened 4 years ago

r12a commented 4 years ago

http://w3c.github.io/elreq/#ethiopic_line_breaking and http://w3c.github.io/elreq/#ethiopic_hyphenation indicate that languages using the Ethiopic script break character by character, regardless of whether space or the word-separator is used between words.

However, major browsers actually break on word boundaries (space or word-sep), and I'm not sure whether that might be establishing a new trend. Any thoughts on that?

r12a commented 4 years ago

This is related to CSS issue https://github.com/w3c/i18n-discuss/issues/11

dyacob commented 4 years ago

Reading the hyphenation section now and focusing on the phrase:

When wordspace fell out of favor in modern writing the practice of splitting a word across lines of text continued without change.

This is certainly the case: samples of this breaking style (arbitrary word splitting) can be found, but I would think it had all but faded out by the end of the 90s, with word processors taking over. Line breaking at a word boundary would be a best practice, and expected, in modern writing. Expected to the extent that it should be applied to the reprinting of any early works that used the split-anywhere style under white space.

r12a commented 4 years ago

Using CSS line-break:anywhere it is possible to cause a line to be broken character by character, rather than word-by-word. If that is set, however, it becomes possible to wrap a wordspace to the start of a new line on its own. See it in action at this link (Use the control to reduce the width of the boxes, character by character.)

Is that ok? Or do we need to implement rules that prevent wordspace being wrapped alone?

dyacob commented 4 years ago

This is a really nice demonstrator/tester! Thanks for creating it @r12a , I'll spend more time with it this evening.

Wordspace should not be wrapped alone, so a rule would be needed. This would be true of punctuation as well.
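To make the "no wordspace or punctuation at the start of a line" rule concrete, here is a minimal Python sketch. It is only an illustration, not how any browser implements `line-break: anywhere`: the function name `break_anywhere`, the `NO_LINE_START` set, and the choice to pull the preceding letter down to the next line (rather than letting the mark hang at the line end) are all assumptions made for the example. Width is counted in characters, which ignores real glyph widths.

```python
# Characters assumed (for this sketch) not to be allowed at the start of a line:
# ASCII space, Ethiopic wordspace U+1361, and Ethiopic punctuation U+1362..U+1368.
NO_LINE_START = set(" \u1361\u1362\u1363\u1364\u1365\u1366\u1367\u1368")


def break_anywhere(text: str, width: int) -> list[str]:
    """Break text character by character into lines of at most `width` characters,
    moving the break earlier when the next line would start with a wordspace
    or punctuation mark."""
    if width < 2:
        raise ValueError("width must be at least 2 for the rule to be applicable")
    lines = []
    start = 0
    n = len(text)
    while start < n:
        end = min(start + width, n)
        if end < n:
            # Walk the break point back while the next line would begin
            # with a forbidden character.
            cut = end
            while cut > start + 1 and text[cut] in NO_LINE_START:
                cut -= 1
            # Use the earlier break only if it actually fixes the problem;
            # otherwise fall back to the plain character-by-character break.
            if text[cut] not in NO_LINE_START:
                end = cut
        lines.append(text[start:end])
        start = end
    return lines


if __name__ == "__main__":
    for line in break_anywhere("ሰላም፡ዓለም", 3):
        print(line)
```

With a width of 3, a plain character-by-character break would put ፡ at the start of the second line; the sketch instead breaks one letter earlier so the wordspace stays attached to the letter before it.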

I observed another rule this past weekend while scanning a book, "ዜናዊ ፓርልማ" from 1953 (1946EC) produced by a government agency. The book uses wordspace and frequent word breaking at the end of a line. The breaking always leaves at least 2 letters at the end of the line; it is so consistent and frequent that it must be deliberate. This is in contrast to handwritten manuscripts, where the split can occur after a single letter. It is visually appealing. I had not picked up on it before, so I'll review other works to see if this "at least two" rule is applied.

dyacob commented 4 years ago

I've reviewed a number of 20th century books (circa 1940s-1960s) and this "rule of two" appears to hold up very well. Notably in a book by Abie Gubenya, who was a well-respected writer. On occasion a single letter may yet be found at the end of a line, but in these exceptions the letter is usually "anchored" to the line by a visually dense/heavy punctuation mark to the left, such as « or ። .
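The "rule of two" and its anchoring exception could be expressed as a check on a candidate break position, roughly as in the Python sketch below. Everything here is an assumption for illustration: the name `split_allowed`, the `SEPARATORS` and `ANCHORS` sets (the comment only gives « and ። as examples of anchoring marks), and the decision to treat any break next to a separator as an ordinary word-boundary break that this rule does not govern.

```python
# Characters treated as word boundaries in this sketch: space, Ethiopic
# wordspace and punctuation (U+1361..U+1368), and the guillemets « ».
SEPARATORS = set(" \u1361\u1362\u1363\u1364\u1365\u1366\u1367\u1368\u00ab\u00bb")

# Marks assumed to be "visually dense" enough to anchor a single letter: « and ።
ANCHORS = {"\u00ab", "\u1362"}


def split_allowed(text: str, pos: int, min_letters: int = 2) -> bool:
    """Return True if a line break before text[pos] respects the 'rule of two':
    a word split must leave at least `min_letters` letters of that word at the
    end of the line, unless a single letter is anchored by a mark in ANCHORS."""
    if pos <= 0 or pos >= len(text):
        return False
    # A break next to a separator is not a word split; other rules apply there.
    if text[pos] in SEPARATORS or text[pos - 1] in SEPARATORS:
        return True
    # Find where the current word starts.
    word_start = pos
    while word_start > 0 and text[word_start - 1] not in SEPARATORS:
        word_start -= 1
    letters_left_on_line = pos - word_start
    if letters_left_on_line >= min_letters:
        return True
    # Exception: a single letter immediately preceded by an anchoring mark.
    return word_start > 0 and text[word_start - 1] in ANCHORS


if __name__ == "__main__":
    sample = "ሰላም ዓለም"
    for pos in range(1, len(sample)):
        print(pos, split_allowed(sample, pos))
```

A line breaker could call this check on each candidate position and move the break earlier when it fails. Whether the same minimum should also apply to the letters carried to the next line is not stated in the observations above, so the sketch does not enforce it.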