w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-text] Providing alternative breaking behaviours for Ethiopic #4765

Closed by r12a 4 years ago

r12a commented 4 years ago

https://www.w3.org/TR/css-text-3/#word-break-property https://www.w3.org/TR/css-text-3/#line-break-property

Modern Ethiopic text is generally wrapped word by word. If wordspace separators are used, they are wrapped with the word, and should not appear alone at the beginning of a line.

However, older Ethiopic text is generally wrapped wherever it hits the right margin, whether wordspace or space are used to separate words, and no hyphenation occurs.

Observation: It's possible that a rule is sometimes applied to letter-based wrapping that requires a minimum of 2 letters at the end of a line for printed text (as opposed to handwritten manuscripts). This was observed by Daniel Yacob in the book, "ዜናዊ ፓርልማ" from 1953 (1946EC).

Whatever style of wrapping is used, however, punctuation wrapping rules apply, which means that a wordspace separator should not appear at the start of a line, nor various other punctuation, even when letter-by-letter wrapping occurs.

So my question is: how can an Ethiopic author apply the different wrapping styles to Ethiopic?

My best guess is that line-break:anywhere is not appropriate, since it doesn't respect punctuation rules. However, word-break:break-all may be the right thing, although it doesn't specifically mention punctuation-specific rules. Am i correct? It wasn't abundantly clear from reading the spec.

frivoal commented 4 years ago

> My best guess is that line-break:anywhere is not appropriate, since it doesn't respect punctuation rules. However, word-break:break-all may be the right thing, although it doesn't specifically mention punctuation-specific rules. Am i correct?

Yes. word-break: break-all will do what you want here, and does not affect punctuation: UAX14 rules define which punctuation characters can / cannot be separated from the previous character by line breaking, and word-break:break-all does not change that. The Ethiopic wordspace will also have the right behavior: It has the BA line-breaking class in UAX 14, making it inseparable from its preceding letter, and it will therefore not be placed at the beginning of a line.
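A minimal sketch of how an author might switch between the two styles. The class names are hypothetical and chosen only for illustration; the behavior described in the comments is what the spec text implies, and actual rendering may differ between browsers.

```css
/* Modern style: wrap word by word.
   `normal` is the initial value of word-break. */
.ethiopic-modern {
  word-break: normal;
}

/* Older style: allow a soft wrap opportunity between letters.
   Punctuation rules from UAX #14 are not changed by break-all,
   so U+1361 ETHIOPIC WORDSPACE (line-breaking class BA) should
   stay attached to the letter before it. */
.ethiopic-traditional {
  word-break: break-all;
}
```

With this in place, something like `<p lang="am" class="ethiopic-traditional">` would opt a paragraph into letter-by-letter wrapping while leaving other paragraphs at the default.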

Requiring a minimum of X letters on a line is not addressed in css-text-3, but it is expected to be addressed in css-text-4. See https://drafts.csswg.org/css-text-4/#last-line-limits

> It wasn't abundantly clear from reading the spec.

This bit of text, to be found under the definition of the word-break property, should be providing enough context to make the definition of break-all unambiguous in that respect:

> This property specifies soft wrap opportunities between letters, i.e. where it is “normal” and permissible to break lines of text. Specifically it controls whether a soft wrap opportunity generally exists between adjacent typographic letter units (and/or non-letter typographic character units belonging to the NU, AL, AI, or ID Unicode line breaking classes [UAX14]). It does not affect rules governing the soft wrap opportunities created by white space (as well as by other space separators) and around punctuation.

I think this is clear enough, but if you have a suggestion for improvement, or a concern about some of it, feedback is very much welcome.

On the other hand, an example about using word-break:break-all to switch between the two Ethiopic behaviors may be a more productive way of illustrating this. Happy to include one if someone can provide me with the right text.

r12a commented 4 years ago

Thanks @frivoal. The 'around punctuation' text is a little vague for me. Note also that none of the major browser engines does what you'd expect here. Try changing the width of the bounding box in this test. You'll see that the wordspace wraps to the next line alone. It shouldn't do that, and note in particular that none of them wrap the wordspace to the next line by default (try this test), so i'm assuming that the browser implementers all misunderstood the point here too, since they made a change that does the wrong thing.

Here's some suggested text (inline markup showing here, but just for C&P convenience):

As a final example, in modern use of the Ethiopic script words are surrounded by spaces and usually wrap, unbroken, to the next line. Sometimes, however, Ethiopic may be written with <span class="codepoint" translate="no"><span lang="am">&#x1361;</span> [<span class="uname">U+1361 ETHIOPIC WORDSPACE</span>]</span> rather than a space, and split words while wrapping, with no hyphenation. word-break: break-all can be used for this. Note that applying word-break:break-all doesn't affect the Ethiopic rules for punctuation, which require that there is no line-break opportunity before an Ethiopic wordspace.

I can provide a screen shot of some Amharic text, if you like.

dyacob commented 4 years ago

A bichromatic scan of the work that @r12a referenced is here: https://drive.google.com/open?id=1wJm53QevGzAZGBiMHBnGB6oPfhP-sA8B . The copyright declaration is no longer applicable. The work also presents an example of justification in the presence of the wordspace.

r12a commented 4 years ago

> Requiring a minimum of X letters on a line is not addressed in css-text-3, but it is expected to be addressed in css-text-4. See https://drafts.csswg.org/css-text-4/#last-line-limits

Clarification on this point: it's not about the last line in a paragraph, but rather about the last word on any line, i.e. no word is broken such that only its last character appears at the start of a line. (I vaguely remember hearing about a similar rule recently, but i can't remember which script/language was the context. So this might be a rule that also affects languages other than those that use the Ethiopic script.)

[Image: screenshot, 2020-02-26 13:17:04]

astearns commented 4 years ago

I wonder if there's a way we could tie this in to the hyphenate-limit-chars property. The three-value version of the property allows you to say hyphenate-limit-chars: auto 2 auto to avoid a single character before a hyphen. This seems like the same thing, just without the hyphen. Perhaps we need a break-limit-chars property?
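For reference, a small sketch of the existing three-value form mentioned above. The `p` selector is just an example, and `break-limit-chars` is only the name floated in this comment, not a property defined in any spec, so it appears only inside a comment.

```css
/* css-text-4: the three values are the minimum total word length,
   the minimum characters before the hyphenation point, and the
   minimum characters after it. */
p {
  hyphens: auto;
  hyphenate-limit-chars: auto 2 auto; /* at least 2 characters before the hyphen */
}

/* Hypothetical analogue for unhyphenated breaks (NOT a real property):
   break-limit-chars: auto 2 auto; */
```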

fantasai commented 4 years ago

@r12a Breaking before Ethiopic Word Space in that test case looks like a mistake. For example, UAs don't break before commas and periods and colons. I think this might just need a WPT testcase and some bugs filed. We can also put an example in the spec, given some sample text.

@astearns Good point, though I think this is a little different from hyphenation in that you can break between any two characters, not only at particular lexically-allowed points in the word. It would be nice to re-use the same controls, though.

fantasai commented 4 years ago

@r12a While we're on the topic, can you get ELREQ updated? The last publication was in 2016.

fantasai commented 4 years ago

Alright, edited in Ethiopic as one of the writing samples for word-break. @r12a Let me know if the text looks correct. I suggest we file break limits as a separate issue.

r12a commented 4 years ago

Thanks @fantasai. I have a small suggestion: add a link to https://www.w3.org/TR/css-text-3/#word-separator on the following text.

> Ethiopic similarly has two styles of line-breaking, either only breaking at word separators

That will clarify that we're talking about both spaces and Ethiopic wordspace characters.

frivoal commented 4 years ago

> have a small suggestion: add a link

done: https://github.com/w3c/csswg-drafts/commit/1feb256f0d8f573d3582c612983c524186418f93

frivoal commented 4 years ago

@r12a, if you agree to file the break limits as a separate issue (against level 4), I think we're done here and can close. Can you confirm?