w3c / csswg-drafts

[css-ruby-1] Drop ruby-merge in favour of a specific jukugo value #784

Closed r12a closed 3 years ago

r12a commented 7 years ago

https://drafts.csswg.org/css-ruby-1/#collapsed-ruby

[i'm creating a github issue for this so that (a) the old email thread is not lost, and (b) so that i can easily include pictures in the comments]

The original comment questioned

  1. whether there's an actual requirement for group-like ruby that is broken across a line (collapse)
  2. whether the auto value should be a jukugo value, so that people can specify that they actually want jukugo behaviour as specified by jlreq, etc
  3. whether this level of the spec should have ruby-merge in it.

You can follow the thread at http://www.w3.org/Mail/flatten/index?subject=i18n-ISSUE-359&list=www-style

r12a commented 7 years ago

To help the discussion move forward, here is my understanding of the current spec text.

Let's take an annotated compound word in Japanese, such as the following, as an example:

<ruby><rb>思<rt>し<rb>春<rt>しゅん<rb>期<rt>き</ruby>

which could also be marked up as

<ruby><rb>思<rb>春<rb>期<rt>し<rt>しゅん<rt>き</ruby>

If you apply ruby-merge: separate you'll get each base character aligned with each annotation. Notice how the base characters are slightly pushed apart so that there's no overlap.

[Image: rendering with ruby-merge: separate]

If you apply ruby-merge: collapse you get the following, which looks like 'group-ruby', except that you can break a line inside the element in a way you can't with group ruby, and the line break doesn't separate any annotations from their associated base characters.

[Image: rendering with ruby-merge: collapse]

Fantasai argues, iiuc, that this is particularly useful for an annotation such as

<ruby><rb>思<rb>春<rb>期<rt>shi<rt>shun<rt>ki</ruby>

which you might want to represent as a single word, ie.

[Image: romaji annotation rendered as a single word]

If you apply ruby-merge: auto, the effect is undefined and left to the browser. However, you might end up with jukugo ruby (see https://www.w3.org/International/questions/qa-ruby#jukugo), in which case, if it follows the rules in JLREQ / JIS X 4051, you'd see the following. Basically, jukugo allows overlapping of up to one kana character, so as to avoid separating the base characters as you'd see with separate.

[Image: jukugo ruby rendering]
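
The spacing difference between separate and collapse described above can be sketched numerically. This is a rough Python illustration, not spec text: it assumes every base and kana character is one em wide and that annotations are set at half the base size (`ruby_scale` and both function names are hypothetical). It does not attempt to model the jukugo overhang rules.

```python
def separate_widths(pairs, ruby_scale=0.5):
    """Column width in em for each (base, annotation) pair.

    Each annotation sits over its own base, so a column is as wide as
    the wider of the two. Assumes 1em per character (an assumption).
    """
    return [max(len(base), len(ann) * ruby_scale) for base, ann in pairs]


def collapsed_width(pairs, ruby_scale=0.5):
    """Total width in em when all annotations are merged into one run."""
    base_width = sum(len(base) for base, _ in pairs)
    ann_width = sum(len(ann) for _, ann in pairs) * ruby_scale
    return max(base_width, ann_width)


pairs = [("思", "し"), ("春", "しゅん"), ("期", "き")]
print(separate_widths(pairs))       # widest column is the one for 春
print(sum(separate_widths(pairs)))  # total width with separate
print(collapsed_width(pairs))       # total width with collapse
```

Under these assumptions the middle column grows to 1.5em with separate, which is the "pushed apart" effect, while the collapsed form fits within the 3em of base text.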

r12a commented 7 years ago

I can see some usefulness in collapse, though i'm still not sure whether it should allow breaks inside the compound word or not. It would be good to hear from the Japanese and particularly Chinese (pinyin) what is common practice.

I suspect that we should have a specific value jukugo for people who specifically want that, rather than an unpredictable auto value. Though i'm not sure what else auto would do, given that we have collapse already.

I see separate as the 'take it back to the default' option, rather than something that will be often used. Is that correct?

r12a commented 7 years ago

It just occurred to me that ruby-merge could be helpful for producing inline annotations. If you have

<ruby><rb>思<rt>し<rb>春<rt>しゅん<rb>期<rt>き</ruby>

there's no way at the moment of producing

思春期 (ししゅんき)

but if you did

ruby { ruby-merge: collapse; ruby-position: inline; }

would that do it?

upsuper commented 7 years ago

It would be good to hear from the Japanese and particularly Chinese (pinyin) what is common practice.

CLReq mentions the following in 3.3.1.1 / 2:

(The English text doesn't make much sense to me... The text above is based on the Chinese version of the text.)

there's no way at the moment of producing 思春期 (ししゅんき) but if you did ruby { ruby-merge: collapse; ruby-position: inline; } would that do it?

Moving order of content in this way looks hard to implement. And that's why we should use the CSS model rather than the current HTML model of ruby.

r12a commented 7 years ago

@upsuper yes, i read clreq. But what i'm wondering is whether, when pinyin for a whole word is written together for a chinese learner, and then that word runs past the right margin, does the word get split so that one part and its annotation stays on the first line, and the other moves to the second, with its bit of annotation? Or does the whole word get moved to the next line, keeping the pinyin annotation intact?

r12a commented 7 years ago

Moving order of content in this way looks hard to implement. And that's why we should use the CSS model rather than the current HTML model of ruby.

My assumption is that there is a fair amount of content out there already using the interleaved approach, and we should not present a solution that ignores that. What is it that makes this hard to implement? Don't you assemble a list of tuples for the ruby while parsing, and then simply generate the output by running one or another algorithm on that?
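
As a rough illustration of the "list of tuples" idea, the Python sketch below uses the standard-library `html.parser` to collect bases and annotations from either markup style and then emits the inline form. The class and function names are hypothetical, pairing is purely by position, and a real implementation would also have to deal with nesting, `<rtc>`, and unequal numbers of bases and annotations.

```python
from html.parser import HTMLParser


class RubyPairs(HTMLParser):
    """Collect base and annotation text from a single <ruby> element."""

    def __init__(self):
        super().__init__()
        self.bases = []
        self.annotations = []
        self._target = None  # list that receives the next text node

    def handle_starttag(self, tag, attrs):
        if tag == "rb":
            self._target = self.bases
        elif tag == "rt":
            self._target = self.annotations
        else:
            self._target = None

    def handle_endtag(self, tag):
        self._target = None

    def handle_data(self, data):
        if self._target is not None and data.strip():
            self._target.append(data.strip())


def ruby_pairs(markup):
    """Return [(base, annotation), ...], pairing by position."""
    parser = RubyPairs()
    parser.feed(markup)
    parser.close()
    return list(zip(parser.bases, parser.annotations))


def render_inline(pairs):
    """Emit all bases first, then all annotations in parentheses."""
    base = "".join(b for b, _ in pairs)
    reading = "".join(a for _, a in pairs)
    return f"{base} ({reading})"


interleaved = "<ruby><rb>思<rt>し<rb>春<rt>しゅん<rb>期<rt>き</ruby>"
tabular = "<ruby><rb>思<rb>春<rb>期<rt>し<rt>しゅん<rt>き</ruby>"
print(render_inline(ruby_pairs(interleaved)))
print(render_inline(ruby_pairs(tabular)))
```

Both markup styles yield the same list of pairs here, so the output step does not need to know which one the author used. Whether a layout engine can afford to reorder content this way is a separate question that this sketch does not answer.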

patrickdark commented 7 years ago

I think we need the collapsed value. I can see a good use case for...

ruby { ruby-merge: collapse; }
ruby:hover { ruby-merge: separate; }

... or vice versa for easier copying and pasting of annotations.

However, I foresee problems with collapse. Namely, it doesn't allow reordering of inter-language content into the correct order (e.g., Latin versus ideographic names). I think HTML will need to be modified to fix that problem since ruby content order isn't a styling issue.

@r12a:

But what i'm wondering is whether, when pinyin for a whole word is written together for a chinese learner, and then that word runs past the right margin, does the word get split so that one part and its annotation stays on the first line, and the other moves to the second, with its bit of annotation? Or does the whole word get moved to the next line, keeping the pinyin annotation intact?

You'd write markup to reflect the desired line-breaking behavior.

If you want to prevent line-breaking:

<ruby> ... &#x2060;<!-- word joiner --><rt/> ... </ruby>

(Of course, the above requires browsers to fully support U+2060 word joiner, which browsers like Firefox still don't despite the relevant bug (911849) being closed as "fixed". It looks like U+200D zero width joiner is interoperably implemented, however, despite not technically having the same meaning.)

My testing with English (Latin-script) base text in Firefox 50 indicates that using non-breaking spaces (&nbsp;) in such a manner works as expected and prevents line-breaking.

Otherwise, ruby atoms should break across lines according to normal rules of the base text. If an annotation is collapsed, this should cause the broken collapsed annotation to separate.

I think a real question is whether or not collapsing annotations that are broken across lines should collapse once per line or all act as if separate had been specified for the entire ruby sequence. (Imagine two annotation atoms on one line and three on the next line; there are two collapsing opportunities.)

Edit: Another question is if collapsing occurs once per line, should the collapsed text have some special type of alignment such as being aligned toward the line-break or centered if the break occurs on both sides in ruby broken over three or more lines.
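
The "collapse once per line" option raised above can be sketched as follows. This is only an illustration of that one option, not a statement of what the spec requires; the function name, the `breaks_after` parameter, and the placeholder data are all hypothetical.

```python
def collapse_per_line(pairs, breaks_after):
    """Collapse annotations separately for each line of a broken ruby.

    pairs        -- list of (base, annotation) tuples in document order
    breaks_after -- indices of pairs after which a line break falls
    Returns one (base_text, annotation_text) tuple per line.
    """
    lines = []
    start = 0
    # Treat the end of the ruby as a final, implicit boundary.
    for end in sorted(breaks_after) + [len(pairs) - 1]:
        chunk = pairs[start:end + 1]
        if chunk:
            base = "".join(b for b, _ in chunk)
            annotation = "".join(a for _, a in chunk)
            lines.append((base, annotation))
        start = end + 1
    return lines


# Placeholder data: five ruby pairs, with a line break after the second.
pairs = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d"), ("E", "e")]
print(collapse_per_line(pairs, [1]))
```

With two pairs on the first line and three on the second, this produces two collapsed annotations, one per line. The alternative behaviour, falling back to separate for the whole sequence, would simply return the pairs unchanged.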

fantasai commented 7 years ago

The line-breaking issue is addressed in the spec: "When there is not enough space for an entire ruby container to fit on the line, the ruby may be broken wherever all levels simultaneously allow a break."

Emphasis mine. If there is no line-breaking opportunity between the annotations, then there can be no break. So if a word is marked up row-wise as a single ruby segment, then the line-breaking rules can have discretion simply based on the text in the annotation. Latin characters stay together when adjacent, so that will control line-breaking at that point. If they are replaced with kana, then the break will be allowed.
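
The "all levels simultaneously" rule amounts to intersecting the break opportunities of each level. The Python sketch below is a simplified illustration under stated assumptions: boundaries are numbered by the pair they follow, the base level is assumed to allow a break at every boundary, and the test for "adjacent Latin characters" is a crude stand-in for real line-breaking rules. All names are hypothetical.

```python
def annotation_breaks(annotations):
    """Boundaries where adjacent annotations do not form one Latin run.

    Boundary i lies between annotations[i] and annotations[i + 1].
    Crude assumption: ASCII letters on both sides forbid a break,
    anything else allows one.
    """
    allowed = set()
    for i in range(len(annotations) - 1):
        left = annotations[i][-1]
        right = annotations[i + 1][0]
        both_latin = (left.isascii() and left.isalpha()
                      and right.isascii() and right.isalpha())
        if not both_latin:
            allowed.add(i)
    return allowed


def ruby_break_points(*levels):
    """A break is possible only where every level allows one."""
    return set.intersection(*levels) if levels else set()


base_breaks = {0, 1}  # assumed: breaks allowed between all three kanji
print(ruby_break_points(base_breaks, annotation_breaks(["shi", "shun", "ki"])))
print(ruby_break_points(base_breaks, annotation_breaks(["し", "しゅん", "き"])))
```

With the romaji annotations no boundary survives the intersection, so the word stays on one line; with kana annotations both boundaries remain available.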

The spec is less clear on how this applies in the interleaved case. Certainly whether there is an opportunity there depends on whether one exists in the base text: we can't break here if the base text can't break here. But since the annotations are interleaved, we can't tell if there's white space between them or not, as the white space outside the annotations all belongs to the base level. My conclusion would be that annotations should always allow a break between ruby segments, and if the author does not want one, s/he must suppress it with the white-space property applied to the bases. (This would effectively require marking up each ruby in its own <ruby> element so that white-space can be applied.)

frivoal commented 3 years ago

@r12a The auto value is the notional jukugo value, except that auto is a good typical English word and jukugo isn't, so as a choice of CSS keyword jukugo wouldn't be great.

When it comes to the behavior, the way JLREQ or JIS X 4051 recommend laying out jukugo is not the One True Way, just one possibility (and I am not actually sure that they're identical to each other). Simple Ruby recommends something different. All are acceptable as implementations of auto, and which to use can be considered a quality of implementation question. (There are different subjective takes on what's best.) Individual authors might also have preferences as to which of the various jukugo layout methods they prefer, but offering and specifying them separately at this point would be overkill. In a future level, once everything else is very stable, I could imagine having further switches to get different styles of jukugo layout, but at this point, we're trying to get the whole thing to work at all, so requiring a specific algorithm would be too restrictive.

I think we should close with no normative change, but could certainly change example 10 into a Note pointing out that the auto value is for jukugo, and in addition to JLREQ add more references to what jukugo is (like https://www.w3.org/International/questions/qa-ruby#jukugo) and how a UA might render it (such as Simple Ruby, JIS X 4051, etc.).

fantasai commented 3 years ago

OK, we updated the note. See https://drafts.csswg.org/css-ruby-1/#collapsed-ruby (at some point tomorrow after it's rebuilt). Let us know if the spec looks good, or if further edits are necessary!

r12a commented 3 years ago

Ok, for me. I'll check with the i18n WG, since this is marked as a group comment.