Closed r12a closed 3 years ago
To help the discussion move forward, here is my understanding of the current spec text.
Let's take an annotated compound word in Japanese, such as the following, as an example:
<ruby><rb>思<rt>し<rb>春<rt>しゅん<rb>期<rt>き</ruby>
which could also be marked up as
<ruby><rb>思<rb>春<rb>期<rt>し<rt>しゅん<rt>き</ruby>
If you apply ruby-merge: separate, you'll get each base character aligned with each annotation. Notice how the base characters are slightly pushed apart so that there's no overlap.
If you apply ruby-merge: collapse, you get the following, which looks like 'group-ruby', except that you can break a line inside the element in a way you can't with group ruby, and the line break doesn't separate any annotations from their associated base characters.
Fantasai argues, iiuc, that this is particularly useful for an annotation such as
<ruby><rb>思<rb>春<rb>期<rt>shi<rt>shun<rt>ki</ruby>
which you might want to represent as a single word, ie.
If you apply ruby-merge: auto, the effect is undefined and down to the browser. However, you might end up with jukugo ruby (see https://www.w3.org/International/questions/qa-ruby#jukugo), in which case, if it follows the rules in jlreq/JIS X 4051, you'd see the following. Basically, jukugo allows overlapping to a maximum of one kana character so as to avoid separating the base characters, as you'd see with separate.
I can see some usefulness in collapse, though i'm still not sure whether it should allow breaks inside the compound word or not. It would be good to hear from the Japanese and particularly Chinese (pinyin) what is common practice.
I suspect that we should have a specific value jukugo for people who specifically want that, rather than an unpredictable auto value. Though i'm not sure what else auto would do, given that we have collapse already.
I see separate as the 'take it back to the default' option, rather than something that will be often used. Is that correct?
It just occurred to me that ruby-merge could be helpful for producing inline annotations. If you have
<ruby><rb>思<rt>し<rb>春<rt>しゅん<rb>期<rt>き</ruby>
there's no way at the moment of producing
思春期 (ししゅんき)
but if you did
ruby { ruby-merge: collapse; ruby-position: inline; }
would that do it?
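To make the question concrete, here is a minimal Python sketch of the output that combination would have to produce. It is only a model, not a description of how any browser works: it assumes the ruby has already been reduced to (base, annotation) pairs, and the function names `inline_collapsed` and `inline_separate` are hypothetical.

```python
def inline_collapsed(pairs):
    """Model of collapse + inline: all bases first, then all annotations once."""
    base = "".join(b for b, _ in pairs)
    annotation = "".join(a for _, a in pairs)
    return f"{base} ({annotation})"


def inline_separate(pairs):
    """Model of separate + inline: each base is followed by its own annotation."""
    return "".join(f"{b}({a})" for b, a in pairs)


# The interleaved example from this thread, reduced to pairs by hand.
pairs = [("思", "し"), ("春", "しゅん"), ("期", "き")]

print(inline_collapsed(pairs))  # 思春期 (ししゅんき)
print(inline_separate(pairs))   # 思(し)春(しゅん)期(き)
```

The collapsed form has to move every annotation past all of the bases, which is the reordering of content the question is about.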
It would be good to hear from the Japanese and particularly Chinese (pinyin) what is common practice.
CLReq mentions the following in 3.3.1.1 / 2:
(The English text doesn't make much sense to me... The text above is based on the Chinese version of the text.)
there's no way at the moment of producing 思春期 (ししゅんき) but if you did
ruby { ruby-merge: collapse; ruby-position: inline; }
would that do it?
Moving order of content in this way looks hard to implement. And that's why we should use the CSS model rather than the current HTML model of ruby.
@upsuper yes, i read clreq. But what i'm wondering is whether, when pinyin for a whole word is written together for a chinese learner, and then that word runs past the right margin, does the word get split so that one part and its annotation stays on the first line, and the other moves to the second, with its bit of annotation? Or does the whole word get moved to the next line, keeping the pinyin annotation intact?
Moving order of content in this way looks hard to implement. And that's why we should use the CSS model rather than the current HTML model of ruby.
My assumption is that there is a fair amount of content out there already using the interleaved approach, and we should not present a solution that ignores that. What is it that makes this hard to implement? Don't you assemble a list of tuples for the ruby while parsing, and then simply generate the output by running one or another algorithm on that?
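As a rough illustration of the "list of tuples" idea, the following Python sketch reads both markup forms from this thread into the same list of (base, annotation) pairs. It is an assumption-laden toy, not how any layout engine is known to work: the `RubyPairs` class and `ruby_pairs` function are hypothetical names, the n-th `<rb>` is simply paired with the n-th `<rt>`, and `<rtc>`, `<rp>`, multiple annotation levels and bases without an explicit `<rb>` are all ignored.

```python
from html.parser import HTMLParser


class RubyPairs(HTMLParser):
    """Collect <rb> and <rt> text in document order (single level only)."""

    def __init__(self):
        super().__init__()
        self.bases = []
        self.annotations = []
        self._target = None  # list currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        if tag == "rb":
            self._target = self.bases
        elif tag == "rt":
            self._target = self.annotations
        else:
            self._target = None
            return
        self._target.append("")  # start a new, empty entry

    def handle_endtag(self, tag):
        if tag in ("rb", "rt", "ruby"):
            self._target = None

    def handle_data(self, data):
        if self._target is not None:
            self._target[-1] += data


def ruby_pairs(markup):
    """Return [(base, annotation), ...] for one simple <ruby> element."""
    parser = RubyPairs()
    parser.feed(markup)
    parser.close()
    return list(zip(parser.bases, parser.annotations))


interleaved = "<ruby><rb>思<rt>し<rb>春<rt>しゅん<rb>期<rt>き</ruby>"
tabular = "<ruby><rb>思<rb>春<rb>期<rt>し<rt>しゅん<rt>き</ruby>"

print(ruby_pairs(interleaved))
print(ruby_pairs(tabular))
```

Both calls give the same three pairs, so in this toy model whichever merge algorithm runs afterwards does not need to know which markup style the author used.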
I think we need the collapse value. I can see a good use case for...
ruby { ruby-merge: collapse; }
ruby:hover { ruby-merge: separate; }
... or vice versa for easier copying and pasting of annotations.
However, I foresee problems with collapse. Namely, it doesn't allow reordering of inter-language content into the correct order (e.g., Latin versus ideographic names). I think HTML will need to be modified to fix that problem since ruby content order isn't a styling issue.
@r12a:
But what i'm wondering is whether, when pinyin for a whole word is written together for a chinese learner, and then that word runs past the right margin, does the word get split so that one part and its annotation stays on the first line, and the other moves to the second, with its bit of annotation? Or does the whole word get moved to the next line, keeping the pinyin annotation intact?
You'd write markup to reflect the desired line-breaking behavior.
If you want to prevent line-breaking:
<ruby> ... ⁠<!-- word joiner --><rt/> ... </ruby>
(Of course, the above requires browsers to fully support U+2060 word joiner, which browsers like Firefox still don't despite the relevant bug (911849) being closed as "fixed". It looks like U+200D zero width joiner is interoperably implemented, however, despite not technically having the same meaning.)
My testing with English (Latin-script) base text in Firefox 50 indicates that using non-breaking spaces (&nbsp;) in such a manner works as expected and prevents line-breaking.
Otherwise, ruby atoms should break across lines according to normal rules of the base text. If an annotation is collapsed, this should cause the broken collapsed annotation to separate.
I think a real question is whether collapsing annotations that are broken across lines should collapse once per line or all act as if separate had been specified for the entire ruby sequence. (Imagine two annotation atoms on one line and three on the next line; there are two collapsing opportunities.)
Edit: Another question is if collapsing occurs once per line, should the collapsed text have some special type of alignment such as being aligned toward the line-break or centered if the break occurs on both sides in ruby broken over three or more lines.
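The two options in the question above can be sketched in Python as follows. This is only a model of the two candidate behaviours, not spec text: the function names are hypothetical, the ruby is assumed to be a list of (base, annotation) pairs, and `breaks` is assumed to hold the pair indexes before which a line break falls. The placeholder pairs stand in for the "two atoms, then three atoms" case.

```python
def collapse_per_line(pairs, breaks):
    """Option 1: collapse each line's run of pairs separately."""
    lines = []
    start = 0
    for end in sorted(breaks) + [len(pairs)]:
        run = pairs[start:end]
        if run:
            base = "".join(b for b, _ in run)
            annotation = "".join(a for _, a in run)
            lines.append((base, annotation))
        start = end
    return lines


def separate_if_broken(pairs, breaks):
    """Option 2: any break makes the whole sequence act as if separate."""
    if breaks:
        return list(pairs)
    return collapse_per_line(pairs, [])


# Five placeholder atoms with a line break before the third one.
pairs = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d"), ("E", "e")]

print(collapse_per_line(pairs, [2]))   # [('AB', 'ab'), ('CDE', 'cde')]
print(separate_if_broken(pairs, [2]))  # the five original pairs
```

Under option 1 each returned tuple is one collapsed annotation, which is where the alignment question (toward the break, or centered) would apply.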
The line-breaking issue is addressed in the spec: "When there is not enough space for an entire ruby container to fit on the line, the ruby may be broken wherever all levels simultaneously allow a break."
Emphasis mine. If there is no line-breaking opportunity between the annotations, then there can be no break. So if a word is marked up row-wise as a single ruby segment, then the line-breaking rules can have discretion simply based on the text in the annotation. Latin characters stay together when adjacent, so that will control line-breaking at that point. If they are replaced with kana, then the break will be allowed.
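One way to read "wherever all levels simultaneously allow a break" is as a set intersection. The Python sketch below assumes each level's break opportunities have already been computed as positions between ruby segments (1 means "between the first and second segment"); `ruby_break_points` is a hypothetical helper, and the per-level sets are written by hand to mirror the Latin and kana cases just described.

```python
def ruby_break_points(base_breaks, annotation_levels):
    """Keep only the positions where the base and every annotation level allow a break."""
    allowed = set(base_breaks)
    for level_breaks in annotation_levels:
        allowed &= set(level_breaks)
    return sorted(allowed)


# Three segments (思 春 期), so the candidate positions are 1 and 2.
base_breaks = {1, 2}   # the ideographic base text may break between characters
latin_breaks = set()   # "shishunki" as one Latin word: no break inside
kana_breaks = {1, 2}   # "ししゅんき": kana may break between segments

print(ruby_break_points(base_breaks, [latin_breaks]))  # []
print(ruby_break_points(base_breaks, [kana_breaks]))   # [1, 2]
```

With the Latin annotation the intersection is empty, so the ruby stays on one line; with kana both positions survive.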
The spec is less clear on how this applies in the interleaved case. Certainly whether there is an opportunity there depends on whether one exists in the base text: we can't break here if the base text can't break here. But since the annotations are interleaved, we can't tell if there's white space between them or not, as the white space outside the annotations all belongs to the base level. My conclusion would be that annotations should always allow a break between ruby segments, and if the author does not want one, s/he must suppress it with the white-space property applied to the bases. (This would effectively require marking up each ruby in its own <ruby> element so that white-space can be applied.)
@r12a
The auto value is the notional jukugo value, except that auto is a good, typical English word and jukugo isn't, so as a choice of CSS keyword it wouldn't be great.
When it comes to the behavior, the way JLREQ or JIS X 4051 recommend jukugo be laid out is not the One True Way, just one possibility (and I am not actually sure that they're identical to each other). Simple Ruby recommends something different. All are acceptable as implementations of auto, and which to use can be considered a quality of implementation question. (There are different subjective takes on what's best.) Individual authors might also have preferences as to which of the various jukugo layout methods they prefer, but offering and specifying them separately at this point would be overkill. In a future level, once everything else is very stable, I could imagine having further switches to get different styles of jukugo layout, but at this point, we're trying to get the whole thing to work at all, so requiring a specific algorithm would be too restrictive.
I think we should close with no normative change, but could certainly change example 10 into a Note pointing out that the auto value is for jukugo, and in addition to JLREQ add more references to what jukugo is (like https://www.w3.org/International/questions/qa-ruby#jukugo) and how a UA might render it (such as Simple Ruby, JIS X 4051, etc.).
OK, we updated the note. See https://drafts.csswg.org/css-ruby-1/#collapsed-ruby (at some point tomorrow after it's rebuilt). Let us know if the spec looks good, or if further edits are necessary!
Ok, for me. I'll check with the i18n WG, since this is marked as a group comment.
https://drafts.csswg.org/css-ruby-1/#collapsed-ruby
[i'm creating a github issue for this so that (a) the old email thread is not lost, and (b) so that i can easily include pictures in the comments]
The original comment questioned
collapse)
auto value should be a jukugo value, so that people can specify that they actually want jukugo behaviour as specified by jlreq, etc.
You can follow the thread at http://www.w3.org/Mail/flatten/index?subject=i18n-ISSUE-359&list=www-style