w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-ruby] Handling apostrophes in pinyin #5997

Open frivoal opened 3 years ago

frivoal commented 3 years ago

In Chinese, pinyin syllables can be separated by spaces, but when they are part of a single compound word, they're more typically just juxtaposed with no separator. There are exceptions: for disambiguation, certain combinations that would otherwise read as a single syllable are separated by an apostrophe. E.g. "dong" and "xi", when next to each other, are just "dongxi", but "xi" and "an" are "xi'an", not "xian" (which is a single syllable with a different pronunciation).
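To make the rule concrete, here is a minimal sketch of what a preprocessor might do when joining the syllables of one word. It assumes the usual Hanyu Pinyin convention that a non-initial syllable beginning with a, o, or e gets an apostrophe in front of it; the function names are hypothetical and not part of any spec or library.

```python
import unicodedata


def starts_with_a_o_e(syllable):
    """True if the syllable begins with a, o, or e, ignoring tone marks."""
    if not syllable:
        return False
    # NFD splits a tone-marked vowel into base letter + combining mark,
    # so the first code point is the plain letter.
    first = unicodedata.normalize("NFD", syllable)[0].lower()
    return first in ("a", "o", "e")


def join_pinyin(syllables):
    """Join the syllables of ONE word, adding apostrophes where needed."""
    word = ""
    for index, syllable in enumerate(syllables):
        # Assumed rule: apostrophe before any non-initial a/o/e syllable.
        if index > 0 and starts_with_a_o_e(syllable):
            word += "'"
        word += syllable
    return word


print(join_pinyin(["xi", "an"]))    # xi'an
print(join_pinyin(["dong", "xi"]))  # dongxi
```

This only covers the language-specific half of the problem; it knows nothing about layout.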

We currently don't have anything in css-ruby that would let us automatically inject these apostrophes when needed. In a way, this is very language-specific, and maybe we cannot solve it fully automatically. But it also depends on layout: whether or not the ruby annotations of adjacent syllables have sufficient space for visual separation. If we could find something generic enough, it would be nice to be able to handle such cases, even if it needed some amount of preprocessor / markup support.

Possibly, if compound words are marked up as single ruby segments, the apostrophes could go in the markup so that there would be no need for the layout engine to guess where they go, and so that if the annotation is rendered inline, it is correct. In that case, what we'd need in CSS is a way to make them disappear in the right circumstances.
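As a rough illustration of the "apostrophes in the markup" option, a preprocessor could emit one ruby segment per word, with the author-supplied apostrophe already in the annotation. This is only a sketch: `ruby_for_words` is a hypothetical helper, and it does not address the CSS side (making the apostrophe disappear), for which nothing exists today.

```python
from html import escape


def ruby_for_words(words):
    """Emit one <ruby> segment per (base, annotation) pair.

    `words` is a list of (base text, annotation) tuples; an empty
    annotation means the base is emitted without ruby.
    """
    parts = []
    for base, annotation in words:
        # quote=False keeps the apostrophe literal instead of &#x27;
        safe_base = escape(base, quote=False)
        if annotation:
            safe_annotation = escape(annotation, quote=False)
            parts.append(
                "<ruby><rb>" + safe_base + "</rb>"
                "<rt>" + safe_annotation + "</rt></ruby>"
            )
        else:
            parts.append(safe_base)
    return "".join(parts)


print(ruby_for_words([("西安", "xi'an"), ("的", "de"), ("东西", "dongxi")]))
```

Because the apostrophe lives in the annotation text, an inline fallback rendering such as 西安(xi'an) stays correct without any help from the layout engine.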

xfq commented 3 years ago

I understand the "xian" / "xi'an" example, but I don't quite understand why/how we automatically inject these apostrophes. Could you provide some example code?

tabatkins commented 3 years ago

I think Florian means that if "xi" and "an" have ruby annotations wide enough to cause some visual separation between them, they wouldn't need an apostrophe; if they had no ruby, or the ruby was small enough not to cause them to separate, they would need the apostrophe.

frivoal commented 3 years ago

I don't have a solution for how. I am not even sure that we can find one. But if we can, it might be worth trying, because it is a problem. Here are some more details:

Say you have this markup:

<ruby><rb>西<rb>安<rt>xi<rt>an<rb>的<rt>de<rb>东<rb>西<rt>dong<rt>xi</ruby>

With default styling, it will look like this:

Screen Shot 2021-02-19 at 8 38 55

That's fine.

However, let's say we apply ruby-merge: merge to group the annotations per word, as afforded by this markup:

Screen Shot 2021-02-19 at 8 43 23

The "dongxi" over 东西 is fine, but the "xian" over 西安 is not. What we would want instead is something like this:

Screen Shot 2021-02-19 at 8 45 07

Here's another example. This is less realistic, but could happen too. Let's say we increase the font-size of the annotations:

Screen Shot 2021-02-19 at 8 40 34

That's not good. What we'd actually want is something more like:

Screen Shot 2021-02-19 at 8 41 48

Again, I don't think I know for sure how to solve it, but if someone can think of something, that would be good.
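To show why this depends on layout and not only on the language, here is a toy model of the decision. It is not how any engine lays out ruby. It assumes the two bases abut, each annotation is centered over its own base, all widths are in em, and the bases get pushed apart when the annotations would otherwise overlap. The function names, the `same_word` flag, and the `min_gap` threshold are all hypothetical, and only plain ASCII pinyin is handled.

```python
def annotation_gap(base_w1, base_w2, ann_w1, ann_w2):
    """Horizontal gap between two adjacent annotations, in em.

    The distance between annotation centers equals the distance between
    base centers; subtracting the half-widths of both annotations gives
    the gap. It is clamped at 0 because, in this model, wide annotations
    push the bases apart until the annotations just touch.
    """
    slack = (base_w1 + base_w2) / 2 - (ann_w1 + ann_w2) / 2
    return max(0.0, slack)


def wants_apostrophe(next_syllable, gap, same_word, min_gap=0.1):
    """True if an apostrophe should precede `next_syllable`."""
    # Across a word boundary an apostrophe would be wrong, whatever the gap.
    if not same_word or not next_syllable:
        return False
    ambiguous = next_syllable[0].lower() in ("a", "o", "e")
    return ambiguous and gap < min_gap


# Small annotations over 1em-wide bases: enough separation already.
gap = annotation_gap(1.0, 1.0, 0.5, 0.5)
print(wants_apostrophe("an", gap, same_word=True))  # False

# Larger annotation font-size: the annotations touch.
gap = annotation_gap(1.0, 1.0, 1.2, 1.2)
print(wants_apostrophe("an", gap, same_word=True))  # True
```

With merged annotations the gap is effectively always zero, so in this model the decision falls back to the language rule alone.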

acli commented 3 years ago

I don’t think this is specific to pinyin. If Japanese could be annotated with romaji (let’s ignore how unlikely this is, for the sake of argument) we’d need this too.

I’ve seen Jyutping ruby annotations in the wild. This problem doesn’t affect Jyutping since it uses superscripted numerals for tones, but the fact that I’ve seen Jyutping annotations suggests we can’t discount the possibility that we’ll eventually need this for something other than pinyin.

Jeffxz commented 3 years ago

LOL, this is such a nice example. "Xian de dong xi" sounds like "something salty" (or "something fresh", depending on how you read it) in Chinese, which is very different from "Xi'an de dong xi" (meaning "something from Xi'an"). Maybe the "xiandedongxi" example is a different problem. But just looking at "Xi'an": meaning aside, I think that if we write the ruby this way, <ruby><rb>西安</rb><rt>xi'an</rt></ruby><ruby>的</ruby><ruby><rb>东西</rb><rt>dongxi</rt></ruby> (https://codepen.io/jeff_xu/pen/mdOxjqw), it should be fine for Chinese. The only issue is that the display is not easy to read. I can't find any layout spec for Chinese specifically (see https://www.w3.org/TR/clreq/), but it looks similar to group-ruby in the Japanese requirements (https://www.w3.org/TR/jlreq/#positioning_of_groupruby_with_respect_to_base_characters).

I really like this example and consideration. But I kind of feel we might need a definition of how to display pinyin with apostrophes in a better way, instead of considering injecting apostrophes.

xfq commented 3 years ago

@frivoal I see. Thank you for your explanation!

xfq commented 3 years ago

> I can't find any layout spec for Chinese specifically (see https://www.w3.org/TR/clreq/), but it looks similar to group-ruby in the Japanese requirements (https://www.w3.org/TR/jlreq/#positioning_of_groupruby_with_respect_to_base_characters).

@Jeffxz For Chinese, group-ruby is documented in § 3.3.4.3 Words as the Basic Units for Annotating Pronunciation, but it doesn't mention pinyin with apostrophes. We're tracking it in https://github.com/w3c/clreq/issues/351

> But I kind of feel we might need a definition of how to display pinyin with apostrophes in a better way, instead of considering injecting apostrophes.

Ideally, it would be best to solve both problems, i.e., not only make the inserted apostrophes display better, but also automatically inject these apostrophes when needed.

heycam commented 3 years ago

Regardless of whether ruby-merge is separate or merged, it's possible for ruby annotation boxes to end up abutting, causing ambiguities, where it'd be wrong to introduce an apostrophe because that implies syllables being part of the one compound word. So I don't think automatic apostrophe introduction without any help from the author is a workable solution.

Is it acceptable for syllables to abut even when there is no ambiguity? It's not super readable. Is / should there be a way to require a minimum spacing between adjacent ruby annotation boxes? (Not sure if margin-inline is sufficient.)

xfq commented 3 years ago

> Is it acceptable for syllables to abut even when there is no ambiguity? It's not super readable. Is / should there be a way to require a minimum spacing between adjacent ruby annotation boxes? (Not sure if margin-inline is sufficient.)

There were some related discussions in #3498. I think even if there is no ambiguity, it is easier to read if the syllables do not abut, so I think there should be a way to require a minimum spacing between adjacent ruby annotation boxes.

xfq commented 10 months ago

Similar problems might occur if you use romaji for Japanese. For example, "sinai" is しない[1], but "sin'ai" is しんあい[2], so when a vowel follows the 'n' sound, an apostrophe needs to be added after the 'n' sound.

Footnotes:

[1] Kanji: 竹刀, 市内
[2] Kanji: 親愛
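The romaji case can be sketched the same way as the pinyin one. The following assumes the input is already split into kana-sized units, with the syllabic ん given as a standalone "n", and applies the common romanization convention of an apostrophe after that "n" when a vowel or "y" follows. `join_romaji` is a hypothetical helper, not an existing API.

```python
def join_romaji(morae):
    """Join romanized kana units, disambiguating the syllabic n."""
    pieces = []
    for index, mora in enumerate(morae):
        pieces.append(mora)
        following = morae[index + 1] if index + 1 < len(morae) else ""
        # A standalone "n" followed by a vowel or "y" would otherwise be
        # read as the start of na/ni/nu/ne/no or nya/nyu/nyo.
        if mora == "n" and following[:1] in ("a", "i", "u", "e", "o", "y"):
            pieces.append("'")
    return "".join(pieces)


print(join_romaji(["si", "na", "i"]))      # sinai   (しない)
print(join_romaji(["si", "n", "a", "i"]))  # sin'ai  (しんあい)
```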