w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-writing-modes] Support rtl Chinese #2754

Open Zhang-Junzhi opened 6 years ago

Zhang-Junzhi commented 6 years ago

Traditionally, CJK text is written and read vertically, but horizontal writing is sometimes preferred, especially for title-like text such as a park name or a shop name.

But traditional horizontal CJK text differs from horizontal-tb: it reads from right to left.

So I suggest adding a new value horizontal-tb-rl to writing-mode.

Note that it doesn't mean that horizontal-tb will only be used for LTR text from now on. horizontal-tb can still be both LTR and RTL. Those languages which can ONLY be written right to left (such as Arabic, Hebrew) will always be written from right to left no matter whether it's in horizontal-tb or horizontal-tb-rl.

Zhang-Junzhi commented 6 years ago

Here is one example. This is the plaque of a traditional Chinese temple.

Look at the slogan on the plaque: it is written from right to left, so, transcribed in the left-to-right Latin habit, it actually reads 澤施四海, not 海四施澤.

[Image: 123.jpg]

upsuper commented 6 years ago

I think this should be considered as a special case for vertical-rl where there is only one character for each line.

Zhang-Junzhi commented 6 years ago

@upsuper I think they are different: in horizontal-tb-rl the gaps between characters come from the font's glyph advances, while in vertical-rl they come from line heights.

frivoal commented 6 years ago

Does this mode of writing use the vertical alternates? My experience of it is insufficient to answer, as I cannot recall seeing any content like this that used characters that are different in horizontal vs vertical writing modes.

If it is using horizontal alternates, I wonder if this is a case where we should actually use the direction property (https://drafts.csswg.org/css-writing-modes-3/#direction), despite the warning against it in the spec. I don't think it belongs in the markup since it is a stylistic choice, not something inherent to the language.

Another interesting question to help figure out whether this is indeed horizontal right-to-left or vertical-rl with very short columns:

What happens when numbers written in Latin-script numerals are embedded in it? I believe that happens, for example, in newspaper titles with dates, and that the dates are displayed in their usual LTR order. This would confirm that this is a horizontal mode, and that the direction property can handle it.

Zhang-Junzhi commented 6 years ago

@frivoal Thanks for the point. I finally managed to achieve the effect with the following code:

<!DOCTYPE html>
<meta charset="utf-8">
<title>rtl horizontal text</title>
<span style="direction: rtl; unicode-bidi: bidi-override;">澤施四海</span>

The direction property combined with unicode-bidi does the job perfectly, so it's not necessary to add a new value to writing-mode. I think this issue can be closed now. Edited later: this is not really a perfect solution. @r12a provides an example mixed with non-CJK text in the next comment, where this solution requires embedded markup, which does not look pretty.

Having said that, as to the topic whether it's in horizontal mode or short-columned vertical mode:

As a Chinese person, I have never in my life come across a scenario where traditional horizontal CJK characters are mixed with non-CJK characters. But I believe usages like this are meant to be in horizontal mode rather than short-columned vertical mode. Characters typically move to the next line ONLY when there's no room left in the inline direction. But I have seen horizontal RTL used in scenarios where there's still plenty of room in the vertical direction. Just imagine how odd it would be to put each English word on its own line when there's still plenty of room in the horizontal direction. For this reason, I believe usages like this are meant to be in horizontal mode. Although they can be used to address insufficient room in the vertical direction, that doesn't necessarily mean they're in vertical mode.

r12a commented 6 years ago

This Taiwanese newspaper contains various items in RTL Chinese, and not only headings – see the caption under the photo.

Note also that it's actually bidirectional text: eg. the 8.6% flows LTR within the RTL flow, just as in Arabic/Hebrew/etc.

The normally recommended approach for this is indeed to use an override. However, take the case of 8.6%受訪者没頭路: since the text should be always stored in logical order (so that a machine knows this is 8.6%, not 6.8%, and can read the word 'respondents' 受訪者), it's important to not reorder the digits or the text in memory. For that reason, you need to apply directionality separately to the number and the text. In HTML the best way to do this is probably via markup such as the following:

<h2><bdo dir="rtl"><bdo dir="ltr">8.6%</bdo>受訪者没頭路</bdo></h2>

Note also that the one-character-per-column approach is really not at all helpful when dealing with the caption. One might try to argue that there is some column-related semantics in the case of simple headings, since they span vertical-rl text, but actually there are plenty of examples of RTL Chinese and (pre-1945) Japanese in contexts that have nothing to do with vertical text.

If we were to develop some new property/value for this, i'm thinking it would be better to have something that resets the default direction at the character level to RTL for han, kana, and other defined sets of characters. Then we'd have a situation where bdo wasn't necessary, and where alignment of the text could also be determined automatically (whereas for these examples you'd need to explicitly right-align the lines). Such an approach would also be useful for other scripts, such as Tifinagh, Egyptian Hieroglyphs, Runes, etc. which are also LTR characters by default, but sometimes are assembled in RTL lines.
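For what it's worth, the nested override shown in the bdo markup above can also be sketched with CSS classes, together with the explicit right alignment just mentioned. The class names `rtl-run` and `ltr-run` are made up for this illustration, and this is only a sketch of the override approach discussed in this thread, not a pattern taken from any spec:

```html
<!DOCTYPE html>
<meta charset="utf-8">
<title>Nested bidi overrides for an RTL Chinese heading (sketch)</title>
<style>
  /* Hypothetical class names, chosen for this example only. */
  h2 { text-align: right; }  /* the heading's base direction is still LTR, so align explicitly */
  .rtl-run { direction: rtl; unicode-bidi: bidi-override; }  /* lay every character out RTL */
  .ltr-run { direction: ltr; unicode-bidi: bidi-override; }  /* keep the number in LTR order */
</style>
<!-- The source text stays in logical order: "8.6%" first, then 受訪者没頭路. -->
<h2><span class="rtl-run"><span class="ltr-run">8.6%</span>受訪者没頭路</span></h2>
```

With this, the line should render with 8.6% at the right edge (still readable left to right) and the Han characters running leftwards from it, while the stored text is untouched.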

Zhang-Junzhi commented 5 years ago

I just submitted an improved proposal for this issue. #3608

fantasai commented 5 years ago

@Zhang-Junzhi Issues should be primarily tagged to problems, not solutions.

Zhang-Junzhi commented 5 years ago

Here are my two cents in favour of adding support for RTL Chinese:

As a Chinese person, I can confirm there are reasonably common use cases for horizontal RTL Chinese.

First, in present-day use, plenty of traditional-style title lines, such as park names, shop names and temple names, are still read horizontally RTL, as I said in the original post.

Second, as @macnmm said, horizontal RTL Japanese can be seen on vehicles, and horizontal RTL Chinese often appears on vehicles in China as well.

Third, although horizontal RTL Chinese titles are hardly seen in newspapers in mainland China nowadays, they are still often seen in Taiwanese Chinese newspapers and books, as well as in Japanese newspapers and books.

Fourth, horizontal RTL writing is also common practice in Chinese calligraphy in present-day China, Japan and Korea, where the majority of people favour RTL, or are usually encouraged to write RTL instead of LTR.

(Photos showing people practicing Chinese Calligraphy with RTL scripts)

Fifth, original horizontal Chinese writing was almost always RTL; LTR Chinese is primarily due to the influence of Western scripts. Since CSS even supports ancient scripts (like vertical-lr Mongolian), it makes sense to me to also support RTL CJK.

frivoal commented 5 years ago

I'm not opposed to the idea of supporting horizontal RTL for Chinese and Japanese, but:

> they are still often seen in [...] Japanese newspapers and books.

Vertical writing is commonly seen in those in Japan, but horizontal RTL is not. Maybe it happens occasionally, but it isn't common, and I don't recall seeing it.

> First, in present-day use, plenty of traditional-style title lines, such as park names, shop names and temple names, are still read horizontally RTL

These generally are names / single words / short phrases (without embedded numbers, mixed scripts, punctuation, etc), which seem to be adequately covered by unicode-bidi: bidi-override. For example, you shared such a sign in https://github.com/w3c/csswg-drafts/issues/2754#issuecomment-396035239. As you mentioned in https://github.com/w3c/csswg-drafts/issues/2754#issuecomment-396057632, it can be achieved with today's html/css. Like here: https://jsbin.com/mebuyew/edit?html,output

That should cover it for truck-side signs as well, and probably also for calligraphic exercises.

With all that said, I think the Taiwanese newspaper use case is perfectly reasonable and justified, and for that, unicode-bidi: bidi-override does not seem adequate.

So, I'm not saying we don't need to do this, just that we need to be sure what we're doing it for, so that we can figure out: 1) how pressing the need is 2) where to find lots of examples to check if we're getting the details right (e.g. interaction with logical properties, punctuation, bidi interaction with Latin text or numbers, or Arabic text).


PS:

> Since CSS even supports ancient scripts (like vertical-lr Mongolian)

Mongolian script is in current use, so describing it as an ancient script is inaccurate.

r12a commented 5 years ago

I believe that no-one is disagreeing with the suggestion that some texts in Chinese and Japanese run, or used to run, RTL in horizontal text. I want to move on to discuss the proposal that was made in the opening comment of this issue for handling that.

The proposal was to create a new writing-mode value. I don't think that is appropriate. In CSS, writing modes describe the direction in which lines are arranged and sequenced, not the direction of the text on those lines. In fact, CSS intentionally distanced itself from older models that tried to munge the two together (eg. SVG, XSL). Proposing a new writing-mode value, in my mind, is shooting arrows at the wrong target.

For the use cases described above the writing-mode direction is still horizontal-tb. What's needed is a way to change the arrangement of characters within a line during display.

The way that's normally done is to use the dir attribute in HTML or the direction property in CSS elsewhere, but that approach relies on the directional properties of the characters involved in a way that doesn't hold here.

I made a counter-proposal that says use overrides (with nesting where necessary to change direction). I'm not yet convinced that anything else is needed. Note also that this is very similar to the advice we give authors using Arabic, Hebrew, etc - ie. tightly wrap every opposite-direction phrase in markup, and use the dir attribute on that markup. Be sure to nest markup to show the structure.
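To make the difference concrete, here is a small sketch contrasting the two mechanisms on the temple plaque text from earlier in this thread. Han characters have a strong left-to-right bidi class in Unicode, so setting only the base direction is not expected to reverse them, whereas an override is. The comments describe the expected rendering rather than a result tested in any particular browser:

```html
<!DOCTYPE html>
<meta charset="utf-8">
<title>Base direction vs. override for Han text (sketch)</title>

<!-- Base direction only: the paragraph becomes right-aligned, but the
     strongly-LTR Han characters should still read 澤施四海 from left to right. -->
<p dir="rtl">澤施四海</p>

<!-- Override: every character is forced RTL, so the line should display as
     海四施澤 on screen while the stored text is still 澤施四海. -->
<p><bdo dir="rtl">澤施四海</bdo></p>
```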

It's true that override markup or styling doesn't inherit well (i never understood why that was always treated differently from normal direction - or for that matter, why we need an inline bdo element rather than an rlo/lro attribute on an element, but i digress...), but i think that an override approach should probably suffice for the amount of text that needs this special treatment. In modern use it seems to be mostly short runs of text alongside vertically aligned content (eg. for titles or captions shown above). For archaic uses, it is probably only needed for short expository texts too, i suspect.

By the way...

Also, let's bear in mind that this is not only about Chinese and Japanese. Many archaic scripts could be written RTL in horizontal lines. And also let's bear in mind that RTL directionality is often not all that's needed. Often scripts that ran RTL but whose modern Unicode properties are LTR also tended to mirror the characters at the same time, eg, Egyptian hieroglyphs, the Tifinagh Berber script, and Old Norse runes. (You can do this using CSS. See an example here https://r12a.github.io/scripts/tifinagh/index#dir. Or go to my pickers for any of those scripts, click on the + top left of the text area, then click on the ⭅︎ at the bottom right of the text area to see the effect.)
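One way to get both effects with CSS alone, which may or may not be how the linked page does it, is to mirror the whole run horizontally. Flipping the box reverses the visual order of the characters and mirrors each glyph at the same time, so no bidi override is needed. The class name `mirrored-rtl` is invented for this sketch, and a single short line is assumed (wrapped text or embedded Latin text would need more care):

```html
<!DOCTYPE html>
<meta charset="utf-8">
<title>Mirrored RTL line of Tifinagh (sketch)</title>
<style>
  /* Hypothetical class name. The horizontal flip reverses the character
     order on screen and mirrors the glyph shapes in one step. */
  .mirrored-rtl {
    display: inline-block;   /* transforms do not apply to plain inline boxes */
    transform: scaleX(-1);
  }
</style>
<!-- The text is stored in its normal logical (LTR) order. -->
<p><span class="mirrored-rtl">ⵜⵉⴼⵉⵏⴰⵖ</span></p>
```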

So, in summary, a new writing-mode value is the wrong solution. Applying overrides seems like an adequate one for the use cases involved.

Zhang-Junzhi commented 5 years ago

Thanks for the reply. @r12a

Applying additional markup doesn't seem pretty, although people can usually live with it if no better solution is available.

But some things cannot be achieved even by applying additional markup. Please see #3608 (comment) about punctuation.

FYI, I just tested the conjecture and it unfortunately turned out to be true. Parentheses mirror their shapes, but periods and commas unfortunately "remain in LTR shape". And commas seem like a reasonable thing to use in RTL CJ. So that's an unsolved issue.
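A minimal test case along these lines, assuming the override approach from earlier in the thread, might look like the following. The sample sentence is made up for illustration. Under the Unicode bidirectional algorithm, only characters with the Bidi_Mirrored property (such as parentheses and corner brackets) get mirrored glyphs in an RTL run. The fullwidth comma and the ideographic full stop do not have that property, which is consistent with the observation above:

```html
<!DOCTYPE html>
<meta charset="utf-8">
<title>Punctuation under an RTL override (test case sketch)</title>
<!-- The paired brackets 「 」 and （ ） have Bidi_Mirrored=Yes and should appear flipped.
     The comma ， and full stop 。 have Bidi_Mirrored=No and should keep the same glyph as in LTR text. -->
<p style="text-align: right"><bdo dir="rtl">他說「你好」（笑），再見。</bdo></p>
```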

Zhang-Junzhi commented 5 years ago

To be clear, by raising the punctuation issue I am not necessarily saying it has to be fixed at the CSS level. It seems to me the issue could be that punctuation marks like commas and periods lack a native LTR/RTL property at the Unicode level (and/or maybe at the font level). But I am not an expert; I am just describing the issue I saw, not a specific solution.

r12a commented 5 years ago

If you look at the caption in the picture of the Taiwanese newspaper above you'll see that commas look the same in that (printed) RTL text as in LTR text. (Also note that all commas in Hebrew look the same as English commas - no mirroring.) Do you have evidence to suggest that this is not the normal approach, or are you just guessing?

frivoal commented 5 years ago

> If you look at the caption in the picture of the Taiwanese newspaper above you'll see that commas look the same in that (printed) RTL text as in LTR text. (Also note that all commas in Hebrew look the same as English commas - no mirroring.) Do you have evidence to suggest that this is not the normal approach, or are you just guessing?

I have no evidence, and that doesn't bother me at all for Chinese, because the punctuation marks are centered. For Hebrew, the space next to the comma gets bidi-reordered as well, so there's no problem either. But Japanese commas and periods include a blank right half, and that blank feels very weird to me when writing in RTL. When switching to vertical text, the blank upper half is removed and replaced by a blank lower half to keep the comma / period next to the preceding phrase and away from the following phrase. Independently of the shape of the punctuation mark, I'd expect something similar to happen to the spacing in RTL mode.

That said, that's just my expectation, and I have no material to look at to judge whether that is correct. Or maybe it's irrelevant, because it just never happens, and RTL Japanese is either old (before punctuation was a thing), or short phrases on signs, where punctuation isn't used.

r12a commented 5 years ago

Yep, i was wondering those things too. It could also be that the tendency to leave the gap to the right in Japanese punctuation is a new thing, too. The idea that RTL text in Japanese is and always was only for short pieces of text is supported by the article at https://www.sljfaq.org/afaq/right-to-left.html.

According to that article, more serious use of RTL text was a short-lived, failed experiment.

> At the very beginning of the change to yokogaki, in the Meiji period (1868-1912), there was a short-lived form called migi yokogaki (右横書き), "right yokogaki", in contrast to hidari yokogaki (左横書き), "left yokogaki", the current form. This resembled the right-to-left horizontal writing style of languages such as Arabic or Hebrew with line breaks on the left hand side of the page. It was probably based on the traditional single-column right-to-left writing. This form was never widely used, and has not survived.

r12a commented 5 years ago

[again, with link] Here's an example (with multiline text!). I have no idea where this comes from - whether a real example, or something made up. https://upload.wikimedia.org/wikipedia/commons/f/f6/RIKEN_VITAMIN.png

Is that a period on the very top line, or just a centre dot? The only punctuation in the multiline block, bottom right, is an interestingly tilted exclamation mark. I wonder whether the Latin text is AD or DA when pronounced. I notice that 'vitamin' is written LTR on the actual bottle.

frivoal commented 5 years ago

> Is that a period on the very top line, or just a centre dot?

I'd guess a center dot, that makes more sense as a separator of two phrases which aren't sentences. Also, if it was a period, there ought to be one at the end of the second phrase, but there isn't. Interestingly, this one is right aligned within the space it is given, while today the center dot would be centered horizontally and vertically.

> The only punctuation in the multiline block, bottom right, is an interestingly tilted exclamation mark.

Right. That doesn't tell us anything about spacing, except maybe that they're avoiding needing to cope with spacing, but that's too little data to even conclude that.

> I wonder whether the Latin text is AD or DA when pronounced.

But as far as I know, neither Vitamin AD nor Vitamin DA is a thing, so they probably mean A & D. Since Vitamin A is what they're supposed to have started the company with (https://www.rike-vita.jp/int/com/history.html), and is their flagship product, I suspect they mean for the A to come first, and the word to be read "Vitamin AD". Meaning that they're not doing bidi, but are bdo-ing Latin text into RTL as well.

aaaxx commented 5 years ago

At the risk of stating the obvious, the problem with relying on advertising and packaging copy is that it often employs custom detailing for the sake of better graphic effect at the expense of established orthographic rules.

himorin commented 5 years ago

Searching through various (yes, various....) old Japanese newspapers and magazines, I think I found two samples which have a Japanese comma in horizontal RTL. Both seem to use the opposite spacing for the comma (circled in red in the images).

image source: https://twitter.com/aya_nyw19/status/782203672109289472 (advertisements in the magazine "Weekly Asahi", 1939, Showa era)

image source: https://imgur.com/r/newsokunomoral/Vvyzk (advertisements in the magazine "Comrades", 1921, Taisho era)

As the article @r12a pointed to (the one at sljfaq.org) says, Japanese horizontal text was originally

> a special form of tategaki, with one-character columns going from right to left

(so every vertical line is broken after one character) and was normally used only for signboards or headings (e.g. of newspapers), so I suppose commas and periods were usually not included... A 1906 article on punctuation by the Ministry of Education (http://dl.ndl.go.jp/info:ndljp/pid/903921) does not seem to define anything for horizontal text. Interestingly, as shown on page 11 (the 9th photo), the bullet was not in the center, so what we saw in the sample as an off-center dot could actually be a dot that simply isn't at the center of the full bounding box. A newer article on punctuation by the Ministry of Education, from 1946, has 4 punctuation definitions for horizontal text, but it uses an English-like comma (",") rather than one like that used for vertical text ("、").

frivoal commented 5 years ago

In a meeting of Japanese typography experts today at the APL, I raised this topic. People were aware that RTL horizontal Japanese for text other than names on signs used to exist, but all agreed that it has completely disappeared in modern use, and no-one could actually recall seeing any instance of it in person (other than the samples in this thread). Even back when it was used, the practice was short lived. As far as I can tell, modern practice is limited to Taiwan, and so the design of the feature should be driven by Taiwanese needs. For example, the question of the spacing around Japanese punctuation marks I raised earlier, and for which @himorin has found nice examples, while interesting in theory, is not relevant to modern practitioners. Since it is not relevant for Taiwan, we probably shouldn't let concerns about it get in the way of solving the actual problem.

himorin commented 5 years ago

As @frivoal wrote, horizontal RTL in Japanese is historical and almost entirely unused in Japan today. (The car body sample should be considered a design/display matter: characters are simply placed starting from the front of the car on both sides, so one side ends up RTL and the other LTR as normal.) Of course we need it when we reproduce old documents as EPUB or something similar, but that could be rare.

For other points:

1) Different shape of punctuation: my two samples show the same thing, a comma with the space in the left half, unlike what is currently used for LTR. Even in that era, an LTR sample from 1926 (http://dl.ndl.go.jp/info:ndljp/pid/931740) uses the same comma as now (with the space in the right half). So there could be a requirement to change the shapes of punctuation.

2) Direction of A-Z characters in RTL: for the Vitamin sample from @r12a, I found it in vertical mode as "AD", so the directions were not mixed there: http://instadayz.com/tag/%E7%90%86%E7%A0%94%E3%83%93%E3%82%BF%E3%83%9F%E3%83%B3%E7%90%83 and it seems natural in Japanese to say the vitamins in alphabetical order (the same goes for my first image). There could be very few samples with numerals, since "Kansuuji" (like "一二三") would have been the default way to write numbers in those days... Some samples from jaa2100 show a mix of English (RTL) and Japanese (LTR) on different lines (I could not find any sample with the two on one line...): Christmas program, cream, news paper, cover page. In the last sample, interestingly, the minutes of bars are shown in numerals like "60分" but in RTL (not "分06" nor "分60").

r12a commented 5 years ago

FYI. I have a long-standing action from the i18n WG to write an article about how to make LTR scripts run RTL. I have just started work on that.

Zhang-Junzhi commented 5 years ago

Thanks. It will be a nice work.

Zhang-Junzhi commented 5 years ago

@frivoal I wonder if it would be nice for the CSS Writing Modes spec to also give an example of using bidi-override for horizontal RTL CJ. Right now, the spec simply discourages the use of bidi-override.

faciens commented 1 year ago

Possibly relevant: I stumbled across an English book from 1801 that has inline RtL Chinese https://books.google.de/books?id=sDRMAAAAcAAJ&pg=PR25#v=onepage&q&f=false

For example:

[Image: Bildschirmfoto 2022-12-06 um 02 51 43]

The name of the person mentioned is 乾隆 but written 隆乾 in the book. And 琉球 is written as 球琉 (and the Latinisations are left-to-right in word order)

kokoshneta commented 11 months ago

> [again, with link] Here's an example (with multiline text!). I have no idea where this comes from - whether a real example, or something made up. https://upload.wikimedia.org/wikipedia/commons/f/f6/RIKEN_VITAMIN.png
>
> Is that a period on the very top line, or just a centre dot? The only punctuation in the multiline block, bottom right, is an interestingly tilted exclamation mark. I wonder whether the Latin text is AD or DA when pronounced. I notice that 'vitamin' is written LTR on the actual bottle.

The conversation has long since moved on from this topic, but just for the sake of completeness, no one seems to have noticed that there is actually a full stop in the Riken Vitamin image as well – not in the line at the top (or the address line at the very bottom), but in the longer bit of text over the girl’s skirt. The second-to-last line contains “良好にして。” with a full stop spaced towards the preceding kana (て).

r12a commented 2 months ago

Btw, the article i alluded to just above was published in 2022. You can find it at https://www.w3.org/International/questions/qa-ltr-scripts-in-rtl.en.html