w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-values-4][css-writing-modes-4] Revisit decision to use 永 instead of 水 as the ic unit #7577

Open ziyunfei opened 2 years ago

ziyunfei commented 2 years ago

Chrome just implemented the ic unit this week. When I shared this news on Chinese social media, some people were confused by the chosen character 水 (water), especially people with handwriting practice experience; they think 永 (forever; eternal) is the right choice.

See this Twitter thread: https://twitter.com/intenttoship/status/1555274307735285760

chrishtr commented 2 years ago

I discussed this issue with @xiaochengh, @kojiishi, @wangxianzhu and one other expert. We concluded that there should be no significant compat or interop risk with changing the spec text to use 永 instead of 水, since in basically all CJK fonts the characters have exactly the same width and height.

Given that, and since 永 makes much more sense to Chinese speakers (including those I consulted), I recommend changing the spec to use that character.

Note: In the Chromium implementation, we plan to still use 水 for the moment, because of issues like those mentioned in this comment. But again, that is just an implementation detail that will make no difference to the dimensions observed by web developers, and it should not be how the unit is described in the spec or developer documentation. (I mention it only because otherwise there might be a performance concern with the change. We could easily change the implementation in the future if the font buckets change.)

nhnhwsnh commented 2 years ago

I think is better.

cdll commented 2 years ago

is not bad, but is better. 🎉

fantasai commented 2 years ago

@chrishtr I would rather not have the spec and the implementations randomly diverge, so we should only change the spec if implementations are also committed to change. This would not be a difficult change in the implementations, actually: most of the work is in updating the W3C Recommendation and writing proper tests for it, which @yisibl has volunteered to do. (Note that the tests need to be subtle enough to account properly for non-square CJK fonts and for proportional fonts, which some designers are starting to experiment with.) Updating a W3C Recommendation involves a lot of tracking across time as various requirements (testing, review, implementation) are fulfilled, so if @yisibl is willing to follow through on all stages of the spec update, that will help a lot.

But again, the entire point of having specs is to define implementations, and in fact the W3C Recommendation process requires demonstration of conforming implementations, so unless we're willing to update the implementations we cannot update the specs either.


I appreciate @myakura's analysis in https://github.com/w3c/csswg-drafts/issues/7577#issuecomment-1210082393; as @foolip notes in https://github.com/w3c/csswg-drafts/issues/7577#issuecomment-1210495600, the choice of 水 over 永 was intentional due to its frequency of usage, some practical effects of which we are seeing in the way font files are broken down.

There are also some subtle effects in the way characters are drawn and measured for optical alignment, which mean that 水 and 永 will yield different results in cases where we might need to measure the actual glyph ink area as a fallback for the ICFT (ideographic character face top) baseline in fonts that don't specify one. We should investigate this and consider which is better. See @yisibl’s note about CSS Inline Layout L3.
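To make the ink-area fallback concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a missing ICFT/ICFB pair (the OpenType BASE baseline tags `icft`/`icfb`) is approximated by the top and bottom of a single reference glyph's ink bounding box. The `InkBox` type, the `ideographic_face_from_ink` function, and the glyph boxes are all hypothetical; they are not taken from any real font or from CSS Inline Layout L3.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InkBox:
    """Ink bounding box of a glyph in font units (y grows upward from the baseline)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def ideographic_face_from_ink(ink: InkBox, units_per_em: int) -> Tuple[float, float]:
    """Approximate (ICFT, ICFB) in em from one reference glyph's ink box.

    Hypothetical fallback for fonts whose BASE table has no icft/icfb values:
    the face top is the top of the ink, the face bottom is the bottom of the ink.
    """
    return ink.y_max / units_per_em, ink.y_min / units_per_em


# Made-up ink boxes, chosen only to show that two reference glyphs can
# disagree; they are not measurements of any real font.
UPEM = 1000
INK = {
    "水": InkBox(x_min=40, y_min=-100, x_max=960, y_max=830),
    "永": InkBox(x_min=50, y_min=-90, x_max=950, y_max=845),
}

if __name__ == "__main__":
    for char, ink in INK.items():
        top, bottom = ideographic_face_from_ink(ink, UPEM)
        print(f"{char}: ICFT={top:.3f}em ICFB={bottom:.3f}em face height={top - bottom:.3f}em")
```

Because the two made-up boxes differ, the synthesized face metrics depend on which glyph is measured; whether 水 or 永 gives the better approximation is exactly what would need checking against real fonts.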

@gongpeione wrt “using 永 as the ic unit makes more sense in Chinese and also looks more professional”: You will never actually see this character, except in the specifications. And unless you have a font where each character is drawn at a different size, you will not know which character the implementations use. We only measure its width and height to find the size of a typical CJK character.

@sunhaitao I appreciate your comments thinking through the issues with each character. The reason we're not using U+3000 IDEOGRAPHIC SPACE is that in some proportional fonts, it does not match the full width of a Han character. Most CJK fonts are not proportional, so measuring any character gives the same result; but a few are, and we need to accommodate them properly. This is why we can't use just any character.
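The point about proportional fonts, and the remark that the reference character is only observable with an unusual font, can be illustrated with a small Python sketch. Fonts are modelled as hypothetical maps from code point to horizontal advance in font units, and `infer_reference` is an illustrative helper, not part of any existing test suite: it lists which candidate reference characters are consistent with a measured 1ic. With a fixed-width CJK font every candidate matches; a test font that gives 水 (U+6C34), 永 (U+6C38) and U+3000 distinct advances singles out exactly one.

```python
from typing import Dict, List, Sequence

UNITS_PER_EM = 1000
WATER, ETERNAL, IDEOGRAPHIC_SPACE = 0x6C34, 0x6C38, 0x3000
CANDIDATES = (WATER, ETERNAL, IDEOGRAPHIC_SPACE)

# Hypothetical fonts: code point -> horizontal advance in font units.
FIXED_WIDTH_FONT: Dict[int, int] = {WATER: 1000, ETERNAL: 1000, IDEOGRAPHIC_SPACE: 1000}
# Made-up proportional font: advances differ per character.
PROPORTIONAL_FONT: Dict[int, int] = {WATER: 920, ETERNAL: 890, IDEOGRAPHIC_SPACE: 960}
# Test font deliberately drawn so that every candidate has a distinct advance.
TEST_FONT: Dict[int, int] = {WATER: 600, ETERNAL: 700, IDEOGRAPHIC_SPACE: 800}


def advance_em(font: Dict[int, int], cp: int) -> float:
    """Advance of the glyph for `cp`, in em."""
    return font[cp] / UNITS_PER_EM


def infer_reference(font: Dict[int, int], observed_ic_em: float,
                    candidates: Sequence[int] = CANDIDATES) -> List[int]:
    """Return the candidates whose advance equals the observed 1ic (in em)."""
    return [cp for cp in candidates
            if abs(advance_em(font, cp) - observed_ic_em) < 1e-9]


if __name__ == "__main__":
    for name, font in (("fixed-width", FIXED_WIDTH_FONT), ("proportional", PROPORTIONAL_FONT)):
        print(name, {f"U+{cp:04X}": advance_em(font, cp) for cp in CANDIDATES})
    # Suppose an implementation laid out 1ic as 0.6em in TEST_FONT:
    print([f"U+{cp:04X}" for cp in infer_reference(TEST_FONT, 0.6)])
```

In a real test, the observed value would come from laying out something like an element with `width: 1ic` in the test font and dividing by the font size; that harness is not shown here.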

@CoelacanthusHex They are the same in practice in most cases, but we have to gracefully handle proportional fonts and degenerate situations such as incomplete font metrics...


I've rephrased the definition of the ic unit in the specs to emphasize the conceptual definition over the implementation method, it now reads:

ic unit Represents the typical advance measure of CJK letters, and measured as the used advance measure of the “水” (CJK water ideograph, U+6C34) glyph found in the font used to render it.

It was never the intention that the definition of which CJK character we measure would be in user-facing documentation, as the main idea is that we are providing a measure that approximates the measure of a typical CJK character. In most fonts, this should be equal to the measure of any CJK character.
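A minimal Python sketch of the definition quoted above, under simplifying assumptions: the font is a hypothetical in-memory object with separate horizontal and vertical advance tables, font fallback is ignored, and any missing glyph or metric falls back to 1em, mirroring the CSS Values 4 rule that the ideographic advance measure is assumed to be 1em when it is impossible or impractical to determine. `Font` and `ic_to_px` are illustrative names, not a browser API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

WATER = 0x6C34  # "水", the reference character in the quoted spec text


@dataclass
class Font:
    """Hypothetical font model: advances in font units, keyed by code point."""
    units_per_em: int
    h_advance: Dict[int, int] = field(default_factory=dict)
    v_advance: Dict[int, int] = field(default_factory=dict)


def ic_to_px(font: Optional[Font], font_size_px: float, vertical: bool = False) -> float:
    """Resolve 1ic to CSS pixels.

    The advance measure is taken along the inline axis: the vertical advance
    when the inline axis is vertical, otherwise the horizontal advance.
    """
    if font is None:
        return font_size_px  # no font information at all: assume 1em
    table = font.v_advance if vertical else font.h_advance
    advance = table.get(WATER)
    if advance is None:
        return font_size_px  # glyph or metric unavailable: assume 1em
    return advance / font.units_per_em * font_size_px


if __name__ == "__main__":
    # Made-up metrics for a font whose 水 is slightly narrower than the em square.
    font = Font(units_per_em=1000, h_advance={WATER: 920}, v_advance={WATER: 1000})
    print(ic_to_px(font, 16))                      # horizontal inline axis
    print(ic_to_px(font, 16, vertical=True))       # vertical inline axis
    print(ic_to_px(Font(units_per_em=1000), 16))   # missing glyph: 1em fallback
```

Swapping `WATER` for U+6C38 would change nothing in a fixed-width CJK font, which is why the choice is mostly invisible to authors.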


Anyone commenting on this issue: there are etiquette guidelines to commenting in bug reporting systems, the first and foremost of which is, don't spam the system with duplicate reports or comments. In order for groups to work effectively in public, bug reporting systems need to efficiently collect and represent relevant information. If you want to +1 this issue, add a thumbs-up to the existing comment. It is only worth commenting if you have additional relevant information or analysis to add, or if you have a question that is not already answered here.

(Also, don't insult people. This is also against etiquette.)

I want to emphasize that our goal in making decisions here is to make things work as well as we can, in as many cases as we can. Each decision we make is for some functional reason. We should correct mistakes, but keep in mind: we don't want the specs to be beautiful, we want the web pages that browsers render when following them to be beautiful.

css-meeting-bot commented 2 years ago

The CSS Working Group just discussed Revisit ic character.

The full IRC log of that discussion <TabAtkins> Topic: Revisit ic character
<TabAtkins> github: https://github.com/w3c/csswg-drafts/issues/7577
<TabAtkins> astearns: So the issue is whether to change the current "water" character used as the ic reference to the "eternal" character.
<TabAtkins> astearns: I propose we make this change, don't see a reason not to.
<TabAtkins> heycam: It generated a surprising amount of confusion.
<TabAtkins> heycam: Just because it looks so similar to the typical one used for calligraphy
<TabAtkins> [looking for Myles]
<TabAtkins> [reviewing elika's comments in the thread]
<TabAtkins> astearns: My take on her input is that it would be fine either way; we did have reasons for that particular character, but there aren't reasons not to use the other one
<TabAtkins> TabAtkins: One reason in the thread is that the water character is prioritized higher in CJK split fonts, so it's more likely to have been loaded already, versus the eternal character
<TabAtkins> astearns: It was noted that this isn't necessarily problematic, fonts can change
<TabAtkins> TabAtkins: Sure, they could put any char there. But it's not there now.
<TabAtkins> heycam: I think we should change; I think it's unlikely we'll encounter situations where fonts have different glyph coverage that is an actual problem.
<TabAtkins> heycam: And without the change we'll likely get a slow trickle of people asking why we're using that character.
<TabAtkins> [fills in myles on context]
<TabAtkins> myles: I think approximately zero people will notice either way
<TabAtkins> astearns: A decent number of people seem to have noticed our current value and would welcome the change.
<TabAtkins> astearns: And the only technical argument against the change is the bit about it being more quickly loaded in many situations.
<TabAtkins> astearns: But it's been argued convincingly to me that it's an artifact of current font subsetting.
<TabAtkins> astearns: And something that could change.
<TabAtkins> [fills in fantasai]
<TabAtkins> fantasai: Depends.
<TabAtkins> fantasai: First, do people actually want to implement this?
<TabAtkins> fantasai: I heard someone say they want to change the spec but won't change their implementation; that's not going to fly.
<TabAtkins> astearns: Is it testable?
<TabAtkins> fantasai: yes
<TabAtkins> myles: you make a font with different characters
<TabAtkins> astearns: is it testable with current existing fonts?
<TabAtkins> heycam: in a subsetting situation and you forget a later one that has the eternal in it, but seems unlikely
<TabAtkins> myles: I think it's fine to change and see if problems arise
<TabAtkins> fantasai: Second reason is there is the ideographic character face top line and bottom line, which are two metrics we need for alignment of CJK characters
<TabAtkins> fantasai: If those metrics aren't in the font, we might have to measure them.
<TabAtkins> fantasai: So if you have to measure, is it better to measure water, eternal, or does it not matter?
<TabAtkins> fantasai: It will matter, since they have different heights
<TabAtkins> fantasai: Which is better? We should ask Ken Lunde or someone at Adobe
<TabAtkins> fantasai: bc that's a practical implementation
<TabAtkins> fantasai: It would be great symbolically if we use eternal, but if it objectively gives us worse results we shouldn't use it
<TabAtkins> myles: what were the metrics?
<TabAtkins> fantasai: ICFT line
<TabAtkins> fantasai: [describes the ICFT]
<TabAtkins> fantasai: The characters are drawn in the box, slightly inside the em square
<TabAtkins> astearns: For this issue, the character we use for ic extent doesn't necessarily specify what the UA would use for this purpose
<TabAtkins> fantasai: Right but if we used a different character for each usage, that's not amazing
<TabAtkins> astearns: My point is there might be an even different character better for that, as it's closer to the edges
<TabAtkins> astearns: So is one or other of these chars more or less likely to hit what the browser will use for this approx?
<TabAtkins> fantasai: I looked at a bunch to look at this, and a lot you'd think would be good are actually *smaller* than water
<TabAtkins> fantasai: Like the one that's just a box, due to optics it's actually smaller than the ink extent of water and eternal
<TabAtkins> fantasai: So I tried to find a char that used as much space as possible, and was reasonably frequent, that's how I got water
<drott> q+
<TabAtkins> fantasai: Don't have a problem with eternal, just want confirmation this won't give us worse results for other purposes
<astearns> ack drott
<TabAtkins> drott: In trying to figure out this change, it seems like one of the concerns is people coming across the spec and taking issue with the char.
<TabAtkins> drott: So maybe being less specific and just saying "a representative character"
<TabAtkins> fantasai: We want interop, and don't want people to pick a bad char without giving thought
<TabAtkins> fantasai: Same as for ch unit, we didn't say a representative char, we said 0
<TabAtkins> myles: Unsure how interop measuring ink is in reality because browsers can use diff points
<TabAtkins> myles: Also in OT, some points can have semantic meaning beyond just glyph bounds, which browsers would want to use
<TabAtkins> fantasai: Yeah point is just you don't want the CJK “one” character, for example, bc it's a horizontal line. And want every browser to use the same char.
<TabAtkins> fantasai: It's definitely not very much difference. But I want somebody who knows how they're typically drawn to confirm it won't be a regression.
<heycam> q+
<TabAtkins> myles: So 2 options are character that's representative of ic and the other metrics we're interested in, or use separate characters for each.
<TabAtkins> astearns: We're not speccing the char for the box dimensions if the info's not available in the font.
<TabAtkins> fantasai: We're using representative CJK char for ic unit, for text-combine-upright, and for ideographic-ink baselines.
<TabAtkins> fantasai: Better to use one char for all of these, rather than diff for each.
<TabAtkins> astearns: But right now we only specify ic, right?
<TabAtkins> fantasai: No, WM uses a char to specify width of tatechuyoko
<TabAtkins> myles: interested in knowing what browsers follow those specs right now
<TabAtkins> fantasai: If we want to change it bc people are mad, we can do that and I won't object, but I think it would be good to actually find out if this would be practically better or worse.
<TabAtkins> myles: This topic isn't urgent, so I don't think we need to push to resolve it right now if we want extra info.
<TabAtkins> astearns: I'll action myself to talk to Adobe fonts people about metrics, particularly when they're not in the font.
<TabAtkins> astearns: One clarification, fantasai asked explicitly about willingness to implement
<drott> q+
<TabAtkins> astearns: myles you said we should try it and see?
<TabAtkins> myles: If the spec changes we're willing to try
<heycam> q-
<TabAtkins> astearns: So no change to the spec today, I'll see what info I can get, perhaps someone with Google Fonts connections?
<TabAtkins> drott: Yeah, we have some perf concerns of bucketing of CJK fonts.
<TabAtkins> drott: We haven't done a lot of investigation about this yet, so we'll have to look at this more closely.
<TabAtkins> drott: But we do currently expect water to be in the first bucket.
<TabAtkins> drott: And can see if Google Fonts is willing to make a change to keep eternal in the first bucket
<astearns> ack drott
LIXiangChen commented 1 year ago

I'm not sure if it's too late, but I believe it's more reasonable to use U+3000 than "水" or "永".

I've read the comments above about why U+3000 is not used, but it's not convincing enough. Here are my thoughts after actually using the ic in live commercial projects.

1. Representative of the width

The comment above said that in proportional-width fonts, U+3000 does not match the width of Han characters. That is true, but please don't overlook that in proportional-width fonts, "水" or "永" also cannot match (or represent) the width of all Han characters.

The special status of "永" is that it contains the eight most basic stroke types of Han characters. That matters for calligraphy and type design, but its metric width is not representative. Both "水" and "永" have relatively few strokes, so in proportional-width fonts they are almost certainly narrower than characters with more complex strokes.

Now consider U+3000. In fonts with fixed-width ideographs, it has the same width as Han characters; in proportional-width fonts, it has the most suitable width as specified by the font's creator, which represents the creator's intent (I know that in some older fonts the width of U+3000 may not have been set carefully, but this is improving). If a font had an attribute called "typical ideographic metric width", its value would certainly equal the width of U+3000. That is exactly the purpose of the ic unit.

Also, the number of proportional-width CJK fonts is increasing, which is important.

2. Applicability

"水" and "永" only exist in fonts containing Han characters, that is, Simplified Chinese, Traditional Chinese, and Japanese fonts. U+3000 is present in a wider range of fonts: for example, modern Korean fonts commonly do not contain Han characters, but do contain U+3000. Using U+3000 would therefore extend the ic unit to more scripts.

3. Font subsetting

The ic unit is used with fonts containing Han characters. Such fonts are large, so when they are not installed locally, they are often pre-split into slices, or only the data for the required characters is downloaded and the font is built dynamically.

In such cases, for ic to work correctly, the reference character it needs has to be loaded explicitly. If that character is U+3000, which is a blank glyph, this is very easy to implement and unlikely to cause unexpected consequences. But if it is "水" or "永": first, downloading their data costs extra traffic; second, if they are added to every slice, combining slices may cause odd conflicts; finally, some special fonts that use special character sets may not contain "水" or "永" at all, and adding a blank "水" or "永" to such a font could cause serious problems (whereas a blank U+3000 can be added safely).
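The extra-traffic argument above, and the earlier concern in the WG discussion about which subset "bucket" the reference character lands in, can be sketched in Python. The slicing below is made up (it is not Google Fonts' actual subsetting); each slice is modelled as the set of code points it covers, and the hypothetical `slices_to_fetch` returns the slices needed to render some text plus the slice holding the ic reference character, taking the first slice that covers each code point.

```python
from typing import List, Sequence, Set

# Hypothetical slicing of a large CJK font: slice 0 holds very frequent
# characters, slice 1 holds a less frequent one.
SLICES: List[Set[int]] = [
    {0x3000, 0x4E00, 0x6C34, 0x7684},  # U+3000, 一, 水, 的
    {0x6C38},                          # 永
]


def slices_to_fetch(slices: Sequence[Set[int]], text: str, reference: int) -> List[int]:
    """Indices of the slices needed for `text` plus the ic reference character.

    For each needed code point, the first slice that covers it is chosen;
    code points that no slice covers are ignored in this sketch.
    """
    needed = {ord(ch) for ch in text}
    needed.add(reference)
    fetch: Set[int] = set()
    for cp in needed:
        for index, covered in enumerate(slices):
            if cp in covered:
                fetch.add(index)
                break
    return sorted(fetch)


if __name__ == "__main__":
    for ref in (0x6C34, 0x6C38, 0x3000):
        print(f"U+{ref:04X}", slices_to_fetch(SLICES, "的一", ref))
```

With this made-up slicing, text that only needs slice 0 triggers an additional fetch of slice 1 solely to resolve ic when the reference is 永, but not when it is 水 or U+3000. Moving 永 into the first slice, as floated in the WG discussion, would remove that difference.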