w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-text-4] Add 'text-spacing: trim-all'? #8482

Closed fantasai closed 1 year ago

fantasai commented 1 year ago

See this comment from @MurakamiShinyu.

The alternative would be to ask Unicode to add half-width variants of commas and middle dots for usage within numbers? :) :( T_T

MurakamiShinyu commented 1 year ago

If font-kerning: all (https://github.com/w3c/csswg-drafts/issues/6723#issuecomment-1411487571) is available, it will cover most cases of text-spacing: trim-all because palt/vpal changes the advance width of fullwidth punctuation to half width.

So I don't think text-spacing: trim-all is highly necessary, but it might be nice to have because:

MurakamiShinyu commented 1 year ago

I noticed that the 'text-spacing: trim-all' might be useful especially for Korean text. It may resolve the problem, “the spacing of adjacent fullwidth and non-fullwidth punctuations looks not very nice” (see https://github.com/w3c/csswg-drafts/issues/6091#issuecomment-1454675193), without adding non-fullwidth handling to text-spacing.

Windows and Mac's default Korean fonts, "Malgun Gothic" and "Apple SD Gothic Neo", have narrow (proportional) width glyphs for fullwidth (East_Asian_Width=Wide) punctuations, such as "《 》". So authors treat these punctuations as proportional and add space (U+0020) for necessary spacing, e.g., between "," and "《" in the following example:

… 생전에 '영국 최고의 극작가' 지위에 올랐다. 《로미오와 줄리엣》, 《햄릿》처럼 인간 내면을 통찰한 걸작을 남겼으며, …

(from a Korean Wikipedia article)

This looks good if such fonts are used (Windows and Mac's default fonts are ok), but not very good with other fonts (e.g., with Android default fonts). The 'text-spacing: trim-all' may be used to solve this problem.

'text-spacing: trim-all' will also be useful for Japanese text, when authors want to use proportional punctuation (with font-kerning: all, or font-feature-settings: 'palt') but the available fonts may not have that feature. In that case 'text-spacing: trim-all' may be used as a reasonable fallback.
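
A minimal sketch of the effect described above, written in Python purely as a model and not as how any engine implements it. It assumes an idealised font in which every East Asian Wide or Fullwidth character advances 1em and everything else advances 0.5em. The `TRIMMABLE` set and the function names are hypothetical, and the set is only an illustrative subset, not the draft's list of affected characters.

```python
import unicodedata

# Illustrative subset only; the draft's own list of affected characters is authoritative.
TRIMMABLE = set("《》「」（）、。，．")


def advance_em(ch: str, trim_all: bool) -> float:
    """Approximate advance of one character, in em, in an idealised square CJK font."""
    fullwidth = unicodedata.east_asian_width(ch) in ("W", "F")
    if not fullwidth:
        return 0.5  # crude assumption: every non-fullwidth character is half an em
    if trim_all and ch in TRIMMABLE:
        return 0.5  # the blank half of the glyph is removed unconditionally
    return 1.0


def run_width_em(text: str, trim_all: bool) -> float:
    """Sum the approximate advances of a run of text."""
    return sum(advance_em(ch, trim_all) for ch in text)
```

Under these assumptions, `run_width_em("《햄릿》", trim_all=False)` gives 4.0 and `run_width_em("《햄릿》", trim_all=True)` gives 3.0, which is the half-width bracket behaviour that the proportional Korean fonts or 'palt' would otherwise supply.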

fantasai commented 1 year ago

@xfq pointed at Ken Lunde's Unicode proposals for encoding this information using Variation Selectors: https://www.unicode.org/L2/L2017/17056-sv-western-vs-eastasian.pdf

So we could ask for this proposal to be extended to handle this problem. We might still need text-trim: all, but at least for things like the numeric separator usage, it would be best if the information could be encoded into the text stream since that's a content distinction, not a styling one.

(Apparently some Adobe fonts already implement the proposals... Which is unfortunate because the VS1/VS2 distinction isn't consistent throughout the proposal, and otherwise it would be more straightforward and consistent to extend the pattern in the earlier table to cover stops and commas.)

r12a commented 1 year ago

I believe Ken's proposal is a workaround for plain text environments; I think we should allow authors to use CSS to fix this problem on the Web – it significantly reduces authoring and maintenance hassle (made worse because VS code points are invisible) but also provides flexibility to alter things easily for ranges of text or whole documents.

fantasai commented 1 year ago

@r12a That helps for styling, but doesn't help for the contextual use of narrower punctuation e.g. in the middle of a number. It's significantly more work to style that correctly than to indicate it in the character stream.

frivoal commented 1 year ago

I agree with @fantasai. But also, it's not just a question of whether it's more work. It's also fundamentally a semantic difference, and relying on CSS to express it seems off.
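
To make the "more work" of the styling route concrete, here is a hedged Python sketch of the kind of preprocessing an author would need: it wraps a fullwidth comma or full stop in a span only when it sits between two digits. The function name and the `num-sep` class are hypothetical, and the input is assumed to be plain text that is already safe to embed in HTML.

```python
import re

# Fullwidth comma (U+FF0C) or fullwidth full stop (U+FF0E) between two digits,
# ASCII or fullwidth. The class name "num-sep" is a made-up example.
_SEPARATOR_IN_NUMBER = re.compile(r"(?<=[0-9０-９])([，．])(?=[0-9０-９])")


def mark_number_separators(text: str) -> str:
    """Wrap in-number fullwidth separators in a span so a stylesheet can target them."""
    return _SEPARATOR_IN_NUMBER.sub(r'<span class="num-sep">\1</span>', text)
```

A stylesheet could then apply unconditional trimming to that class alone. Every number in the content has to pass through a step like this, which is the maintenance cost being contrasted with encoding the distinction in the character stream.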

css-meeting-bot commented 1 year ago

The CSS Working Group just discussed text-spacing: trim-all, and agreed to the following:

RESOLVED: Add trim-all value
ACTION fantasai, florian to ask i18n about distinguishing between brackets and non-brackets

The full IRC log of that discussion
<fantasai> Topic: text-spacing: trim-all
<fantasai> github: https://github.com/w3c/csswg-drafts/issues/8482
<fantasai> -> https://github.com/w3c/csswg-drafts/issues/4246#issuecomment-1416647486
<TabAtkins> fantasai: Think it would help to look at, there's a comment here
<TabAtkins> fantasai: If you look at these pics, you'll see there, typically CJK punctuation like commas, brackets, are full-width, like normal characters
<TabAtkins> fantasai: Half of it is glyph and half is blank
<TabAtkins> fantasai: Sometimes people want to remove that spacing
<TabAtkins> fantasai: Sometimes stylistic, sometimes to convey inclusion (comma in a list of numbers)
<TabAtkins> fantasai: We have text-spacing-trim property that allows trimming half spaces in certain contextual cases where it looks bad to keep
<TabAtkins> fantasai: For example, at the beginning of a line you don't want to start with a half-empty glyph, you want the opening bracket to start flush
<TabAtkins> fantasai: Or if you have two closing parens, the inner shouldn't have the extra space after it
<TabAtkins> fantasai: So we have a bunch of values to control these things. But they remove the space conditionally, based on position of the glyph in the line or wrt other characters
<TabAtkins> fantasai: But that doesn't solve these problems, where we want to remove the extra space unconditionally
<TabAtkins> fantasai: There are two reasons. One is stylistic, we should handle that in CSS
<TabAtkins> fantasai: the other is semantic, unicode should handle that
<TabAtkins> fantasai: Proposal is a trim-all value, which removes all these extra half spaces
<TabAtkins> fantasai: You *can* use it to hack things around, like putting spans around numbers and removing spaces within each. Not ideal but possible.
<florian> q+
<astearns> ack florian
<TabAtkins> fantasai: So should we add a trim-all that unconditionally trims full-width down to the half-width of actual glyph?
<TabAtkins> florian: I thought I knew this already but I'm confused
<TabAtkins> florian: When you say sometimes stylistic, sometimes semantic
<astearns> q+ to ask how this interacts with text-spacing-trim: trim-auto
<TabAtkins> florian: For stylistic we have contextual trim - do we have a stylistic need for trim all?
<TabAtkins> florian: I think you said so but I didn't realize that was the case
<TabAtkins> fantasai: Murakami-san posted some examples of it
<TabAtkins> florian: Which is stylistic?
<TabAtkins> fantasai: The first one
<TabAtkins> florian: Yeah that's true
<TabAtkins> florian: Okay in that case I think we should add it
<TabAtkins> florian: I was unsure because, for semantic reasons this should be solved in Unicode, or else in HTML. If it's in HTML it's *reasonable* to have a CSS property explaining the behavior change, but if it's Unicode there's no need to invoke CSS at all.
<TabAtkins> florian: But if we need it for stylistic reasons *anyway*, then it does make sense to have it in CSS.
<TabAtkins> fantasai: A related question might be, do we need to distinguish between brackets and.. I'll call them pauses - commas and periods and whatnot
<astearns> https://drafts.csswg.org/css-text-4/#text-spacing-trim-property
<TabAtkins> astearns: How is this different than trim-auto value already in there?
<TabAtkins> florian: That's context-dependent, it doesn't remove them always.
<astearns> ack a
<Zakim> astearns, you wanted to ask how this interacts with text-spacing-trim: trim-auto
<TabAtkins> florian: In particular, it wouldn't do it for the comma in the middle of a number
<TabAtkins> florian: Or around angle brackets in example 3.1.2 that Murakami-san showed
<TabAtkins> florian: So if you want to force trim regardless of context we need another value
<chrishtr> q+
<TabAtkins> fantasai: So proposed resolution is to add a trim-all, and I also want to ask if we want to ask the i18n WG if we need to distinguish between "trim brackets" and "trim pauses"
<TabAtkins> chrishtr: So trim-all removes all extra whitespace regardless of character?
<TabAtkins> fantasai: No matter *where* it is, there's a list of affected characters
<TabAtkins> chrishtr: Where?
<astearns> https://drafts.csswg.org/css-text-4/#fullwidth-collapsing
<TabAtkins> florian: It's specific CJK punctuation, there's a list
<TabAtkins> chrishtr: okay, that list makes sense
<TabAtkins> chrishtr: Next question is implementability
<TabAtkins> chrishtr: Do other systems have this?
<TabAtkins> florian: Yes, it's not harder than doing it contextually which is already implemented
<TabAtkins> florian: This is only a question of "when do you do it", not "how", so it's not harder than existing implemented features.
<TabAtkins> astearns: Yeah that's the existing trim-auto value I asked about
<TabAtkins> florian: I think browser engines currently don't but other CSS engines (like Antenna House) do, and they work on a similar font stack even if their impl is different.
<TabAtkins> florian: Non-web like InDesign too, their layout engine works differently but their font system doesn't
<TabAtkins> chrishtr: jfkthame, opinions from impl pov?
<TabAtkins> jfkthame: It's not something I've looked into personally so I'm not very confident atm
<TabAtkins> jfkthame: I guess I have some questions about how easily this can be implemented across all potential fonts, whether they have the related OpenType features or not. If they don't you might need some tricky heuristics to decide what to do
<TabAtkins> florian: In theory yes, in practice no. These fonts are very square and where the whitespace occurs is very predictable.
<TabAtkins> florian: Fancy unusual fonts might do something unusual, but the vast majority of standard fonts will have precisely the left or right half blank.
<TabAtkins> jfkthame: I've seen some fonts where the fullwidth glyph appears in the middle of the square rather than at one edge, which makes trim harder
<TabAtkins> fantasai: That doesn't generally happen except for...
<TabAtkins> fantasai: there's a set of punctuation where they're either on one side or in the middle, like period or semicolon. Those are little more difficult to manage, tends to be language-dependent. Chinese tends to go in the middle, Japanese to a side.
<TabAtkins> fantasai: There's a note in the spec about that.
<TabAtkins> fantasai: This isn't ideal, it would be great to use font features to do this trimming. But this is a pretty important part of typesetting in Japanese text, and it not being done on the web looks wrong and ugly. Pretty much any non-web system doing Japanese text is going to do this properly, so we need to figure out how to do it
<astearns> +1 to having this done properly on the web would be good
<TabAtkins> florian: Back to impl, the question is interesting but orthogonal.
<TabAtkins> florian: Whether or not we add trim-all we already have the trim system in the spec, so adding this doesn't change that
<TabAtkins> florian: Might want to open an issue in general about whether there are hard cases, etc.
<TabAtkins> chrishtr: Given no browser implements trim-auto yet I would like confirmation that it's implementable. I've messaged Koji who's working on similar things in Blink.
<TabAtkins> fantasai: Back to florian's point, if the objection is to trimming at all you're objecting to the whole feature, not this value.
<TabAtkins> chrishtr: I'd like to not add new things to this before confirming that.
<TabAtkins> astearns: Yes and no. We can make progress on this feature even if it's only for non-web processors.
<TabAtkins> astearns: I'd like to have a resolution to add this value to *complete* this feature, then have a separate issue about whether it's implementable.
<TabAtkins> astearns: Is that okay?
<TabAtkins> chrishtr: Sure.
<TabAtkins> jfkthame: Fine with me.
<TabAtkins> astearns: So proposed resolution is to add trim-all value which will do punctuation trimming in all cases. Objections?
<TabAtkins> RESOLVED: Add trim-all value
<TabAtkins> astearns: Also fantasai asked for an action on the editors to ask the i18n WG if we need a distinction between brackets and other punctuation.
<TabAtkins> astearns: Not entirely sure, but suspect this might open a big can of worms. I recall Japanese publishing houses having definite and disparate rules on punctuation layout. To satisfy everyone we might need an explosion of things. But happy to ask and see what we get.
<TabAtkins> fantasai: I think we're less likely because the bracket/non-bracket distinction is pretty clear.
<TabAtkins> fantasai: Looking at murakami's examples, some didn't trim commas but did trim all brackets.
<chrishtr> q+
<TabAtkins> fantasai: Another use-case is decimal points and commas in a number, if there's a <number> HTML element or just a <span class=number>, trim all commas and periods, but if you happen to use parens for negatives or something you might not want them trimmed.
<florian> q+
<TabAtkins> fantasai: So being able to make a distinction between these sets might help with proper semantic markup
<TabAtkins> chrishtr: I heard back from Koji, it does seem implementable but maybe not for all fonts.
<TabAtkins> chrishtr: Might need to be an allowlist of fonts where it's allowed to work. Does the spec allow for that?
<TabAtkins> fantasai: I think that's a separate issue.
<TabAtkins> astearns: The section that talks about which characters are affected does mention various opentype features you may use, or must not use
<TabAtkins> astearns: If it's not already there we could probably add to that section
<TabAtkins> chrishtr: Yeah might be as simple as using SHOULD rather than MUST.
<astearns> ack chrishtr
<astearns> ack florian
<TabAtkins> florian: For impl we should have another issue and look into the details.
<TabAtkins> florian: If you can't reliably detect that the browser's gonna do it or not, this is problematic.
<TabAtkins> florian: Right now they'll put spans in and use negative margins to simulate. If browsers do it *sometimes* they won't know whether to correct or not.
<TabAtkins> florian: So if we can't make it work everywhere, at least make it predictable.
<TabAtkins> florian: Also I think it doesn't hurt to ask the i18n WG but I do suspect different people will want different things.
<TabAtkins> florian: And people who are very specific will be okay with preprocessing their text with markup to get exactly what they want
<TabAtkins> ACTION fantasai, florian to ask i18n about distinguishing between brackets and non-brackets
<TabAtkins> astearns: We can have a new issue on implementability and interaction with font features.
<TabAtkins> astearns: How it's implemented and whether it's feature detectable.
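
The contrast drawn in the discussion between contextual trimming and the resolved trim-all value can be sketched as follows. This is a deliberately simplified Python model, not the draft's algorithm: the "contextual" mode covers only the two cases mentioned in the log (an opening bracket at the start of a line, and a closing mark directly followed by another closing mark), and the `OPENING` and `CLOSING` sets, mode names, and function name are all hypothetical.

```python
# Illustrative character groups only; the draft's own lists are authoritative.
OPENING = set("（「《")
CLOSING = set("）」》、。，")


def trimmed_positions(line: str, mode: str) -> list[int]:
    """Return the indices of characters whose blank half would be removed."""
    if mode not in ("contextual", "all"):
        raise ValueError(f"unknown mode: {mode!r}")
    trimmed = []
    for i, ch in enumerate(line):
        if ch not in OPENING and ch not in CLOSING:
            continue
        if mode == "all":
            trimmed.append(i)  # unconditional: position and neighbours are ignored
            continue
        # Simplified contextual rules taken from the examples in the discussion.
        at_line_start = i == 0 and ch in OPENING
        before_closing = (
            ch in CLOSING and i + 1 < len(line) and line[i + 1] in CLOSING
        )
        if at_line_start or before_closing:
            trimmed.append(i)
    return trimmed
```

For the line `（１，０００）」`, the contextual mode in this model trims only the leading bracket and the inner closing parenthesis, while the "all" mode also trims the comma inside the number and the final bracket, which is the case florian notes the contextual values do not cover.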
frivoal commented 1 year ago

Follow up with i18n filed at https://github.com/w3c/jlreq/issues/377

frivoal commented 1 year ago

Basic tests introduced with https://github.com/web-platform-tests/wpt/pull/42274