[css-text-4] text-autospace: what gets copied?

fantasai commented 1 year ago

@r12a wrote in https://github.com/w3c/csswg-drafts/issues/4246#issuecomment-1415841181 :

Suppose a piece of Japanese text and a piece of French text contain some sentences where ASCII spaces are used to create spacing and other sentences where no such space characters are used. Then a content author applies autospace, let's say for argument's sake with replace switched on. Then a reader copies the text containing gaps. What ends up on the clipboard?

I'm assuming that the clipboard would contain space characters where they were originally present in the text, but not retain any gaps where the separation was achieved only through applying CSS autospace.

(In other words, if you copy several paragraphs in French where the original author used autospace to introduce a gap before/after punctuation, then after pasting the text would need to be autospaced or manually edited to reintroduce the gaps.)

We have two types of autospacing:

- inserts gaps that don't correspond to any particular character (inter-script values)
- inserts special spaces (punctuation value)

Also, in both cases, the replace keyword can remove U+0020 from the text.

In general, CSS doesn't alter the text that gets copied (e.g. text-transform is not applied), but we do make at least one exception:

- white space collapsing Phase 1 is applied in order to remove source code formatting characters (https://www.w3.org/TR/css-text-3/#plaintext)

There's also a somewhat open question of whether the content property should affect copy/paste. I thought we'd discussed this, but I can't find it...

It's pretty clear to me that the first type of autospacing doesn't insert anything into the copied text. However, what about the removal of U+0020, or the insertion of nbsp, narrow no-break space, etc.: are these copied into the paste buffer?

fantasai commented 1 year ago

My personal take is that from both the author's and user's perspective, ideally yes, these should be copied: we're fixing up the source, which doesn't have the correct spacing characters or has excess spacing characters, to match the intended text stream, so it would be useful if the copied text represented that intended text stream rather than its sloppy source.

(This is different from text-transform, which is a stylistic decision that can vary depending on the desired presentation of a single source text. The source text in that case is the truest representation of the content, and we're effectively borrowing glyphs from a different case to effect a particular rendering style.)

fantasai commented 1 year ago

The i18n discussion with xfq, atsushi, and Florian also seems to be landing on copying the transformed output...

CC @MurakamiShinyu @macnmm for their take

xfq commented 1 year ago

We discussed this in yesterday's CLReq Editors' Call and the consensus was not to add the space to the clipboard for Chinese. (If there are spaces in the original text, then the spaces should be preserved.)

Some people think that spaces should also not be added for French, but we don't have a strong opinion on this.

r12a commented 9 months ago

> we're fixing up the source, which doesn't have the correct spacing characters or has excess spacing characters, to match the intended text stream, so it would be useful if the copied text represented that intended text stream rather than its sloppy source.

I'm not sure that is true. We have seen numerous examples where space was not provided, and sometimes that seemed to actually be the intended style. The CSS allows an author to apply a different style, or to regularize the style for the context in which the text will be displayed, but I think that text copied to the clipboard should be the same as the actual content.

I suspect that copy-paste typically involves only a small number of places where the original author might have omitted or added space, so these should be easy to fix up. I don't think we should expect this to become a useful batch-processing tool for getting regularly spaced text by creating a web page, adding some CSS, and then copying the text back out to wherever we want it.

So I think we should do just as we do for text-transform, and treat the autospace transformation as a presentational tool rather than something that changes the underlying text that is added to the clipboard.

kidayasuo commented 8 months ago

We talked about this during the JLReq TF meeting on 2024-02-27. We didn't reach a clear agreement on what should be copied; there were opinions on both sides.

One concern brought up was that adding or removing spaces could change the meaning of the text. For instance, if a word contains both Kana/Kanji and Latin letters/numbers, like 'Tシャツ', inserting a space between 'T' and 'シャツ' would make a morphological analyzer see them as two separate words, changing the text's meaning. This could also affect how text-to-speech reads the text. Conversely, there might be cases where removing an intentional space could alter the text's semantics.

r12a commented 8 months ago

@kidayasuo that sounds to me like another reason for copying the text, rather than copying the presentation. Do you agree?

kidayasuo commented 8 months ago

@r12a Yes, I personally agree. Unlike the white space collapsing case, it would generate spaces that did not exist, or remove spaces that might have been intentional. It does not seem like a safe alteration.

With that said, however, I have to repeat that there were opinions on both sides at the JLReq TF meeting. From the user's point of view, copying the presentation is WYSIWYG, i.e. it might be closer to what they expect.

xfq commented 7 months ago

As a data point, Microsoft Word also has this function. If there are manually added spaces, they will be copied. If there are no manually added spaces, the extra spacing will not become U+0020 on the clipboard. Users can add the spaces themselves if they want to.

macnmm commented 7 months ago

In my opinion, the spacing added between characters by Japanese line layout rules is not equivalent to the space character, which is a word separator for Latin but is not normally used to separate all Latin from Japanese in the same line. The reason is that during line compression or expansion, the space character's width is optimized for Latin (it is the wrong width by default), and it expands or contracts differently from the Latin-J spacing defined by publisher house rules or other J layout conventions. This is why Latin-J spacing rules exist, and why they need to be treated separately from the Latin U+0020 space.

Yes, people have added spaces for Latin-J spacing in the past, in engines that do not support correct spacing rules for Japanese layout, but trying to convert between one and the other would degrade quality and confuse users, I feel.

frivoal commented 7 months ago

I guess there's tension between the CJK use case, for which we should probably not copy the spaces for the reasons discussed above, and the French use case, where we are effectively fixing incorrectly typed text by inserting (narrow) no-break spaces. Those are the characters people should have typed but don't, because common input systems for French don't make them easy to type.

I think the CJK case should win here. The behavior for French is a convenience, but could be done correctly by different means, while dynamic space insertion for CJK at layout time is the right way to handle that.

fantasai commented 6 months ago

@frivoal The spacing in the two cases is different, though: in the CJK use cases we are potentially removing spaces; if we add space, it's not a character (it's like letter-spacing), and therefore won't get copied. If we add spaces for French, then we are inserting characters.

So the question here is twofold:

- Do we remove spaces removed by text-autospace for CJK?
- Do we add/convert spaces added/converted by text-autospace for French?

css-meeting-bot commented 6 months ago

The CSS Working Group just discussed [css-text-4] text-autospace: what gets copied?.

The full IRC log of that discussion

<fantasai> florian_irc: Current recommendation from i18n is to just copy the source
<fantasai> florian_irc: Elika does point out there might be more nuance, because in CJK it adds spacing but doesn't add space characters (but can remove them)
<fantasai> florian_irc: whereas in other cases, it can insert actual characters
<fantasai> florian_irc: unsure if this was considered well enough
<fantasai> florian_irc: maybe we should chat with i18n for that last point first, and come back

macnmm commented 5 months ago

I think of adding space characters for J-Latin spacing as similar to adding two spaces after a period. This kind of text is good for plain text and primitive layout, but not usable for DTP without pre-processing to remove the extra spaces.

kidayasuo commented 4 months ago

We discussed at today’s JLReq TF meeting again and reached a consensus.

As a unified position, the JLReq TF recommends that the source text should be copied for the Japanese text. We do not have any opinions or comments regarding the French issue.

r12a commented 4 months ago

For me, that's also the correct approach for the French case.

css-meeting-bot commented 1 day ago

The CSS Working Group just discussed [css-text-4] text-autospace: what gets copied?.

RESOLVED: plain text copy must ignore text replaced autospace. open to other feedback

The full IRC log of that discussion

<noamr> florian: text-autospace, in cjk at the limit between cjk and other, e.g. numeric, it adds spacing because it's typographically expected
<noamr> florian: the property has a syntax variant where it's not inserting but rather replacing the spacing, because sometimes there is an explicit space
<noamr> florian: similar, outside of cjk, this is used to insert spaces in french before some punctuations. In some cases there is a narrow space, and CSS does it for people because people don't do it
<noamr> florian: what do we do with copy-paste in this mode? At least in the CJK context we ignore and copy the source
<noamr> florian: there are individual opinions that we should do the same for french
<noamr> florian: we were wondering whether this should be different for french, it's more an abuse/correction than styling
<noamr> florian: the i18n recommendation was to just copy the source
<noamr> florian: I was wondering about french, perhaps we should copy the original one
<noamr> q+
<noamr> florian: I would expect it to be copied correctly as a user, definitely for cjk we shouldn't copy
<noamr> s/florian/fantasai
<noamr> fantasai: there wasn't a conclusion in the group about what to do when we use replacement
<astearns> ack noamr
<fantasai> s/conclusion/obvious agreement/
<noamr> fantasai: for CJK we shouldn't copy, but for french, narrow spaces etc, a lot of other contexts wouldn't fix it, e.g. word wouldn't correct pasted stuff
<fantasai> s/CJK/CJK autospace insertion/
<noamr> florian: for having the spaces manually inputted into the text, it might be intentional, but if you're copy/pasting into an email, it's nicer if there was a space there
<noamr> florian: I find the i18n resolution plausible, still on the fence for french. then again it's a bit of a hack
<noamr> astearns: do we know what browsers do?
<noamr> fantasai: I suspect they use the underlying text
<noamr> fantasai: let's close and using underlying text, but come back to it if users complain
<noamr> fantasai: users can copy stuff that has a space, sometimes they'll get one and sometimes not
<noamr> fantasai: it's probably better to do the fixup in terms of users, but perhaps we can wait
<noamr> astearns: inclined to resolve on specifying we should copy the underlying text, perhaps with a note that this can change with user feedback
<noamr> fantasai: we can be open to changing it if there's user demand for it
<noamr> Jonathan: it's similar to copying text decorations. there are always going to be people unhappy with the decision
<noamr> ... I think this is up to presentation tools
<noamr> ... we shouldn't mangle the actual data
<TabAtkins> noamr: isn't this a bit up to the browser? if the browser wants multiple copy functions in the context menu
<TabAtkins> noamr: is it the standard's job?
<TabAtkins> fantasai: there's often different format, sure - plain text, or html with formatting, etc
<jfkthame> s/copying text decorations/copying text with text-transform/
<TabAtkins> fantasai: we're just talking about plain text copy
<TabAtkins> fantasai: we do say that we collapse white space for a plain text copy, for example. if you didn't do that, the text would be a big mess if there's indentation
<TabAtkins> noamr: we copy capitalization, don't we?
<TabAtkins> florian: this is contentious, but no
<TabAtkins> fantasai: text-transform is a bit different because it's not correcting the text, it's providing a style. we're clear in the spec that it's a contextual style
<TabAtkins> fantasai: that's why you don't copy it
<TabAtkins> fantasai: the cases we're considering doing the transform are where the literal text isn't actually representative of the char string the author wants
<TabAtkins> fantasai: not just for presentation purposes
<TabAtkins> fantasai: if you're typing regular space instead of nbthinspace, you're doing it because nb thin space is hard to type
<TabAtkins> noamr: okay so it's more of an error correction
<noamr> florian: the injection to CJK is styling. the removal and replacement in french is error correction
<TabAtkins> florian: yeah, at least part of it. the injection of spacing in cjk contexts is styling. the removal/change of what's there (in french) is correction
<noamr> dbaron: a copy operation in many OSes already does multiple things at once
<noamr> florian: we're only defining plain text here
<noamr> PROPOSED RESOLUTION: plain text copy must ignore text replaced autospace. open to other feedback

w3c/csswg-drafts issue #8511