fantasai opened this issue 1 year ago.
My personal take is that from both the author's and user's perspective, ideally yes, these should be copied: we're fixing up the source, which doesn't have the correct spacing characters or has excess spacing characters, to match the intended text stream, so it would be useful if the copied text represented that intended text stream rather than its sloppy source.
(This is different from `text-transform`, which is a stylistic decision that can vary depending on the desired presentation of a single source text. The source text in that case is the truest representation of the content, and we're effectively borrowing glyphs from a different case to effect a particular rendering style.)
The i18n discussion with xfq, atsushi, and Florian also seems to be landing on copying the transformed output...
CC @MurakamiShinyu @macnmm for their take
We discussed this in yesterday's CLReq Editors' Call and the consensus was not to add the space to the clipboard for Chinese. (If there are spaces in the original text, then the spaces should be preserved.)
Some people think that spaces should also not be added for French, but we don't have a strong opinion on this.
> we're fixing up the source, which doesn't have the correct spacing characters or has excess spacing characters, to match the intended text stream, so it would be useful if the copied text represented that intended text stream rather than its sloppy source.
I'm not sure that that is true. We have seen numerous examples where space was not provided, and sometimes that seemed to actually be the intended style. The CSS allows an author to apply a different style, or to regularize the style for the context in which the text will be displayed, but I think that text copied to the clipboard should be the same as the actual content.
I suspect that copy-paste is typically only likely to involve a small number of places where the original author might have omitted/added space, and so should be easy to fix up. I don't think we should expect that this could become a useful batch processing tool for getting regularly spaced text by creating a web page, adding some CSS, then copying the text back out to the place we want to have it.
So I think we should do just as we do for `text-transform`, and view the autospace transformation as a presentational tool rather than something that changes the underlying text that is added to the clipboard.
We talked about this during the JLReq TF meeting on 2024-2-27. We didn't reach a clear agreement on what should be copied; there were opinions from both sides.
One concern brought up was that adding or removing spaces could change the meaning of the text. For instance, if a word contains both Kana/Kanji and Latin/Numbers, like 'Tシャツ,' inserting a space between 'T' and 'シャツ' would make the morphological analyzer see them as two separate words, changing the text's meaning. This could also affect how Text-to-Speech reads the text. Conversely, there might be cases where removing an intentional space could alter the text's semantics.
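To make the 'Tシャツ' concern concrete, here is a minimal Python sketch. As a simplifying assumption it uses plain whitespace splitting as a crude stand-in for a morphological analyzer or text-to-speech front end (real Japanese analyzers are dictionary-based); `whitespace_tokens` is a hypothetical name, not a real API.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Hypothetical stand-in for a tokenizer: split on whitespace only."""
    return text.split()


source = "Tシャツを買った"       # as authored: no space between "T" and "シャツ"
autospaced = "T シャツを買った"  # if autospace had put a real U+0020 into the text

if __name__ == "__main__":
    print(whitespace_tokens(source))      # ['Tシャツを買った']
    print(whitespace_tokens(autospaced))  # ['T', 'シャツを買った']
```

Once the space is a real character, any tool that treats U+0020 as a boundary sees two units where the author wrote one word.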
@kidayasuo that sounds to me like another reason for copying the text, rather than copying the presentation. Do you agree?
@r12a Yes, I personally agree. Unlike the white space collapsing case it would generate spaces that did not exist, or remove spaces that might have been intentional. It does not seem like a safe alteration.
With that said, however, I have to repeat that there were opinions from both sides at the JLReq TF meeting. From the users' point of view, copying the rendered representation is WYSIWYG, i.e. it might be closer to what they expect.
As a data point, Microsoft Word also has this function. If there are manually added spaces, they will be copied. If there are no manually added spaces, the extra spacing will not become U+0020 on the clipboard. Users can add the spaces themselves if they want to.
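The behavior described for Word can be modeled with a minimal Python sketch, under the assumption that automatic spacing is stored as a layout-only advance attached to a piece of text rather than as characters. `Cluster` and `copy_plain_text` are hypothetical names; this is not a claim about how Word or any browser actually stores the spacing.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical model: one piece of source text plus layout-only spacing."""
    text: str                  # characters that exist in the source
    autospace_em: float = 0.0  # extra advance a layout engine would add after it (not a character)


def copy_plain_text(clusters: list[Cluster]) -> str:
    # Only real characters reach the clipboard; autospace_em is ignored.
    return "".join(c.text for c in clusters)


# No space typed: the gap between "CSS" and "の" exists only as layout spacing
# (0.125em is an arbitrary example value, not a spec default).
autospaced = [Cluster("CSS", autospace_em=0.125), Cluster("の例")]

# Space typed by the author: it is a real character, so it is copied.
manual = [Cluster("PDF"), Cluster(" "), Cluster("ファイル")]

if __name__ == "__main__":
    print(copy_plain_text(autospaced))  # CSSの例
    print(copy_plain_text(manual))      # PDF ファイル
```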
In my opinion, the spacing added between characters by Japanese line layout rules is not equivalent to the space character, which is a word separator for Latin text but is not normally used to separate Latin from Japanese text in the same line. This is because, during line compression or expansion, the width of the space character is optimized for Latin (it is the wrong width by default), and it expands or contracts differently from the Latin-J spacing defined by publishing-house rules or other Japanese layout conventions. This is why Latin-J spacing rules exist and need to be treated separately from the Latin U+0020 space.
Yes, people have added spaces for Latin-J spacing in the past, in engines that do not support correct spacing rules for Japanese layout, but I feel that trying to convert between one and the other would degrade quality and confuse users.
I guess there's tension between the CJK use case, for which we should probably not copy the spaces for the reasons discussed above, and the French use case, where we are effectively fixing incorrectly typed text by inserting (narrow) no-break spaces. These are the characters people should have typed but don't, because common input systems for French don't make them easy to type.
I think the CJK case should win here. The behavior for French is a convenience, but could be done correctly by different means, while dynamic space insertion for CJK at layout time is the right way to handle that.
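For the French use case, the inserted spaces are real code points, which is what makes the copy question harder. A minimal Python sketch of a simplified, hypothetical rule (not the spec's exact behavior): before `;`, `!`, or `?` it places U+202F NARROW NO-BREAK SPACE, replacing a plain U+0020 if the author typed one. Colon handling, which in French typography usually takes a full no-break space, is omitted for brevity; `french_autospace` is an illustrative name.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Simplified rule: match ; ! or ? that follows a non-space, non-;!? character,
# together with an optional single U+0020 in front of it.
_BEFORE_PUNCT = re.compile(r"(?<=[^\s;!?]) ?([;!?])")


def french_autospace(text: str) -> str:
    """Return the displayed text with U+202F placed before ; ! ?"""
    return _BEFORE_PUNCT.sub(NNBSP + r"\1", text)


source = "Vraiment? Oui !"  # as typed: one missing space, one ordinary U+0020

if __name__ == "__main__":
    print(source)                           # Vraiment? Oui !
    print(ascii(french_autospace(source)))  # 'Vraiment\u202f? Oui\u202f!'
```

Copying `source` gives back what the author typed; copying the transformed string would put two U+202F characters on the clipboard. Because U+202F counts as whitespace for `\s`, running the function twice does not insert a second space.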
@frivoal The spacing in the two cases is different, though: in the CJK use case we are potentially removing spaces, and if we add space, it's not a character (it's like `letter-spacing`), so it won't get copied. If we add spaces for French, then we are inserting characters.
So the question here is twofold:
1. `text-autospace` for CJK?
2. `text-autospace` for French?

The CSS Working Group just discussed [css-text-4] text-autospace: what gets copied?
I think of adding space characters for J-Latin spacing as similar to adding two spaces after a period. This kind of text is fine for plain text and primitive layout, but not usable for DTP without pre-processing to remove the extra spaces.
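The pre-processing step mentioned here could look roughly like the following Python sketch, which deletes a single U+0020 sitting between Japanese and Latin/digit characters. The Unicode ranges are a coarse assumption (Hiragana, Katakana, and the basic CJK Unified Ideographs block only), and `strip_ja_latin_spaces` is a hypothetical helper.

```python
import re

# Coarse script ranges (an assumption for illustration, not exhaustive):
# Hiragana + Katakana (U+3040-U+30FF) and CJK Unified Ideographs (U+4E00-U+9FFF).
_JA = "\u3040-\u30ff\u4e00-\u9fff"
_LATIN = "A-Za-z0-9"

# A single space with Japanese on one side and Latin/digits on the other.
_JA_LATIN_SPACE = re.compile(
    rf"(?<=[{_JA}]) (?=[{_LATIN}])|(?<=[{_LATIN}]) (?=[{_JA}])"
)


def strip_ja_latin_spaces(text: str) -> str:
    """Remove U+0020 used only as Japanese/Latin spacing."""
    return _JA_LATIN_SPACE.sub("", text)


if __name__ == "__main__":
    print(strip_ja_latin_spaces("これは CSS の例です"))  # これはCSSの例です
    print(strip_ja_latin_spaces("Word 2019 で"))        # Word 2019で
```

Like collapsing double spaces after a period, this is lossy: it cannot tell an intentional space from one typed only for spacing, which is the meaning-change concern raised earlier in the thread.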
We discussed at today’s JLReq TF meeting again and reached a consensus.
As a unified position, the JLReq TF recommends that the source text should be copied for the Japanese text. We do not have any opinions or comments regarding the French issue.
For me, that's also the correct approach for the French case.
The CSS Working Group just discussed [css-text-4] text-autospace: what gets copied?
RESOLVED: plain text copy must ignore text replaced autospace. open to other feedback
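One way to read this resolution, sketched in Python: keep character-level autospace changes (the `replace` keyword removing U+0020, or an inserted no-break space) as a separate edit list over the unmodified source, so rendering applies them and plain-text copy simply slices the source. `AutospaceEdit`, `render`, and `copy_plain_text` are hypothetical; no particular engine is claimed to work this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutospaceEdit:
    """Hypothetical record of one character-level autospace change."""
    start: int        # source offset where the change begins
    end: int          # source offset where it ends; start == end means a pure insertion
    replacement: str  # displayed instead of source[start:end]


def render(source: str, edits: list[AutospaceEdit]) -> str:
    """Apply non-overlapping edits to build the displayed text stream."""
    out: list[str] = []
    pos = 0
    for edit in sorted(edits, key=lambda x: x.start):
        out.append(source[pos:edit.start])
        out.append(edit.replacement)
        pos = edit.end
    out.append(source[pos:])
    return "".join(out)


def copy_plain_text(source: str, start: int, end: int) -> str:
    """Clipboard text for a selection given in source offsets: edits are ignored."""
    return source[start:end]


ja = "日本語 CSS"                      # author typed U+0020 at offset 3
ja_edits = [AutospaceEdit(3, 4, "")]  # `replace` removes it for display

fr = "Oui!"
fr_edits = [AutospaceEdit(3, 3, "\u202f")]  # insert NARROW NO-BREAK SPACE before "!"

if __name__ == "__main__":
    print(render(ja, ja_edits))             # 日本語CSS
    print(copy_plain_text(ja, 0, len(ja)))  # 日本語 CSS
    print(ascii(render(fr, fr_edits)))      # 'Oui\u202f!'
    print(copy_plain_text(fr, 0, len(fr)))  # Oui!
```

Because users select on the rendered text, a fuller implementation would also need to map rendered offsets back to source offsets before slicing; that mapping is left out of this sketch.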
@r12a wrote in https://github.com/w3c/csswg-drafts/issues/4246#issuecomment-1415841181:

We have two types of autospacing:
1. […]
2. […] (`punctuation` value)

Also, in both cases, the `replace` keyword can remove U+0020 from the text.

In general, CSS doesn't alter the text that gets copied (e.g. `text-transform` is not applied), but we do make at least one exception:
[…]

There's also a bit of an open question on whether the `content` property should affect copy/paste. I thought we'd discussed this, but can't find it...

It's pretty clear to me that the first type of autospacing doesn't insert anything into the copied text. However, for the removal of U+0020 or the insertion of nbsp, narrow no-break space, etc., are these copied into the paste buffer?