w3c / clipboard-apis

Clipboard API and events
https://w3c.github.io/clipboard-apis/

Replacing no-break spaces when converting HTML to plain text upon clipboard export #173

Open hsivonen opened 2 years ago

hsivonen commented 2 years ago

Gecko bug for context.

It's unclear to me if the operation of generating a plain-text representation of HTML copied to the clipboard is within the scope of this (or any) spec, but in case it is:

It appears that:

The shortest path to have all three do the same thing would be for Gecko to change only not to replace no-break spaces when exporting plain text as plain text.

However, it's bad to replace no-break spaces with regular spaces in HTML to plain text conversion in cases where the no-break spaces are used for a legitimate purpose (e.g. in combination with French quotation marks). At least in Gecko, the replacement of no-break spaces with regular spaces is motivated by undoing the contentEditable behavior of making every other space bar press insert a no-break space, which counteracts CSS's space collapsing behavior and produces a visible space on every space bar press in contentEditable.

Questions:

mbrodesser commented 2 years ago

At least sanitizing text in clipboard.readText() was identified as an issue, see "Issue 5" in step 1.3 of https://w3c.github.io/clipboard-apis/#dom-clipboard-readtext, so that's in scope of this spec.

css-meeting-bot commented 2 years ago

The Web Editing Working Group just discussed Henri's issue.

The full IRC log of that discussion
<Travis> Topic: Henri's issue
<Travis> github: https://github.com/w3c/clipboard-apis/issues/173
<Travis> henri: w/contenteditable, user expects that space bar produces a visible space.
<Travis> .. originally, there was no CSS for whitespace: pre ?
<Travis> .. (because of whitespace collapsing)
<Travis> .. clipboard adds alternating spacing + non-breaking spaces
<Travis> .. when browser maps all nbsp to regular space.
<Travis> .. (number)nbsp(unit) these can sometimes be replaced.
<Travis> .. conclusion: impossible to copy from the web that retains nbsp's.
<Travis> .. contemplating change in Gecko to...
<Travis> .. when nbsp isn't adjacent to a regular space (both ends touch something other than a space--since those aren't created by editor as a hack), then LEAVE THEM BE.
<Travis> .. This is an area that is not really part of web interop concerns...
<Travis> .. compat/interop concern is from copy-then-paste all within the web platform.
<Travis> .. Q to other vendors: any concerns with this plan?
<Travis> .. can you see interop problems with this?
<Travis> .. (except a divergence between the three engines doing the same thing in this case)
<Travis> BoCupp: Not sure I follow what all the browsers are doing...
<Travis> .. Do all browsers just put the copy of the spaces when copying...
<Travis> henri: when copying from plaintext (no HTML involved); blink preserves nbsp,
<Travis> .. Gecko does not.
<Travis> .. when pasting into plaintext, all engines currently replace the nbsp. Gecko wants to diverge from this.
<Travis> BoCupp: which scenario are we optimizing for?
<Travis> henri: when HTML contains a nbsp for legitimate typographical reasons (keep units together with number, french quotes, etc.)
<Travis> .. anything except faking the collapsing of space by the editor.
<Travis> .. hypothesis: all other cases are legitimate uses of nbsp and should be preserved when exporting to clipboard.
<Travis> .. so we don't want to mess with those.
<Travis> johanneswilm: do you think you can detect all the cases when the editor does the fixup?
<Travis> henri: if a sequence of nbsp has an ascii space either before or after it, then we would consider that editor-generated.
<Travis> .. everything else would be considered legitimate.
<Travis> .. I haven't done the research to see if editors expect that behavior... my experience is that existing web logic expects the current editing behavior.
<Travis> whsieh: q: idea is to preserve nbsp in dataTransfer.data or paste to plaintext and readback?
<Travis> henri: idea is to preserve nbsp when exporting to native clipboard flavor; the rest of the behavior would flow from that.
<Travis> .. if an app paste to plaintext, then it would be affected.
<Travis> .. a little handwavy to understand the other subtle places where this might impact.
<Travis> whsieh: I wonder if there would be compat with apps (external to browser).
<Travis> henri: the case with a textarea in webkit, with no spaces, then the copy inserts the nbsp places... if that breaks apps then it would be an existing concern.
<Travis> whsieh: it would be broadening the concern if it was there.
<Travis> Travis: sounds like the consensus of the group is to "give it a try" and report back?
<Travis> .. didn't hear any objections (just questions)?
<Travis> johanneswilm: is this something we want to put in the spec? Or are we just OK with the interop divergence.
<Travis> henri: I'm not asking for inclusion in a spec (this is borderline not part of standards).
<Travis> BoCupp: I like the suggestion (it makes sense to me). When you do it and have success, I think it would be great if we could write it down.
<Travis> .. contenteditable spec has a section we could put this into...
<whsieh> q+
<Travis> johanneswilm: execcommand?
<Travis> BoCupp: ..looking for a link.
<Travis> johanneswilm: For now, henri should try it out and report back on the issue.
<Travis> .. like seeing the algorithm for determining the space handling that henri is going to try.
<Travis> .. other browsers may want to then try it out.
<Travis> BoCupp: can you comment on which issue (mentioned in the github issue) ...
<Travis> henri: I think the scenario is copying from the web, then pasting into plaintext textarea--demonstrating how it's relevant to web interop.
<whsieh> q-
<Travis> BoCupp: These would be changes to the serialization to the clipboard?
<Travis> henri: not sure. I was thinking of the action when a range in the DOM is exported to clipboard (HTML) on copy.
<Travis> .. to the extent there are ways to trigger the export (other than users pressing Ctrl+C), would assume they would go through the same code path. If not, then that's an additional complication.
<Travis> BoCupp: Suspect that they don't go through the same codepath.
<Travis> .. some cases you walk the DOM, in other cases, you're just given some text to insert.
<Travis> henri: Okay.
<Travis> BoCupp: Like the idea of you experimenting with it!
<Travis> johanneswilm: and if it DOESN'T work, we'd appreciate knowing!
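
For readers skimming the log: the heuristic henri describes (a run of nbsp that touches an ASCII space on either side is treated as editor-generated, everything else is left alone) could be sketched roughly as below. This is only an illustration of the rule as scribed, not Gecko's code; the function name normalizeEditorNbsp is made up for this sketch, and it works on a plain string rather than on a DOM range.

```typescript
// Hypothetical helper, NOT taken from Gecko or any spec.
// Rule as described in the log: a run of U+00A0 that has an ASCII space
// immediately before or after it is assumed to be editor-generated and is
// replaced with regular spaces; every other run is preserved.
function normalizeEditorNbsp(text: string): string {
  return text.replace(/\u00A0+/g, (run: string, offset: number) => {
    const before = offset > 0 ? text[offset - 1] : "";
    const end = offset + run.length;
    const after = end < text.length ? text[end] : "";
    const editorGenerated = before === " " || after === " ";
    return editorGenerated ? " ".repeat(run.length) : run;
  });
}

// Alternating pattern as produced by repeated space bar presses.
const typed = normalizeEditorNbsp("a \u00A0 \u00A0b");
// Typographic use: number and unit kept together.
const unit = normalizeEditorNbsp("10\u00A0km");
```

Under this rule the alternating run in the first input becomes four regular spaces, while the nbsp between the number and the unit is kept. A lone nbsp with no ASCII space next to it is also kept, even if an editor inserted it, which is one of the cases johanneswilm's question is getting at.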

mbrodesser commented 2 years ago

For the record: Chrome (at least on Ubuntu 20.04) forbids copying when a contenteditable element is selected: data:text/html,A<div contenteditable>X</div>.

CC @masayuki-nakano

hsivonen commented 1 year ago

The shortest path to have all three do the same thing would be for Gecko to change only not to replace no-break spaces when exporting plain text as plain text.

This was OK.

Would it be bad in practice (as opposed to principle of deviating from the current interop state) to replace no-break space with regular spaces in HTML to plain text conversion only when the contentEditable-like pattern of alternating spaces and no-break spaces is detected?

This turned out not to be Web-compatible due to there being sites that, instead of using the pre element or a relevant CSS property, replaced spaces in code examples with no-break spaces and relied on the browser reversing the replacement upon copy to clipboard.
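
To illustrate the failure mode, here is a small sketch. The sample line and the function name replaceAllNbsp are made up for illustration, and the exact check Gecko experimented with is not spelled out in this thread, so this only shows why text of this shape cannot match a pattern that involves ASCII spaces.

```typescript
// Hypothetical code line as such a site might serve it: every space,
// including the indentation, is U+00A0 and there is no ASCII space at all.
const served = "\u00A0\u00A0return\u00A0a\u00A0+\u00A0b;";

// Blanket replacement on export, i.e. the behavior these sites relied on.
function replaceAllNbsp(text: string): string {
  return text.replace(/\u00A0/g, " ");
}

// A contentEditable-like pattern mixes nbsp with ASCII spaces. This sample
// contains none, so a check keyed on that mix would not fire here.
const hasAsciiSpace = served.includes(" ");

// What users of such sites expect on the clipboard.
const copied = replaceAllNbsp(served);
```

With the blanket replacement, the copied text is ordinary source code. With a pattern-limited replacement, the no-break spaces would survive into the pasted code, which compilers and interpreters typically do not accept as whitespace.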