Make async clipboard APIs (read/write) to sanitize interoperably with setData/getData for text/html

gked commented 3 years ago

Hi all, the Excel Online team reached out to us about several issues they found with clipboard formats. Sanitization is not consistent across the legacy and async clipboard APIs for some MIME types. For example, in Chrome, text/html set and read with setData/getData keeps its meta tags, while the async clipboard APIs strip them. This creates an issue for online text editing applications: target apps processing a paste rely on meta tags to infer information about the source of the clipboard payload. @snianu has written a detailed analysis, which can be found here. A matrix view of formats can be found here. We propose making the async clipboard read/write serialization behavior consistent with the legacy clipboard API (setData/getData). The reasoning is that browsers already expose this information through setData/getData, and making async clipboard read/write behave compatibly will ease its adoption over time.
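To make the reported difference concrete, here is a minimal sketch of the two read paths and a check for lost meta tags. The interfaces `DataTransferLike`, `ClipboardItemLike`, and `AsyncClipboardLike` and the helper `missingMetaTags` are hypothetical names introduced for illustration; the interfaces are structural stand-ins that, in a browser, `event.clipboardData` and `navigator.clipboard` are assumed to satisfy.

```typescript
// Structural stand-ins for the DOM types, so the sketch compiles without DOM typings.
// Assumption: in a browser, `event.clipboardData` and `navigator.clipboard` fit these shapes.
interface DataTransferLike {
  getData(format: string): string;
}
interface ClipboardItemLike {
  readonly types: readonly string[];
  getType(type: string): Promise<{ text(): Promise<string> }>;
}
interface AsyncClipboardLike {
  read(): Promise<ClipboardItemLike[]>;
}

// Legacy path: DataTransfer.getData, typically called inside a paste event handler.
function readHtmlLegacy(data: DataTransferLike): string {
  return data.getData("text/html");
}

// Async path: Clipboard.read() followed by ClipboardItem.getType().
// Returns null when no item on the clipboard carries text/html.
async function readHtmlAsync(clipboard: AsyncClipboardLike): Promise<string | null> {
  const items = await clipboard.read();
  for (const item of items) {
    if (item.types.indexOf("text/html") !== -1) {
      const blob = await item.getType("text/html");
      return blob.text();
    }
  }
  return null;
}

// Hypothetical helper: list the <meta> tags present in `written`
// that no longer appear verbatim in `readBack`.
function missingMetaTags(written: string, readBack: string): string[] {
  const metaTag = /<meta\b[^>]*>/gi;
  const before: string[] = written.match(metaTag) || [];
  return before.filter((tag) => readBack.indexOf(tag) === -1);
}
```

In a paste handler one might compare `missingMetaTags(original, readHtmlLegacy(event.clipboardData))` with the same check on the result of `readHtmlAsync(navigator.clipboard)`. The comparison is an exact string match, so a browser that reserializes attributes would be reported as dropping the tag even if its information survives.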

If we can all agree on this, we will follow up with a PR clarifying the behavior in the Clipboard Spec, which will also take care of #140.

CC: @rniwa @whsieh @dway123 @garykac @BoCupp-Microsoft @megangardner @johanneswilm

annevk commented 2 years ago

My proposal doesn't attempt to resolve this one way or another. It merely adds support for a parallel set of formats that are never sanitized.

I think for built-in formats we want:

The same behavior for the new and legacy API.
Optional, implementation-defined modification of the contents to uphold user agent privacy requirements.

snianu commented 2 years ago

> My proposal doesn't attempt to resolve this one way or another. It merely adds support for a parallel set of formats that are never sanitized.

I think it does affect the well-known formats, at least on Safari. Consider this case: when text/html is copied from website X and pasted within the same origin as X, Safari returns the content from the pickled HTML format (no sanitization involved). With your proposal, Safari will now return a sanitized fragment when the author queries text/html from X, because the pickled HTML format lives in "web text/html". Basically, your proposal doesn't work on Safari, though I'm not sure that is an issue: Apple doesn't want to support custom formats across origins anyway, so they can always return the pickled format from the "web " bucket when text/html is queried from the same origin.

Now, on Chrome and Firefox, regardless of which origin the author queries text/html from, DataTransfer APIs always return the unsanitized HTML format. This differs from the pickled format because the standard HTML format contains platform-specific headers, as described here. These headers are stripped out, but the HTML markup is a full HTML document, not a sanitized fragment. With your proposal, we would have to return an unsanitized HTML document, with the platform-specific headers stripped out, when authors query the text/html format. Is that correct?

annevk commented 2 years ago

If someone copies text/html onto the clipboard, why would a web text/html entry be created? That doesn't make sense. (I also don't think Apple said they don't want to support web ... formats across origins.)

whsieh commented 2 years ago

> I think it does affect the well-known formats at least on Safari. Consider the below case: When text/html is copied from website X and pasted within the same origin that X belongs to, then Safari returns the content from the pickled html format (no sanitization involved). With your proposal, now Safari will return sanitized fragment when author queries text/html from X as the pickled HTML format lives in "web text/html". Basically, your proposal doesn't work on Safari which I'm not sure is an issue because Apple doesn't want to support custom format in cross origin anyway

The way I understood Anne's proposal, if site X wrote "text/html" and then site X reads "text/html", in WebKit, there would be a single item "text/html" that would contain the original unsanitized data upon reading.

If site Y were to read the same data copied from site X, it would contain the sanitized "text/html" data.
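The same-origin versus cross-origin reading described above can be modeled roughly as follows. This is only an illustrative sketch, not WebKit's implementation: `HtmlClipboardEntry`, `sanitizeHtml`, and `readTextHtml` are hypothetical names, and the sanitizer is a deliberately crude placeholder that drops only `<script>` and `<meta>` elements, whereas a real engine does far more.

```typescript
// Hypothetical model of a single "text/html" clipboard item; not a browser API.
interface HtmlClipboardEntry {
  writerOrigin: string; // origin of the site that wrote the data
  originalHtml: string; // markup exactly as the writer provided it
}

// Placeholder sanitizer (assumption): removes <script> elements and <meta> tags only.
// It exists to make the model observable and is not suitable for real sanitization.
function sanitizeHtml(html: string): string {
  return html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "")
    .replace(/<meta\b[^>]*>/gi, "");
}

// The writer's own origin gets the original data back;
// any other origin gets the sanitized version.
function readTextHtml(entry: HtmlClipboardEntry, readerOrigin: string): string {
  return readerOrigin === entry.writerOrigin
    ? entry.originalHtml
    : sanitizeHtml(entry.originalHtml);
}
```

With an entry written by `https://x.example`, a read from `https://x.example` returns the markup untouched, while a read from `https://y.example` goes through the placeholder sanitizer. Both origins here are made-up examples.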

BoCupp-Microsoft commented 2 years ago

@snianu and I discussed offline. I don't see any of the points discussed since this post from @annevk as being a blocker for us.

Both of the points mentioned:

The same behavior for the new and legacy API.

Optional, implementation-defined modification of the contents to uphold user agent privacy requirements.

...allow for Apple's preferred same-origin behavior, as well as our preferred behavior for Chromium, which is to allow unsanitized access to well-known formats like text/html, which would be consistent with our current behavior for the legacy Clipboard Event API.

snianu commented 2 years ago

Closing this issue as it's resolved in #174

w3c / clipboard-apis

#150