w3c / editing

Specs and explainers maintained by the editing task force
http://w3c.github.io/editing/

Seeking feedback on Clipboard Pickling APIs. #334

Open snianu opened 3 years ago

snianu commented 3 years ago

Here is the explainer: https://github.com/w3c/editing/blob/gh-pages/docs/clipboard-pickling/explainer.md#pickling-for-async-clipboard-api

Would love to hear feedback on the design of the API and the OS format naming proposal. Tagging a few folks to get some attention, as we have just completed the implementation in Chromium and are looking to request an origin trial to experiment with our partners. @rniwa @whsieh @annevk @BoCupp-Microsoft @garykac @mkruisselbrink

rniwa commented 3 years ago

I'm having trouble understanding what's being proposed here. Conceptually there are two things:

  1. Website accessing data in the system pasteboard/clipboard provided by native applications
  2. Native applications accessing data website has written that's normally exposed natively (e.g. png, psd, docx files)

Per prior discussion, for (1), native applications must opt-in to expose information to web apps directly. So, solving this problem requires some kind of agreement between native applications and web browsers to do this.

Looks like this proposal is saying that we'd let web app specify unsanitized option to retrieve this? Is this to protect websites from receiving potentially harmful (e.g. HTML with scripts) content? But if there is some malicious native app on user's machine, that application can do whatever it wants to do with any website, right? I'm really failing to see what this option is for.

For (2), native applications need some mechanism to access the data which isn't usually available to them.

The proposal also suggests adding ability to write unsanitized version of things. Again, I'm unsure why this is needed. Why can't the browser always make unsanitized version available to native apps which request it? Why does a website explicitly need to request this?

So again, I'm failing to see any need to add Web API for both use cases. Like we've repeatedly stated in the past, Apple's WebKit team believes this is a domain of the operating system design. If these use cases are important, then we'd likely introduce new API at OS level (I'm neither confirming nor denying we may or may not do this).

snianu commented 3 years ago

Thank you @rniwa for the feedback! I've addressed your comments below. Please let me know if you have further questions!

Per prior discussion, for (1), native applications must opt-in to expose information to web apps directly. So, solving this problem requires some kind of agreement between native applications and web browsers to do this.

That is correct. We want native apps and sites to explicitly opt in to reading the custom formats from the clipboard so we don't expose potentially harmful content without the sites/apps being aware of the security implications. The unsanitized option lists the formats that we want to expose to the clipboard without performing any kind of sanitization. This could contain custom formats defined by sites like Excel Online (e.g. a shadow workbook) as well as standardized formats (such as HTML, PNG, etc.) predefined by the OS. Sites would use this option to read/write unsanitized formats that the native apps can use (or vice versa) to provide high-fidelity copy-paste. For example, Excel Online would use this API to read/write a shadow workbook that could contain tables with formulas and other rich formats. That way browsers don't have to add explicit support for these formats, and sites/apps can leverage custom formats to enable this feature.

The proposal also suggests adding ability to write unsanitized version of things. Again, I'm unsure why this is needed. Why can't the browser always make unsanitized version available to native apps which request it? Why does a website explicitly need to request this?

The unsanitized option is particularly useful when reading/writing standardized formats as that is the default set of formats that all browsers support. This option helps us to only read the unsanitized version of the standard formats that were requested by the web author so we don't have to read both sanitized and unsanitized version of all the standard formats that are present on the clipboard. It is described here in the explainer. If the web authors provide custom format in the unsanitized list, then we will only request the unsanitized version of the custom format. The standard formats mentioned in the read/write async APIs will be processed as usual i.e. read/write with proper sanitization steps.
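
For illustration, the read-side behavior described above could be sketched roughly as follows. This is a minimal sketch, not text from the explainer: the `{ unsanitized: [...] }` option shape is assumed from the description in this comment, and `readHtml` and its `clipboard` parameter are hypothetical names (in a browser, `clipboard` would be `navigator.clipboard`).

```javascript
// Sketch only: `readHtml` is a hypothetical helper. The `unsanitized` option
// shape is an assumption based on the description above and may differ from
// the explainer or from what browsers ship.
async function readHtml(clipboard, wantUnsanitized) {
  // Only ask for the unsanitized form when the caller explicitly opts in.
  const options = wantUnsanitized ? { unsanitized: ['text/html'] } : undefined;
  const items = await clipboard.read(options);
  for (const item of items) {
    if (item.types.includes('text/html')) {
      const blob = await item.getType('text/html');
      return blob.text();
    }
  }
  return null; // no HTML on the clipboard
}
```

Without the opt-in, the call is the same `read()` that sites use today, so existing callers would keep receiving sanitized HTML.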

rniwa commented 3 years ago

Thank you @rniwa for the feedback! I've addressed your comments below. Please let me know if you have further questions!

Per prior discussion, for (1), native applications must opt-in to expose information to web apps directly. So, solving this problem requires some kind of agreement between native applications and web browsers to do this.

That is correct. We want native apps and sites to explicitly opt in to reading the custom formats from the clipboard so we don't expose potentially harmful content without the sites/apps being aware of the security implications. The unsanitized option lists the formats that we want to expose to the clipboard without performing any kind of sanitization. This could contain custom formats defined by sites like Excel Online (e.g. a shadow workbook) as well as standardized formats (such as HTML, PNG, etc.) predefined by the OS. Sites would use this option to read/write unsanitized formats that the native apps can use (or vice versa) to provide high-fidelity copy-paste. For example, Excel Online would use this API to read/write a shadow workbook that could contain tables with formulas and other rich formats. That way browsers don't have to add explicit support for these formats, and sites/apps can leverage custom formats to enable this feature.

Again, I'm getting really confused by mixing up reading & writing of system pasteboard content. What problem exactly are we solving by explicitly requesting unsanitized format for reading or writing pasteboard content in a website / webapp. Please define the threat model of each scenario separately, and explain why an explicit request for unsanitized content is required.

The proposal also suggests adding ability to write unsanitized version of things. Again, I'm unsure why this is needed. Why can't the browser always make unsanitized version available to native apps which request it? Why does a website explicitly need to request this?

The unsanitized option is particularly useful when reading/writing standardized formats as that is the default set of formats that all browsers support. This option helps us to only read the unsanitized version of the standard formats that were requested by the web author so we don't have to read both sanitized and unsanitized version of all the standard formats that are present on the clipboard.

Why would the browser need to read both versions without this option?

It is described here in the explainer. If the web authors provide custom format in the unsanitized list, then we will only request the unsanitized version of the custom format. The standard formats mentioned in the read/write async APIs will be processed as usual i.e. read/write with proper sanitization steps.

I really don't follow the description there. When the user copies something on a website, the website shouldn't be in control of whether a given MIME type should be exposed to another app or not. Namely, if a website writes "unsanitized" HTML markup, then the browser SHOULD provide both the sanitized version and the unsanitized version to other native applications, at least on Apple platforms.

snianu commented 3 years ago

Just wanted to quickly reply to some of your concerns now. I will post another reply to the question below describing the threat model in more detail:

Again, I'm getting really confused by mixing up reading & writing of system pasteboard content. What problem exactly are we solving by explicitly requesting unsanitized format for reading or writing pasteboard content in a website / webapp. Please define the threat model of each scenario separately, and explain why an explicit request for unsanitized content is required.

If we make the process of reading/writing unsanitized content more explicit, then there is an implicit expectation that the developers are aware of the security implications and would have mitigations in place (e.g. intense fuzzing, such as that provided by OSS-Fuzz). Quoting @dway123 here, as I think this response captures a lot of detail about why we are proposing the unsanitized option:

Requiring sites to provide the unsanitized list allows sites to support both an unsanitized (for reading by the site) and sanitized (for reading by other sites/native apps) version of the same payload, in case a site requires information removed by sanitization. Therefore, writing unsanitized data by default (rather than via the unsanitized list) would also make addition of sanitized payloads by browser implementations be web-incompatible, as previously unsanitized content would become sanitized, potentially wiping metadata a site relies on.

In the Chromium implementation at least, when the clipboard read method is called, we query all the standard formats from the clipboard that are supported by the browser. This would be very expensive if we had to read all custom formats as well, even if the sites haven't requested any custom formats. The unsanitized option gives browsers the flexibility to decide whether they want to read any custom formats along with standard MIME types (such as text/html, text/plain, image/png, etc.) when web developers call navigator.clipboard.read.

Why would the browser need to read both versions without this option?

Well, in clipboard read, we use the ClipboardItem that only takes MIME types as input. How would you know if the site is requesting unsanitized content for HTML format? e.g.

const clipboardItems = await navigator.clipboard.read();
const clipboardItem = clipboardItems[0];
const htmlBlob = await clipboardItem.getType('text/html'); // Should this return sanitized or unsanitized HTML content?

When the user copies something on a website, the website shouldn't be in control of whether a given MIME type should be exposed to another app or not.

Well, the async clipboard write method gives complete control to web authors as to what content should be written to the clipboard. This is achieved by providing the MIME types in ClipboardItem, so I'm not sure what the concern here is exactly. The default copy operation (using execCommand or the copy command when the user presses ctrl+c) writes all the supported/applicable formats to the clipboard based on the selected content, so this process is completely different from the async clipboard read/write APIs.
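
As a rough sketch of that control: with the async API the author lists exactly which MIME types end up in the `ClipboardItem`. `buildClipboardData` below is a hypothetical helper that only assembles the record; handing it to `ClipboardItem` and `navigator.clipboard.write` is left to browser code.

```javascript
// Hypothetical helper: assembles the MIME-type-to-Blob record that an author
// would pass to `new ClipboardItem(...)`. Only the listed types are written.
function buildClipboardData(plainText, html) {
  const data = { 'text/plain': new Blob([plainText], { type: 'text/plain' }) };
  if (html !== undefined) {
    data['text/html'] = new Blob([html], { type: 'text/html' });
  }
  return data;
}

// Browser-only usage:
//   await navigator.clipboard.write([
//     new ClipboardItem(buildClipboardData('hi', '<b>hi</b>')),
//   ]);
```

Nothing beyond the listed types is written here, which is the difference from the default copy path, where the browser chooses the formats from the selection.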

rniwa commented 3 years ago

Again, I'm getting really confused by mixing up reading & writing of system pasteboard content. What problem exactly are we solving by explicitly requesting unsanitized format for reading or writing pasteboard content in a website / webapp. Please define the threat model of each scenario separately, and explain why an explicit request for unsanitized content is required.

If we make the process of reading/writing unsanitized content more explicit, then there is an implicit expectation that the developers are aware of the security implications and would have mitigations in place (e.g. intense fuzzing, such as that provided by OSS-Fuzz). Quoting @dway123 here, as I think this response captures a lot of detail about why we are proposing the unsanitized option:

Requiring sites to provide the unsanitized list allows sites to support both an unsanitized (for reading by the site) and sanitized (for reading by other sites/native apps) version of the same payload, in case a site requires information removed by sanitization. Therefore, writing unsanitized data by default (rather than via the unsanitized list) would also make addition of sanitized payloads by browser implementations be web-incompatible, as previously unsanitized content would become sanitized, potentially wiping metadata a site relies on.

I really don't follow. When we say developers, are we talking about web developers, or native app developers? Surely, browsers should have to write both versions to the system pasteboard because we don't know when or if the user pastes the content to another browser instance of the same origin or to some other native applications. So it doesn't seem like there is an option left for web developers to say, I want to only write a version of content that didn't go through sanitization process.

In the Chromium implementation at least, when the clipboard read method is called, we query all the standard formats from the clipboard that are supported by the browser. This would be very expensive if we had to read all custom formats as well, even if the sites haven't requested any custom formats. The unsanitized option gives browsers the flexibility to decide whether they want to read any custom formats along with standard MIME types (such as text/html, text/plain, image/png, etc.) when web developers call navigator.clipboard.read.

I'm really confused here. Why would a website want to read the sanitized version of content from the system pasteboard if a version of the content that's unsanitized is available to them? Is the concern that we want to make sure we don't end up giving them potentially dangerous content? That doesn't seem like a kind of assumption websites should be making in the first place. There is nothing browsers can do to ensure that whatever content read from the system pasteboard won't result in some kind of XSS or even remote server exploits since we have no idea how a website is processing it. e.g. a plain text in the pasteboard could result in XSS if it's inserted inside a script or style tag or some attributes.
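
To illustrate the last point: whether pasted text is dangerous depends on where the site inserts it, not on the clipboard format. A minimal sketch with hypothetical helper names; `escapeForHtmlText` is an ordinary text-context escaper that a site might write, not something the browser applies on paste.

```javascript
// Escapes characters that are significant in an HTML text context.
function escapeForHtmlText(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical (and unsafe) page code that drops pasted text into a script.
function buildInlineScript(pastedText) {
  return '<script>var note = "' + escapeForHtmlText(pastedText) + '";</script>';
}

// Plain text with no markup at all still breaks out of the string literal.
const pasted = '"; alert(document.cookie); "';
console.log(buildInlineScript(pasted));
// → <script>var note = ""; alert(document.cookie); "";</script>
```

The text contains no HTML at all, so no HTML sanitizer on the clipboard side would have changed it.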

Why would the browser need to read both versions without this option?

Well, in clipboard read, we use the ClipboardItem that only takes MIME types as input. How would you know if the site is requesting unsanitized content for HTML format? e.g.

const clipboardItems = await navigator.clipboard.read();
const clipboardItem = clipboardItems[0];
const htmlBlob = await clipboardItem.getType('text/html'); // Should this return sanitized or unsanitized HTML content?

I don't see the need for reading the sanitized version. What is the scenario in which a website wants to read unsanitized version of the content?

When the user copies something on a website, the website shouldn't be in control of whether a given MIME type should be exposed to another app or not.

Well, the async clipboard write method gives complete control to web authors as to what content should be written to the clipboard. This is achieved by providing the MIME types in ClipboardItem, so I'm not sure what the concern here is exactly. The default copy operation (using execCommand or the copy command when the user presses ctrl+c) writes all the supported/applicable formats to the clipboard based on the selected content, so this process is completely different from the async clipboard read/write APIs.

Right, standard ones. But websites shouldn't be in control of, say, exposing a PSD file unsanitized. Similarly, if a website writes HTML, then the browser needs to provide both sanitized HTML and unsanitized HTML for other browsers and native apps because we don't know at the time of writing to the system pasteboard what the receiver is capable of.

snianu commented 3 years ago

I really don't follow. When we say developers, are we talking about web developers, or native app developers?

In that response I was mainly referring to native app developers, but it applies equally to web devs as well. Web developers could use the Sanitizer APIs to decide what elements/attributes to drop and would have control over what content gets pasted.

Surely, browsers should have to write both versions to the system pasteboard because we don't know when or if the user pastes the content to another browser instance of the same origin or to some other native applications.

I'm guessing you are referring to standard formats here as custom formats are always written by sites/apps if they opt-in to reading/writing custom formats. For standard formats, we want sites and native apps to explicitly opt-in to read the unsanitized version so they are aware of the security implications. Some legacy sites and native apps don't receive frequent updates and are not really designed to properly process unsanitized HTML content from the web. This is why we want to always write sanitized version of standard formats so we don't regress the paste behavior in these sites/apps.

I'm really confused here. Why would a website want to read the sanitized version of content from the system pasteboard if a version of the content that's unsanitized is available to them?

There are sites and legacy apps that don't receive updates often. These sites/apps depend on the standard formats (which are predefined by the OS) being available on the pasteboard. Currently we sanitize the standard formats by default, so we don't want to regress that behavior. The unsanitized version of the standard formats wouldn't be available by default. It will be written to the pasteboard as custom formats. It is described in this section. The sites/apps have to make changes explicitly to read the unsanitized version of standard formats, and if they don't, then they receive the sanitized version of standard formats.
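
The fallback described here could be modeled roughly like this. It is a sketch over a plain-object model of the clipboard, not a real API: `pickHtml`, `standard`, and `custom` are hypothetical names chosen for illustration.

```javascript
// `clipboard` is a plain-object model, not a real API: `standard` holds the
// sanitized well-known formats, `custom` holds entries written as custom formats.
function pickHtml(clipboard, optedIntoUnsanitized) {
  if (optedIntoUnsanitized && 'text/html' in clipboard.custom) {
    return { source: 'unsanitized', html: clipboard.custom['text/html'] };
  }
  if ('text/html' in clipboard.standard) {
    // Readers that never opted in land here and keep today's behavior.
    return { source: 'sanitized', html: clipboard.standard['text/html'] };
  }
  return null;
}
```

A reader that never changes its code takes the second branch, which is the no-regression property the comment is arguing for.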

I don't see the need for reading the sanitized version.

Sanitized version is always needed to support sites/apps that don't want to make any changes to their copy/paste code. Legacy native apps (at least on Windows) don't receive frequent updates so we don't want to regress copy/paste behavior in those apps.

What is the scenario in which a website wants to read unsanitized version of the content?

Excel Online would want to read and process the unsanitized version of the HTML format to preserve rich formatting like table cell colors. Here is a GIF that shows the difference between the unsanitized and sanitized versions of the HTML format. Note how the sanitized version loses styles when pasted into Excel Online compared to the unsanitized one.

But websites shouldn't be in control of, say, exposing a PSD file unsanitized

I don't think the site needs to specify any particular format. They can just serialize the payload and write it under a custom format that can only be interpreted by either the site or the native apps that are aware of this custom format's content. That way both the site and native app have complete control over the content of the payload. This also addresses some security concerns regarding unsanitized custom formats where only the site and the native app know how to parse the content of the custom format and are also able to trust the content by adding some security tokens or something to uniquely identify the payload present in the custom format.
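
One way to picture the "security token" idea is a payload wrapper like the one below. This is only a sketch under stated assumptions: `serializePayload` and `parsePayload` are hypothetical helpers, the JSON envelope is invented for illustration, and a plain shared token is a weak check (a real design would likely want something stronger, such as a signature).

```javascript
// Hypothetical envelope: the writer wraps its payload together with a token
// that only the site and its companion native app know about.
function serializePayload(token, payload) {
  return JSON.stringify({ token, payload });
}

// Returns the payload if the text parses and carries the expected token,
// otherwise null so the caller can fall back to a standard format.
function parsePayload(expectedToken, text) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch {
    return null; // not our format at all
  }
  if (parsed === null || typeof parsed !== 'object' || parsed.token !== expectedToken) {
    return null; // written by someone else, or tampered with
  }
  return parsed.payload;
}
```

Anything that fails the check is treated as foreign data, so the reader never parses content it did not expect.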

BoCupp-Microsoft commented 3 years ago

@rniwa if it would help there is a Web Editing Working Group meeting next Friday, 9/10/2021 (details on the GitHub site) where we'd be happy to discuss this issue in greater depth. We can also schedule a separate meeting dedicated to the topic if that would work better. Let us know if you think we should continue the conversation in one of those higher bandwidth environments.

Thanks, Bo

rniwa commented 3 years ago

@BoCupp-Microsoft : I won't be able to attend that meeting since I only work Mon-Wed these days, and I'm barely awake at 11am PDT due to various medications I'm taking. @whsieh can probably attend one of those meetings though.

I'm open to scheduling a specific meeting, but offline discussions over GitHub might be the fastest way given the very restrictive working hours I have these days.

snianu commented 3 years ago

Tagging folks from FF & Chromium as we are going to discuss this issue in this week's Editing WG meeting. @mkruisselbrink @a-sully @annevk @evilpie

travisleithead commented 3 years ago

September WG meeting: @snianu presented an overview of the pickling design, and showed a demo. Discussion to continue in this issue.

snianu commented 3 years ago

Here is the PPT that we presented today: pickling-api.pdf

BoCupp-Microsoft commented 3 years ago

After @snianu's clipboard pickling presentation last week in the Web Editing WG meeting, we had a discussion that resulted in two action items:

  1. @whsieh from Apple suggested that we write both the sanitized and unsanitized content all the time to the clipboard instead of requiring authors to declare they also want an unsanitized copy under the new pickled name. That seems like a good suggestion so @snianu can incorporate that into the explainer / upcoming pull request.
  2. @whsieh also pointed out that Safari may not want to let web sites read the unsanitized content unless it was produced from the same origin trying to read it. I asked whether it would be acceptable to use a CORS-like approach and let the producer of the clipboard content specify which origins can consume it. Since the motivation for restricting read seemed to be privacy concerns, I think it would be OK to let the producer of the clipboard content declare it's OK for sharing. @whsieh is going to give that further consideration and can share more thoughts at a future meeting or in this issue.
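
The CORS-like idea in the second item might look something like the check below. Nothing here is specified anywhere in this thread: `mayReadUnsanitized`, the `writerOrigin` and `allowedOrigins` fields, and the `*` wildcard are all assumptions made for the sake of the sketch.

```javascript
// Hypothetical model: each clipboard entry records the origin that wrote it
// and an allow-list of origins the writer is willing to share it with.
function mayReadUnsanitized(entry, readerOrigin) {
  if (entry.writerOrigin === readerOrigin) {
    return true; // same-origin reads are always allowed in this sketch
  }
  // Assumed wildcard, by analogy with `Access-Control-Allow-Origin: *`.
  return entry.allowedOrigins.includes('*') || entry.allowedOrigins.includes(readerOrigin);
}
```

A reader that fails this check would still be able to fall back to the sanitized copy.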

snianu commented 3 years ago

Tagging @mkruisselbrink to see if he has any concerns with what is suggested in the first point below:

@whsieh from Apple suggested that we write both the sanitized and unsanitized content all the time to the clipboard instead of requiring authors to declare they also want an unsanitized copy under the new pickled name. That seems like a good suggestion so @snianu can incorporate that into the explainer / upcoming pull request.

If we are just writing the standard well known formats, then this approach sounds good to me. But, for custom pickled formats, we still need a way for the web authors to provide the custom format name. The unsanitized option can be used by the web authors to provide the name of the custom formats during write operation.

For read, I think it is better to have an unsanitized option as reading all the formats (both standard & custom) would be bad for perf reasons. So, in addition to CORS like approach, we should also provide unsanitized option, and both of these options would address security as well as perf concerns.

rniwa commented 3 years ago

Tagging @mkruisselbrink to see if he has any concerns with what is suggested in the first point below:

@whsieh from Apple suggested that we write both the sanitized and unsanitized content all the time to the clipboard instead of requiring authors to declare they also want an unsanitized copy under the new pickled name. That seems like a good suggestion so @snianu can incorporate that into the explainer / upcoming pull request.

If we are just writing the standard well known formats, then this approach sounds good to me. But, for custom pickled formats, we still need a way for the web authors to provide the custom format name. The unsanitized option can be used by the web authors to provide the name of the custom formats during write operation.

What does this mean? MIME type specifies exactly what type a given format is.

For read, I think it is better to have an unsanitized option as reading all the formats (both standard & custom) would be bad for perf reasons. So, in addition to CORS like approach, we should also provide unsanitized option, and both of these options would address security as well as perf concerns.

We object to this proposal.

BoCupp-Microsoft commented 3 years ago

Tagging @mkruisselbrink to see if he has any concerns with what is suggested in the first point below:

@whsieh from Apple suggested that we write both the sanitized and unsanitized content all the time to the clipboard instead of requiring authors to declare they also want an unsanitized copy under the new pickled name. That seems like a good suggestion so @snianu can incorporate that into the explainer / upcoming pull request.

If we are just writing the standard well known formats, then this approach sounds good to me. But, for custom pickled formats, we still need a way for the web authors to provide the custom format name. The unsanitized option can be used by the web authors to provide the name of the custom formats during write operation.

What does this mean? MIME type specifies exactly what type a given format is.

Talked with @snianu offline and we agree that, when writing, specifying an unsanitized list isn't necessary. Maybe a simple way to put it is:

  1. When writing, everything will be put into the "pickle jar" unchanged from how the author supplied it.
  2. Additionally, for every well-known format written, we will also produce a "sanitized copy" per usual.
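
The two rules above can be sketched as a small planning function. This is an illustrative model only: `planWrite`, `WELL_KNOWN`, and the `sanitize` callback are hypothetical names, and the set of well-known types is just the examples mentioned earlier in this thread.

```javascript
// Illustrative list only; the real set of well-known formats is up to the spec.
const WELL_KNOWN = new Set(['text/plain', 'text/html', 'image/png']);

// Returns what would be written: every item goes into the pickle jar as
// supplied, and each well-known item also gets a sanitized copy.
function planWrite(items, sanitize) {
  const pickleJar = {};
  const sanitized = {};
  for (const [type, data] of Object.entries(items)) {
    pickleJar[type] = data; // rule 1: unchanged from how the author supplied it
    if (WELL_KNOWN.has(type)) {
      sanitized[type] = sanitize(type, data); // rule 2: sanitized copy per usual
    }
  }
  return { pickleJar, sanitized };
}
```

Custom types appear only in the pickle jar, so legacy readers of the well-known formats see the same sanitized data as today.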

@rniwa there's no objection to the first point as I've outlined it, is there?

For read, I think it is better to have an unsanitized option as reading all the formats (both standard & custom) would be bad for perf reasons. So, in addition to CORS like approach, we should also provide unsanitized option, and both of these options would address security as well as perf concerns.

We object to this proposal.

@rniwa can you clarify what part you are objecting to? And why? :-)

BoCupp-Microsoft commented 3 years ago

@whsieh I wrote this above as an action item for you. :-) I'm wondering if you can also bring your findings to our next Web Editing WG meeting on 9/24/2021. Thanks!

@whsieh also pointed out that Safari may not want to let web sites read the unsanitized content unless it was produced from the same origin trying to read it. I asked whether it would be acceptable to use a CORS-like approach and let the producer of the clipboard content specify which origins can consume it. Since the motivation for restricting read seemed to be privacy concerns, I think it would be OK to let the producer of the clipboard content declare it's OK for sharing. @whsieh is going to give that further consideration and can share more thoughts at a future meeting or in this issue.

BoCupp-Microsoft commented 3 years ago

As a follow-up to this action item from the Web Editing WG meeting from 9/24/2021, members from Apple, Google and Microsoft met last Friday (10/1/2021) to discuss two topics:

  1. The contents of the native clipboard for pickled data exchange
  2. The ability for a web site to read unsanitized HTML via the async clipboard API

For point 1, we concluded that we are going to include the proposed format for the Web Custom Format Map and clipboard format naming conventions as outlined in the current explainer as non-normative notes in the Clipboard API spec. Additional details... we debated for a while whether what will surely become a de facto standard should be included as part of the actual standard using normative text but decided that alternative implementations are possible and that we would leave room for those by using non-normative text. One hypothetical example is that Apple could introduce new platform APIs for the pasteboard that could read and write the proposed pickled format in addition to some legacy pickled formats that vary across browsers today to hide the implementation details from native app authors including browser implementors. As a counterpoint it was argued that to facilitate interchange between native and web apps, the shape of the platform-specific format written to the clipboard must be documented somewhere, and it was better to have it as part of the standard than to require that it be reverse engineered from the apps that happen to implement it first. Our compromise was to include it in the spec using non-normative language.
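
As a rough picture of what such a map could contain, the sketch below assumes it is a JSON object from MIME type to a numbered platform format name. Both the `Web Custom Format<N>` naming and the `buildCustomFormatMap` helper are assumptions for illustration; the explainer is the authority on the actual layout.

```javascript
// Assumed layout: { "<mime type>": "Web Custom Format<N>", ... } serialized
// as JSON. The naming scheme here is an assumption, not quoted from the spec.
function buildCustomFormatMap(mimeTypes) {
  const map = {};
  mimeTypes.forEach((type, index) => {
    map[type] = 'Web Custom Format' + index;
  });
  return JSON.stringify(map);
}

console.log(buildCustomFormatMap(['text/html', 'application/x-demo']));
// → {"text/html":"Web Custom Format0","application/x-demo":"Web Custom Format1"}
```

Under this assumption a native app would read the map first and then look up the platform format name for the MIME type it cares about.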

For point 2, Microsoft pointed out that the ClipboardEvent's getData method already provides unsanitized access to the HTML on the clipboard (in Firefox, IE, Edgehtml-based Edge, Chromium-based Edge and Chrome), but that navigator.clipboard.read only returns a sanitized fragment of the HTML on the clipboard. This loss of fidelity creates feature gaps for Microsoft Office apps (and likely many others) and prevents them from adopting the async clipboard API. The proposal is to provide unsanitized access to HTML on the clipboard by using the navigator.clipboard.read method with a new unsanitized option. As a counterpoint, Apple suggested that native apps can't be trusted to write data to the clipboard without revealing document metadata and that the browser should sanitize it to prevent exposure before allowing it to be read by a website. Microsoft expressed skepticism as to whether it was the browser's responsibility to restrict what native apps could place in their HTML data. We agreed to continue discussing point 2 in our Web Editing WG meeting this Friday (10/8/2021).
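
For reference, the legacy path mentioned for point 2 is the `paste` event's `clipboardData.getData`. A minimal sketch, where `htmlFromPasteEvent` is a hypothetical wrapper name:

```javascript
// Returns the HTML exposed to a `paste` event handler through the legacy
// DataTransfer API. Per the discussion above, most browsers hand this to the
// page without sanitizing it.
function htmlFromPasteEvent(event) {
  return event.clipboardData.getData('text/html');
}

// Browser-only usage:
//   document.addEventListener('paste', (e) => {
//     console.log(htmlFromPasteEvent(e));
//   });
```

The proposal in point 2 is for `navigator.clipboard.read` with the new option to be able to return the same fidelity as this handler.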

css-meeting-bot commented 3 years ago

The Web Editing Working Group just discussed Continue discussion on clipboard APIs.

The full IRC log of that discussion
<Travis> Topic: Continue discussion on clipboard APIs
<Travis> github: https://github.com/w3c/clipboard-apis/issues/150
<Travis> BoCupp: Not sure this is the right issue...
<BoCupp> https://github.com/w3c/editing/issues/334
<Travis> .. Ah, in a different repo: https://github.com/w3c/editing/issues/334
<Travis> github: https://github.com/w3c/editing/issues/334
<Travis> BoCupp: we were able to resolve half the discussion. Agreed to add a non-normative note on the format that is used to communicate with native apps (and vice-versa)
<Travis> .. from native->web: current read behavior of nav.clipboard, you get sanitized content.
<Travis> .. for exchange with office apps (or similar) the fidelity is too low (loss of formatting, for example)
<Travis> .. so we want to add an "unsanitized" option to the read API.
<Travis> .. (trying to match ctrl+v)
<whsieh> q+
<Travis> .. we want raw content from the pickle jar (if exist) or from well-known HTML format.
<Travis> .. if you get that raw data, then native->web works great even for apps not yet updated.
<Travis> .. (also want to talk about web->native)
<Travis> .. when writing, there is also sanitization happening today.
<Travis> .. if we can't support well-known HTML format write, then our partners won't be able to support the API because it cuts off existing support already provided by the setData legacy API.
<Travis> .. today in all browser setData is a raw-write for HTML to clipboard. If they lose that in async clipboard it blocks them.
<Travis> .. (it's a downgrade for existing apps already having migrated to async clipboard)
<Travis> ack whsieh
<Travis> whsieh: As discussed there are a few privacy/security issues at play (not fully addressed from last time)
<Travis> .. in webkit the getData/setData, for these webkit treated as a security fix--was surprising to hear this was only limited to one of them.
<Travis> .. on copy/paste of content to native apps, they can reach into the pickle jar. So this can already work without a sanitized write.
<Travis> .. Without any explicit Api changes
<Travis> .. there are privacy issues native->web copy/paste.
<BoCupp> q?
<Travis> .. e.g., Word does add some things (like filepaths) into the clipboard and could expose directory structure to the web, and would be a non-starter for unsanitized read.
<Travis> BoCupp: Problem: some existing native apps would take a long time to update (they only read from the well-known format today). They can't simultaneously use the new API (when available) and the old one. (They have to pick one.)
<Travis> .. This then creates a blocker until apps updated to read from new pickle jar.
<GameMake_> q+
<Travis> whsieh: So native app side, they would try to read from pickle jar?
<Travis> BoCupp: an existing app--doesn't know about the pickle jar yet.
<Travis> .. today they read and get full fidelity.
<Travis> whsieh: problem is... web pages adopt new API, old version of native apps would get ?
<Travis> .. they can't use both setData AND nav.clipboard.write (it's one or the other)?
<Travis> johanneswilm: What would happen in that case?
<Travis> BoCupp: Maybe last write wins? But code is not written to handle that (one overwrites anything else).
<Travis> .. the writes are basically atomic for a write (and the results that get generated)
<Travis> .. might be able to specify how they work together?
<Travis> .. but want to have unsanitized write using the new API.
<Travis> ack GameMake_
<Travis> GameMake_: I understand that new version is not backward compatible... how is this new version back compatible?
<Travis> .. so how does old version of Word work?
<Travis> .. Given a new API (unsanitized write).. How does the old version of Word get this today?
<Travis> BoCupp: Word is coded to read a well-known HTML format. Web apps fill this format today with whatever they want (and it goes through unsanitized).
<Travis> .. using setData API.
<Travis> .. on async write, we started sanitizing content and putting it into the same "slot" that Word reads from.
<whsieh> q+
<Travis> .. am proposing that the write on async clipboard fill the same slot in the same way.
<Travis> GameMake_: So, when on web, when using new API with unsanitized version, then only the unsanitized version is being written?
<Travis> .. thinking that sanitization was to improve security. So, how does this not become a problem.
<whsieh> q-
<Travis> BoCupp: The threat was for arbitrary (new) formats that were never accessible from the web before. Apps wouldn't be prepared and would be vulnerable to this (the web being able to write to those format names).
<Travis> .. Those were the threats we were concerned about.
<Travis> .. I think apps (like Word) are aware of the risk, as they are already using the HTML format.
<whsieh> q+
<Travis> .. I think native apps, getting HTML should be ready.
<Travis> GameMake_: So the new sanitized format flag only works for well-known formats?
<Travis> BoCupp: the behavior of write not only allows arbitrary MIME types to be written (sanitized), but also allows pre-existing well-known types to flow through in the way they did in the past.
<Travis> .. am primarily focused on HTML format write now. (But may want to expand to other well-known, existing types.)
<Travis> .. For read, there must be an explicit option to "give me raw".
<Travis> .. (restating) we built a new API, but customers won't move to it (because it's a loss of functionality available today).
<Travis> GameMake_: Didn't know why we chose to open the whole...
<Travis> s/whole/hole
<Travis> BoCupp: Agree that we did too much (sanitizing on write for well-known)
<Travis> GameMake_: Not sure I see the complete problem
<Travis> .. if we were solving for a problem, we can't just ignore that problem...
<johanneswilm> +q
<Travis> .. but if we're bringing this back to a prior state, I could be more supportive.
<whsieh> q-
<Travis> BoCupp: In the security process, we fix/patch the holes. They've supposedly been addressed in the past (or continue to be so).
<Travis> .. when the async clipboard write was proposed in the prior group (with garykac's proposal), I opposed the change.
<Travis> .. Had there been a real threat, we would have tried to solve that. But the change wasn't based on a threat, it was just a suggestion.
<Travis> .. If there was a motivation for the change, I'd really like to know what it was.
<whsieh> q+ to: mention that we really can't expect all native apps to sanitize `script` tags, for instance
<Travis> GameMake_: I definitely want to know what the reason was.
<Travis> q?
<Travis> ack johanneswilm
<Travis> johanneswilm: If Safari was the only one that removed it, then maybe they can tell us what the issue was?
<Travis> whsieh: Don't know of a specific native app that might have been taken by the exploit.
<Travis> .. Just don't want to expect all native apps (moving forward) to be able to do the sanitization steps.
<Travis> .. It's hard to get that right.
<BoCupp> why would native Word need to worry about stripping onmouseover events?
<BoCupp> that's a web app concern
<Travis> .. the Browser is in the middle, and should be responsible for sanitizing.
<Travis> .. is also at odds with the compatibility story.
<BoCupp> q?
<Travis> whsieh: to BoCupp, maybe not native Word, but perhaps an Electron app?
<Travis> .. there are some corner cases we wouldn't expect them to catch.
<Travis> BoCupp: For electron, they have access to sanitization on read (default) (if they are web-based).
<Travis> whsieh: this requires them to use the web API...but they aren't limited to that given they are native?
<Travis> .. my understanding would be the only way to access the data would be through opt-in unsanitized read.
<Travis> BoCupp: special meeting to continue this discussion?
<Travis> .. given we're out of time
<Travis> johanneswilm: I propose we meet again on the 15th.
<Travis> BoCupp: Sounds fine.
<Travis> .. I do want to make progress... if we're running in circles I don't want to waste folks' time.
<Travis> action: add a special meeting on October 15th (same time/place)
<Travis> travis: will just cover this topic on the 15th.
<Travis> whsieh: I'm sure we can come to some consensus ;-)
<Travis> Thanks everyone! I think we covered a lot of ground today.
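
For readers following the log, the read-side behavior described above (an "unsanitized" option that returns raw content from the pickle jar if it exists, otherwise from the well-known HTML format) can be modeled roughly as follows. This is only a minimal TypeScript sketch under stated assumptions, not the Chromium implementation or the API shape from the explainer: `ClipboardModel`, `ReadOptions`, `readType`, and the injected `sanitize` callback are hypothetical names used for illustration.

```typescript
// Hypothetical in-memory model of the clipboard discussed in the log.
// None of these names are real browser APIs.
interface ClipboardModel {
  // Formats native apps already read and write today, e.g. "text/html".
  wellKnown: { [type: string]: string | undefined };
  // Raw payloads stored by pickling-aware writers (the "pickle jar").
  pickleJar: { [type: string]: string | undefined };
}

interface ReadOptions {
  // MIME types the caller wants without sanitization (assumed option shape).
  unsanitized?: string[];
}

// The sanitizer is injected; this sketch does not implement real sanitization.
type Sanitizer = (html: string) => string;

function readType(
  clipboard: ClipboardModel,
  type: string,
  sanitize: Sanitizer,
  options: ReadOptions = {},
): string | undefined {
  const rawTypes = options.unsanitized || [];
  if (rawTypes.indexOf(type) !== -1) {
    // Raw read: prefer the pickle jar, fall back to the well-known format.
    const pickled = clipboard.pickleJar[type];
    return pickled !== undefined ? pickled : clipboard.wellKnown[type];
  }
  // Default read: only the well-known format is visible.
  const value = clipboard.wellKnown[type];
  if (value === undefined) {
    return undefined;
  }
  // In this model only HTML goes through the sanitizer.
  return type === "text/html" ? sanitize(value) : value;
}
```

In this model a page that does not opt in keeps getting sanitized HTML, while a page that lists `text/html` under `unsanitized` sees what a native app wrote, which is the native->web case for apps that have not been updated to use the pickle jar.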
css-meeting-bot commented 3 years ago

The Web Editing Working Group just discussed Seeking feedback on Clipboard Pickling APIs.

The full IRC log of that discussion
<Travis> topic: Seeking feedback on Clipboard Pickling APIs
<Travis> github: https://github.com/w3c/editing/issues/334
<tilgovi> bo: right now we have a GitHub issue continuing security review led by some GitHub engineers. The last consensus that we had was that we want to document a format for interchange between native apps and web applications. We agreed we would write that in a non-normative note.
<tilgovi> ... we had some disagreement on the sanitization procedures for both read and write
<tilgovi> ... Our proposal has been that when we read, we could supply a new option, "unsanitized", that takes a list of content types to read without sanitization
<tilgovi> ... We maintain that it's unnecessary to sanitize on write to the clipboard. Certainly for clipboard pickling, you can't sanitize a format you don't understand. More importantly, for the well-known format text/html, we want to be able to write it unsanitized. We detailed the impact that it has on applications if we lose the fidelity of writing unsanitized HTML.
<tilgovi> ... We had resolved that we would allow user agents to sanitize or not, at the last special meeting. I don't think we have anything new.
<tilgovi> Anne: It sounds quite bad for web developers, everything that is optional.
<tilgovi> Wenson: it would be nice to reiterate why sanitize is necessary on read.
<tilgovi> Bo: At least with Chromium implementation, if you Ctrl+V, if you don't do anything, we process HTML before we put it into the document. We make "insert ready" HTML.
<tilgovi> Bo: Unlike the legacy API, that returns unsanitized content, Chromium browsers produce an insert ready fragment today. This has a side effect of degrading the fidelity of the HTML that the application can see.
<tilgovi> ... the web application for Word online is not able to change to the new API because it needs style rules inserted by MS Word. In a nutshell, supporting unsanitized is backwards compatibility.
<snianu> Some context on the HTML sanitization issue: https://github.com/w3c/clipboard-apis/issues/150
<tilgovi> Wenson: When we shipped sanitization for clipboard, at least for cross-origin, all data is sanitized. Including native to web app.
<tilgovi> ... We have certain quirks in the algorithms for sanitization, some of them for things like MS Office, to make some workflows work.
<Travis> q?
<tilgovi> Bo: I think it might be hard to specify the quirks we would need.
<johanneswilm> q+
<Travis> ack johanneswilm
<tilgovi> johanneswilm: The thing that you're saying about exceptions is not really accessible to developers making applications. It would be good if this doesn't just happen to work in MS Word.
<tilgovi> Wenson: There are security goals we need to uphold, but there are workflows where we acknowledge unsanitized content will be useful. For those we want to expose native APIs that allow apps to expose raw content if they want to do so, knowing that it's going to be read by arbitrary web content. That is something we want to support, but I don't think we can do it and uphold security goals at the same time.
<BoCupp> q+
<tilgovi> annevk: I have a question about security goals.
<tilgovi> ... say there is a cross origin exchange, web->web or native->web. Either they could agree to use a pickled type, or they could agree to opt into unsanitized behavior. I'm not sure what the difference is between using a new MIME type or a MIME type with unsanitized.
<tilgovi> BoCupp: the difference is for native apps that are already writing the well-known HTML format: can web apps read that existing HTML? If they both opt into using text/html2, they can. But it should be acceptable for them to read text/html as they exist today.
<tilgovi> ... Native -> web, Wenson raised a concern that native apps might put in comments or author metadata that might reveal private information.
<tilgovi> Wenson: there are real world examples of this. MS Word will put a user's file paths within attributes that get copied.
<tilgovi> johanneswilm: one question here. One example I've heard is MS Word file paths and this is what all this is about. But MS Word is also the one application you happen to make exceptions for in the paste rules. But this affects other applications, too, right?
<Travis> q?
<tilgovi> Wenson: that's currently the case, yes.
<Travis> ack BoCupp
<tilgovi> BoCupp: this is just a recap of the special meeting. We have scenarios we need to enable. We differ in behavior today, and we haven't come to consensus except to say this is all optional.
<tilgovi> annevk: there was agreement to make it optional?
<tilgovi> BoCupp: we did agree we could write "browser MAY do this".
<tilgovi> Wenson: that is correct
<tilgovi> annevk: that was instead of an "unsanitized" options bag that some implementation might ignore?
<tilgovi> BoCupp: can someone propose a step for moving forward?
<tilgovi> BoCupp: from our perspective, it's not locked down with the legacy APIs and it's unfortunate that the new APIs don't allow the same support. Our customers can't use it as it's currently authored.
<tilgovi> annevk: I see it as Chromium has a privacy bug in its legacy APIs.
<tilgovi> BoCupp: well, the idea that HTML as a format isn't meant to be shared with the web is strange.
<tilgovi> Anupam: Firefox also returns unsanitized content.
<tilgovi> ... If there was privacy or security concern it's been there for decades.
<snianu> The issue that I linked above shows how FF, Chrome, Edge (old and new) return unsanitized HTML content via DataTransfer APIs.
<tilgovi> rrsagent, bookmark
<RRSAgent> See https://www.w3.org/2021/10/29-editing-irc#T16-32-41