w3c / editing

Specs and explainers maintained by the editing task force
http://w3c.github.io/editing/

Delayed clipboard rendering TAG feedback #459

Open martinthomson opened 8 months ago

martinthomson commented 8 months ago

@plinss, @hober, and I discussed the feature as part of our design review issue. It seems fairly reasonable on the surface, but https://github.com/w3c/editing/issues/439 is particularly concerning.

There seems to be an inherent privacy issue here: the target application reveals something about itself when it is pasted into. The suggestions made in that issue, which involve resolving the clipboard items on a timer, would seem to undermine the key advantage of deferral, and they provide no protection for pastes that happen before the timer fires.

The key thing to realize is that the clipboard is a communication medium between the copied content and the application that receives a paste. This is unavoidable given our current model, where a website can put whatever it pleases on the clipboard (the only condition being that it receives an interaction of some sort). That communication is currently one-way, and websites don't necessarily get to know about the destination or control it. That makes the channel less effective as a means of learning things about people. Delayed rendering creates a potential bidirectional channel, with the destination choosing from a multiple-choice selection. The site learns which choice is made. The choice itself carries novel private information. The timing of the paste is also revealed.
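To make the reverse channel concrete, here is a minimal sketch that models delayed rendering as callbacks owned by the copying site. It is not the proposed API: `makeDelayedItem`, `paste`, and `Observation` are hypothetical names, and the fixed clock exists only to keep the example deterministic.

```typescript
type Producer = () => string;

interface Observation {
  format: string; // the representation the destination asked for
  atMs: number;   // when it asked
}

// Source side: wrap each representation in a callback owned by the copying site.
function makeDelayedItem(
  contents: Map<string, string>,
  log: Observation[],
  now: () => number,
): Map<string, Producer> {
  const item = new Map<string, Producer>();
  contents.forEach((data, format) => {
    item.set(format, () => {
      // This is the reverse channel: the site sees the choice and its timing.
      log.push({ format, atMs: now() });
      return data;
    });
  });
  return item;
}

// Destination side: take the first preferred format that was advertised.
function paste(item: Map<string, Producer>, preferred: string[]): string | undefined {
  for (const format of preferred) {
    const produce = item.get(format);
    if (produce !== undefined) return produce();
  }
  return undefined;
}

const log: Observation[] = [];
const item = makeDelayedItem(
  new Map<string, string>([
    ["text/plain", "hello"],
    ["text/html", "<b>hello</b>"],
  ]),
  log,
  () => 1000, // fixed clock, an assumption for the example
);
const pasted = paste(item, ["text/html", "text/plain"]);
```

After the paste, `log` holds one entry naming `text/html` and the time of the request. With eager rendering the source site would have had nothing to record.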

There are other approaches:

  1. Render all formats when paste occurs. This still reveals timing, but would not reveal the choice of format.
    1. A variant of this would be to generate the requested format and a randomized subset of the other formats (differential privacy)
  2. At copy time, produce a single format from which all others can be produced. Let the destination application perform format translation.
    1. As an option, if the single format is not supported at the destination, sites could provide a worklet that can perform translation into different formats. This worklet would be denied access to any communication, so the source site would not have any means of learning the choice of format.
  3. Finally, we could define a new media type that carries a URL. That URL, when resolved, provides the destination application with information in whatever form it desires.
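Option 1 and its randomized variant can be sketched as a single selection function. This is only an illustration: `formatsToRender` and the probability `p` are hypothetical, and the sketch makes no formal privacy claim about any particular value of `p`.

```typescript
// Returns the formats to render when a paste asks for `requested`:
// the requested format, plus each other advertised format with probability p.
function formatsToRender(
  requested: string,
  advertised: string[],
  p: number,
  random: () => number = Math.random,
): string[] {
  if (advertised.indexOf(requested) === -1) return [];
  return advertised.filter((format) => format === requested || random() < p);
}

const advertisedFormats = ["text/plain", "text/html", "image/png"];
// The source site is asked for this whole set, not just the one the destination wanted.
const decoys = formatsToRender("text/html", advertisedFormats, 0.5);
```

With `p = 1` every format is rendered on paste, which is plain option 1. With `p = 0` only the requested format is rendered, which is the behavior the list is trying to avoid.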

martinthomson commented 8 months ago

Someone indicated that I should expand a little on option 3, so here's an attempt at that.

The whole point of delayed rendering is to add a level of indirection. Rather than getting the content on the clipboard in a given format, you are instead given a promise for that content, or a token that you can exchange in return for that content.

We have a standard form for that on the Web, which is a URL. The clipboard item could be provided in the form of a URL. Applications that support this would be able to obtain content in whatever form they prefer, using fetch and HTTP content negotiation (if they are on the Web) or whatever HTTP library they prefer otherwise.
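A rough sketch of that exchange, with both ends simulated in memory so it runs without a network. Everything here is an assumption made for illustration: the URL, the function names, and the simplified `Accept` handling, which ignores wildcards and parameters other than `q`.

```typescript
interface NegotiatedRequest {
  url: string;
  headers: { Accept: string };
}

// Destination side: turn an ordered preference list into an Accept header.
function buildRequest(clipboardUrl: string, preferred: string[]): NegotiatedRequest {
  const accept = preferred
    .map((type, i) => `${type};q=${Math.max(0.1, 1 - i * 0.1).toFixed(1)}`)
    .join(", ");
  return { url: clipboardUrl, headers: { Accept: accept } };
}

// Server side: return the highest-ranked representation that is available.
function negotiate(accept: string, available: Map<string, string>): string | undefined {
  const ranked = accept
    .split(",")
    .map((part) => {
      const pieces = part.split(";").map((s) => s.trim());
      const q = pieces.slice(1).find((s) => s.startsWith("q="));
      return { type: pieces[0] ?? "", q: q === undefined ? 1 : Number(q.slice(2)) };
    })
    .filter((entry) => entry.q > 0) // drops q=0 and unparsable values
    .sort((a, b) => b.q - a.q);
  for (const entry of ranked) {
    const body = available.get(entry.type);
    if (body !== undefined) return body;
  }
  return undefined;
}

// The URL is a placeholder standing in for whatever the clipboard item carries.
const req = buildRequest("https://source.example/clip", ["image/png", "text/html"]);
const negotiated = negotiate(
  req.headers.Accept,
  new Map<string, string>([
    ["text/html", "<p>quote</p>"],
    ["text/plain", "quote"],
  ]),
);
```

On the Web, the destination would pass the same header to `fetch`. The source still learns the format choice, but only from destinations that chose to make the request.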

The key thing this provides is a lack of transparent backwards compatibility. The destination application would need to support this new means of obtaining clipboard data. Usually that would be a drawback, in the sense that you can't just continue to use WordPerfect (to use the example @hober chose) and get delayed rendering. But that is what provides the privacy gain.


A brief side note about this stuff.

What is not clear about Web clipboard APIs is that they have this wonderful covert channel in them. A site can populate the clipboard with totally usable text/html or text/plain content, but also sneaky/af content. Most applications will read and use the common format, but a malicious app can harvest the extra information in sneaky/af formatted clipboard content. That could include any amount of surprising information.

You thought that you just copied and pasted a quote from a news article? Nah, you just made a copy of your entire user profile.

The main thing stopping this from being completely terrible is that a site has very little influence over where you ultimately paste it. So while there is a one-way communications medium here that has these wonderful covert channels, it is still largely under user control.
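The covert channel can be pictured with a clipboard entry modelled as a plain map from media type to data. This is a sketch only: the `sneaky/af` payload and the function names are invented, and a real browser may restrict which types a page can write.

```typescript
// One clipboard entry: media type -> data. The extra payload is made up.
const clipboardEntry = new Map<string, string>([
  ["text/plain", "A quote from a news article"],
  ["text/html", "<q>A quote from a news article</q>"],
  ["sneaky/af", JSON.stringify({ userId: "u-123", visited: ["/health", "/finance"] })],
]);

// A typical destination reads a common format and ignores the rest.
function ordinaryPaste(entry: Map<string, string>): string | undefined {
  return entry.get("text/html") ?? entry.get("text/plain");
}

// A cooperating destination looks for anything beyond the common formats.
function harvest(entry: Map<string, string>): string[] {
  const common = ["text/plain", "text/html"];
  return Array.from(entry.keys()).filter((type) => common.indexOf(type) === -1);
}

const visibleToUser = ordinaryPaste(clipboardEntry);
const hiddenTypes = harvest(clipboardEntry);
```

The user sees only `visibleToUser`. Whether the extra payload is ever read depends on where the user pastes, which the source site does not choose.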


Delayed rendering creates a two-way medium. Admittedly, the timing and format choice convey very little in the reverse direction, but it's there.

Exploiting the covert channel to carry a new overt callback provides better accountability. It also means that the destination has to actively opt in to sending the signal. That protects most existing applications -- and users of those applications -- by default.

That this requires an online connection to resolve is something that could be iterated on. Obviously, same-host communication is what we really want, so it might be possible to supply alternatives that don't require a long trip to a server, as an optimization. That is assuming that you don't want to use the server to set up a local WebRTC connection for you, which is probably more work than is justified. That said, a server can provide content from a web page that has gone away, so there is that.

snianu commented 8 months ago

Thank you for the detailed feedback and suggestions for the privacy issue! Some initial comments:

> This is unavoidable given our current model where a website can put whatever it pleases on the clipboard (with the only condition being that they receive an interaction of some sort)

In Chromium browsers, clipboard write permission is required to write data to the clipboard unless it's inside a copy event handler. This is a trust signal from the user that they want to give the site access to the clipboard. According to this model, if a trusted site that has access to the clipboard writes bogus/malicious content to the clipboard, then why would the user continue to use this site for copying content?

If we focus our discussion on just the malicious sites that somehow got the trust signal from the user, then our proposal is to mitigate the damage by allowing just one web custom format to be delay-rendered, so the site can't cast a wider net to track user paste activity.

> The delayed rendering creates a potential bidirectional channel, with the destination choosing from a multiple choice selection. The site learns what choice is made. The choice itself carries novel private information. The timing of the paste is also revealed.

If the choice is a standard format (text/html, image/png, text/plain), then it doesn't leak any information about the source app where the content was copied from. The timing of paste is at best a guess as the entire process is asynchronous (both in the OS kernel and in the browser).

> Render all formats when paste occurs.

This defeats the purpose of delayed rendering, as sites don't want to do the work of generating data for a format that may never be used during paste. For example, see Adobe's use case.

> At copy time, produce a single format from which all others can be produced. Let the destination application perform format translation.

IIUC, this means the source apps have to produce the data for all supported formats? Again, that defeats the purpose of delayed rendering.

> Finally, we could define a new media type that carries a URL. That URL, when resolved, provides the destination application with information in whatever form it desires.

This solution sounds like something outside the control of the OS clipboard, and it introduces additional security concerns on platforms where the OS has certain expectations about how delayed rendering should work for copy-paste. For more information on delayed rendering on Windows, please see https://learn.microsoft.com/en-us/windows/win32/dataxchg/clipboard-operations#delayed-rendering-guidance

> The whole point of delayed rendering is to add a level of indirection.

This is incorrect. The point of delayed rendering is to avoid generating data for an expensive format that is never used during paste. For more information on delayed rendering on Windows, please see https://learn.microsoft.com/en-us/windows/win32/dataxchg/clipboard-operations#delayed-rendering
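The cost argument can be illustrated with a small lazy-rendering sketch. It is not the Windows or browser API: `renderPng`, `requestFormat`, and the counter are hypothetical stand-ins for an expensive export and the paste-time request.

```typescript
let expensiveRenders = 0;

// Stand-in for costly work such as rasterizing a large document.
function renderPng(): string {
  expensiveRenders += 1;
  return "<png bytes>";
}

// At copy time only the producers are registered; no data is generated.
const producers = new Map<string, () => string>([
  ["text/plain", () => "caption"],
  ["image/png", renderPng],
]);

// Each format is produced at most once, on its first request.
const renderedCache = new Map<string, string>();
function requestFormat(format: string): string | undefined {
  const cached = renderedCache.get(format);
  if (cached !== undefined) return cached;
  const produce = producers.get(format);
  if (produce === undefined) return undefined;
  const data = produce();
  renderedCache.set(format, data);
  return data;
}

requestFormat("text/plain"); // a text-only destination pastes
const rendersAfterTextPaste = expensiveRenders; // the PNG was never generated
requestFormat("image/png");
requestFormat("image/png"); // second request is served from the cache
```

If the user only ever pastes into a plain-text field, `renderPng` never runs, which is the saving described above.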

snianu commented 8 months ago

In today's EditingWG meeting we decided to implement delayed clipboard rendering support only for built-in formats (e.g. text/html, text/plain, image/png, etc.). The privacy concern does not apply to these formats, as they are not tied to any app ecosystem. We have consensus from all major browsers and have resolved on this in the EditingWG. Please let us know if you have any concerns. Thanks!
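As a rough illustration of that resolution, a check along the following lines could decide which formats are eligible for delayed rendering. The list contains only the three formats named in the comment, so it is an incomplete assumption, and treating every other format as eagerly written is one possible reading rather than something the thread specifies.

```typescript
// Assumption: only the formats named in the thread; the real list may be longer.
const BUILT_IN_FORMATS: readonly string[] = ["text/html", "text/plain", "image/png"];

function mayDelayRender(format: string): boolean {
  return BUILT_IN_FORMATS.indexOf(format.toLowerCase()) !== -1;
}

// Split a clipboard write into formats that may be deferred and formats
// that would have to be supplied at copy time.
function partitionFormats(formats: string[]): { delayed: string[]; eager: string[] } {
  return {
    delayed: formats.filter((f) => mayDelayRender(f)),
    eager: formats.filter((f) => !mayDelayRender(f)),
  };
}

// "web application/x-custom" is a made-up custom format name.
const split = partitionFormats(["text/plain", "web application/x-custom", "image/png"]);
```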