whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Proposal: canvas to blob URL function #5311

Open rhendric opened 4 years ago

rhendric commented 4 years ago

I think there ought to be an API function to go directly from a canvas element to a blob: object URL.

HTMLCanvasElement already offers toBlob and toDataURL, of course. Both of these methods give callers read access to the contents of the canvas. The problem with this is that canvas data can be used to fingerprint the browser, and there exist a few implementations that restrict access to these methods for that reason (Firefox has the privacy.resistFingerprinting option which causes these methods to prompt the user for permission, and there exist a few extensions for Chrome that patch or neuter these methods in various ways to defeat fingerprinting). All of these methods involve a trade-off of functionality or image accuracy for privacy.

But not every web page that calls toBlob/toDataURL actually needs direct access to the contents of the canvas. A page that just wants to use a canvas to generate some image data and then, for example, offer that image for download or use it as the source for img elements or CSS backgrounds, poses no fingerprinting risk. But such a page has no way of asking for an opaque URL to use for such a purpose; it can only call one of the above methods to achieve this, which can trigger a confusing user interaction (Firefox) or fail (Chrome extension).

Providing some API for this—for example, extending HTMLCanvasElement with a USVString toObjectURL(optional DOMString type = "image/png", optional any quality) method or similar—would give web pages the ability to signal to the browser that they have no need to actually inspect the contents of the blob produced by the canvas, and give browsers or browser extensions the ability to provide opaque handles to canvas data in such circumstances without violating the user's intent not to be fingerprinted. Such a method should have comparable semantics to toBlob/toDataURL: the returned URL would reference a blob that contains a static snapshot of the canvas state at the time the method was called, encoded according to the provided arguments.

The API proposed doesn't need to be synchronous; a method that returns a promise or uses a callback like toBlob does would also suffice. Those alternatives have the advantage that they would be polyfillable as this.toBlob(blob => callback(URL.createObjectURL(blob)), type, quality) or similar. They have the disadvantage of presenting the dilemma of, in the first case, adding an API that is inconsistent with toBlob's callback style; or, in the second case, adding a new asynchronous API that doesn't use promises.
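
For illustration, the two asynchronous shapes described above might look roughly like this. It is only a sketch: `toObjectURL` is the name floated in this proposal rather than a shipped API, `toObjectURLWithCallback` is a made-up name, and the injectable `createObjectURL` parameter is an assumption added so the code can be exercised outside a browser.

```javascript
// Default URL factory; in a browser this is the File API's URL.createObjectURL.
const defaultCreateObjectURL = (blob) => URL.createObjectURL(blob);

// Callback-style variant, mirroring toBlob's own calling convention.
// The callback receives null if the canvas could not be encoded.
function toObjectURLWithCallback(canvas, callback, type = "image/png", quality,
                                 createObjectURL = defaultCreateObjectURL) {
  canvas.toBlob(
    (blob) => callback(blob === null ? null : createObjectURL(blob)),
    type,
    quality
  );
}

// Promise-style variant built on top of the callback one.
function toObjectURL(canvas, type = "image/png", quality,
                     createObjectURL = defaultCreateObjectURL) {
  return new Promise((resolve, reject) => {
    toObjectURLWithCallback(canvas, (url) => {
      if (url === null) {
        reject(new Error("canvas could not be encoded"));
      } else {
        resolve(url);
      }
    }, type, quality, createObjectURL);
  });
}
```

Because both variants are built on toBlob, they would still run into whatever restrictions an implementation places on toBlob; only a native method could avoid that.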

Another alternative would be to extend the URL.createObjectURL FileAPI function to accept HTMLCanvasElements, as it already is overloaded to accept objects implementing either Blob or MediaSource. This would require the implementation to be synchronous, would be less discoverable, and would not expose the type and quality parameters, all of which seem like disadvantages to me; but if that ended up being the approach preferred between W3C and WHATWG, I would have no problem with it.

Finally, it's worth noting that, for this API to be effective in maintaining the fingerprint resistance enforced by browser or extension, the blob: URLs it produces must indeed be opaque. This means in particular that implementers of fingerprint-resisting restrictions may need to do additional work to prevent such URLs from being fetchable.

annevk commented 4 years ago

As I noted in the bug it's not clear to me how robust the fingerprinting defense itself is so I'd prefer to not design on top of it until that's more clear.

rhendric commented 4 years ago

Is the debate about canvas fingerprinting resistance happening somewhere specific, or is this your synthesis from ad-hoc conversations? I want to engage further but I wouldn't want to waste our time with arguments that have been considered elsewhere, or fragment an existing conversation.

domenic commented 4 years ago

I don't understand this proposal. Is the idea to invent some new type of blob: URL, the opaque blob: URL, which cannot be read from? Otherwise this does not seem to add anything.

rhendric commented 4 years ago

No, blob: URLs are already opaque, in that you need to use some kind of API to get actual data out of them. Implementations then have the de facto option of restricting that API (Firefox could choose to prevent fetching blobs created from canvas data when the privacy.resistFingerprinting preference is set; extensions that already monkey-patch toBlob and toDataURL could monkey-patch the fetch global to do the same thing). But even with such restrictions, the blob: URL is still useful in contexts like the src attribute of an img. Contrast with a data: URL, where the URL is the data and once given out, it can't possibly be restricted.
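
As a rough sketch of the extension-side option mentioned above (not a description of what any existing extension does), a fetch wrapper could refuse URLs it knows were minted from canvas data. The names `wrapFetch` and `restrictedUrls` are hypothetical, and how URLs get into that set is left open.

```javascript
// Returns a fetch-like function that refuses to read URLs listed in
// `restrictedUrls` (a Set of blob: URL strings, assumed to be maintained
// by whatever code hands out canvas-derived URLs).
function wrapFetch(originalFetch, restrictedUrls) {
  return function restrictedFetch(input, init) {
    // `input` may be a string, a Request-like object with `.url`, or a URL object.
    const url = typeof input === "string" ? input : (input.url ?? String(input));
    if (restrictedUrls.has(url)) {
      // fetch() reports network errors by rejecting with a TypeError.
      return Promise.reject(new TypeError("Failed to fetch"));
    }
    return originalFetch(input, init);
  };
}
```

In a page this might be installed as `globalThis.fetch = wrapFetch(globalThis.fetch.bind(globalThis), restrictedUrls)`. It only covers fetch; other read paths, such as XMLHttpRequest or drawing the image back onto a second canvas, would need similar treatment.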

To be clear, standardizing how fingerprinting is resisted is not the immediate goal here. I only want to standardize a toObjectURL method or equivalent so that web authors can use canvas data in ways that don't expose fingerprinting risk, and implementers can continue to experiment with anti-fingerprinting techniques, without the two groups in tension with each other. I am not proposing, at this time, standardizing anything about how implementers may or may not restrict access to certain blob: URLs.

domenic commented 4 years ago

Implementations don't have an option to restrict the API per spec. They could make up new concepts, like blob URLs which are restricted, but that's its own new proposal.

To be clear, I'm saying that without new proposals that modify how blob URLs work, this thread is equivalent to asking for a shortcut for canvas.toBlob(blob => URL.createObjectURL(blob)), which is not very compelling. The real meat of this proposal seems to be asking for some new type of blob URL which doesn't behave like all existing blob URLs, e.g. the ones you get via that snippet.

If you are not in fact proposing a new type of blob URL at this time, then you should be happy with canvas.toBlob(blob => URL.createObjectURL(blob)), as it has the same properties.

rhendric commented 4 years ago

Sorry, I'm new to this process. I thought the thing to do would be to propose the smallest possible change to the standard that would enable implementations and web authors to cooperate. But I take your point that, per spec, having a blob and having its object URL are effectively equivalent.

Should I also propose that blobs gain an internal boolean attribute called invisibleToPage or something, the value of which is implementation-dependent, and that the scheme fetch procedure for blob: URLs be amended to include a step that throws a network error if invisibleToPage is true? That would give the spec a reason to differentiate between having a blob: URL and having the blob itself, while still giving implementations the freedom to (continue to) experiment with how fingerprint resistance should actually work.
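
To make the idea concrete, here is a toy model of that distinction. It is not spec text: `invisibleToPage` is the flag floated above, while `BlobUrlStore`, `fetchFromPage`, and `loadForRendering` are invented names, and the network error is modeled as a thrown TypeError.

```javascript
// Toy model of a blob URL store in which some entries cannot be read by page script.
class BlobUrlStore {
  constructor() {
    this.entries = new Map();
    this.nextId = 0;
  }

  // Register data under a new blob: URL, optionally marked unreadable by the page.
  register(data, { invisibleToPage = false } = {}) {
    const url = `blob:example/${this.nextId++}`;
    this.entries.set(url, { data, invisibleToPage });
    return url;
  }

  // What a page-initiated read (for example fetch) would see.
  fetchFromPage(url) {
    const entry = this.entries.get(url);
    if (!entry || entry.invisibleToPage) {
      throw new TypeError("NetworkError: blob URL is not readable by the page");
    }
    return entry.data;
  }

  // What the browser's own consumers (image decoding, downloads) would see.
  loadForRendering(url) {
    const entry = this.entries.get(url);
    if (!entry) {
      throw new TypeError("NetworkError: unknown blob URL");
    }
    return entry.data;
  }
}
```

Under this model an ordinary blob behaves as it does today, while a canvas-derived one can still back an img or a download without its bytes being handed to script.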

domenic commented 4 years ago

I wouldn't go as far as to say you "should" propose that. In fact I think it's better to start at the earlier steps of https://whatwg.org/faq#adding-new-features.

rhendric commented 4 years ago

Sure, that's fair. I think I'm mostly going to be repeating statements I already made, but I'll give it a shot.

There are basically two use cases involved here, each of which is effectively already solved individually (in the real world, if not in the spec), but not when taken together:

  1. Some users would like to prevent web sites from uniquely identifying their browser/computer via, among other methods, implementation-specific behaviors of canvas drawing functions.

    • Requirements:
      • Must be able to prevent some/all web pages from reading the exact contents of canvases.
    • Desirable:
      • Some/all web pages should not be able to get any new information out of a canvas—in other words, the API-observable behavior of canvas methods on any standard-compliant browser on any computer under any circumstances should be identical if the web page provides the same inputs to the canvas. (This doesn't mean the user-observable behavior of those methods must be identical.)
    • Open design questions, or things to possibly leave up to implementations:
      • How does the user determine which web pages need to be restricted in this way? (A single global setting, a per-domain setting, something else?)
  2. Some web page authors would like to write code that uses canvases to generate images, and then use those images in the web page in ways that do not require code running in the context of the page to have direct access to the content of the images.

    • Requirements:
      • Must work even when the web page is restricted in the sense of use case 1.
      • Must work with a static snapshot of the canvas, not a live view of its contents.
    • Desirable:
      • Should support using the contents of a canvas as:
        • the src of an img element
        • the background-image CSS property of an element that supports backgrounds
        • the href of a rel="shortcut icon" link element
        • the href of an a element, for downloading the image
        • the argument of window.open, for opening the image in a new window
      • Should support using the contents of a canvas in any other context where the URL of an image is expected, provided the browser doesn't expose the contents of that image via an API that would violate use case 1.

Use case 1 has solutions in the wild in browser implementations (Firefox/Tor Browser) and in web extensions for Chrome (though I think neither solution would be considered strictly standards-compliant). Use case 2, ignoring all of its references to use case 1, is already (mostly?) supported by modern browsers via blob: and data: URLs, but only if none of the existing solutions for use case 1 are in effect. The problem I'd like to solve is support for both cases 1 and 2 simultaneously.

How is that for a problem description?

domenic commented 4 years ago

Thanks, that's really helpful.

I'm curious about how (1) is possibly achievable. With Spectre you can read the memory of anything in your process. So solving (1) seems to require some proposal for moving canvases into a separate process, which is a pretty big ask.

rhendric commented 4 years ago

I think that's the point @annevk has been making.

I could edit the requirement for (1) to say ‘reading the exact contents of canvases through web APIs’, which is what I meant to say implicitly. That makes (1) achievable, but then I assume your question becomes whether the amended (1) is worth doing if there are ways to read the exact contents of canvases via Spectre.

I'm not an expert on Spectre, and maybe the ease of using Spectre as an attack vector is such that it makes no sense to attempt amended (1) at all. But I'm not sure that it is all that easy, or that there won't be future developments in hardware, firmware, operating systems, or browsers that make it less easy. (I will happily read links to security experts making the case for or against this.) Defense in depth and all that; even if you know some of your adversaries have explosives, there's still an argument for locking your front door.

domenic commented 4 years ago

> I'm not an expert on Spectre, and maybe the ease of using Spectre as an attack vector is such that it makes no sense to attempt amended (1) at all.

That is my understanding of the current security consensus, although I regret not having links off-hand to give you.

domenic commented 4 years ago

Well, I guess I can point you at https://chromium.googlesource.com/chromium/src/+/master/docs/security/side-channel-threat-model.md. In particular the conclusion:

> For the reasons above, we now assume any active code can read any data in the same address space. The plan going forward must be to keep sensitive cross-origin data out of address spaces that run untrustworthy code, rather than relying on in-process checks.

Given this I can confidently say that Chromium is not interested in implementing APIs based on your amended (1), as it directly contradicts our "plan going forward".