Mitigate canvas fingerprinting risk

pes10k commented 4 years ago

Small differences in canvas operations are a high-fidelity fingerprinting vector, demonstrated by several papers [1] and popularly deployed tools [2]. Some browsers are already shipping mitigations (Firefox has a non-default fingerprinting resistance mode, Brave adds small amounts of random noise to canvas serialization, etc.) to address the fingerprinting surface, but the spec should include a solution to the problem, to prevent interop problems, set consistent dev expectations, address common-action issues in solving privacy problems, etc.

Of particular importance to the spec, it'd be ideal to have a solution that doesn't require privacy-sensitive users to disable canvas altogether (e.g. Tor Browser Bundle, etc.).

This issue isn't to request a specific solution to the problem, but to track conversation for getting a solution into the spec.

Previously discussed solutions (without endorsement) include:

permission-ing access
requiring a user gesture / activation for methods that allow for canvas readback
adding randomization to the output of the canvas (the original idea for this approach, AFAIK, comes from [3]; Brave's shipping approach is a bit different [4])
preventing readback of non-visible canvases
preventing readback in 3p frames

1: for example, Mowery, Keaton, and Hovav Shacham. "Pixel perfect: Fingerprinting canvas in HTML5." Proceedings of W2SP (2012): 1-12.
2: for example, https://github.com/Valve/fingerprintjs2/blob/master/fingerprint2.js#L903
3: Laperdrix, Pierre, Benoit Baudry, and Vikas Mishra. "FPRandom: Randomizing core browser objects to break advanced device fingerprinting techniques." International Symposium on Engineering Secure Software and Systems. Springer, Cham, 2017.
4: https://brave.com/whats-brave-done-for-my-privacy-lately-episode3/

annevk commented 4 years ago

cc @whatwg/canvas

jyasskin commented 4 years ago

It would be great to get a specification of what Brave is doing to randomize canvas readback. https://brave.com/whats-brave-done-for-my-privacy-lately-episode3/ and https://github.com/brave/brave-browser/wiki/Fingerprinting-Protections are not precise enough to implement from, or for academics to publish attacks on.

kdashg commented 4 years ago

Injecting randomization into readback data is only useful for canvas2d, and even then I think it's not too hard to break with clever compositing and blending. WebGL is a whole different beast, and is extremely prone to leaking values based on execution time/speed. There's no way I see to allow arbitrary shaders without leaking bits.

My recommendation for privacy-critical applications (Tor) is software backends for canvas, both 2d and WebGL.

I do have good news about WebGL fingerprinting though: based on a small survey I did a while back, the rasterization fingerprinting results (which is what people are mostly worried about) showed that the only bits leaked were the vendor of the GPU, but not which GPU you had. (Nor your OS!) I need more data there, but this was promising news. Standard practice for WebGL, though, is still to offer the unmasked RENDERER "which GPU is this" string, as well as the various optional limits and extensions.

We're still designing our approach for canvas fingerprinting in Firefox, but we do expect to start second-classing 3rd-party canvases. Canvas is used for intermediary content construction and compute though, so non-visibility of (especially WebGL) canvases is probably not a case we'll distinguish.

I think we should be taking a serious look at a permission gate for things like "what GPU is this exactly". That's super useful info in some cases, but probably not something you care to let a random blog know.

pes10k commented 4 years ago

@jyasskin

Sure, here is the version Brave has settled on (slightly different from what's in Nightly now, but what will be in Nightly shortly).

Generate a random per browser session seed
Mix that seed (HMAC256) with each eTLD+1 to generate a per eTLD+1 seed
Derive from the eTLD+1 seed and the canvas contents a "farble" [1] mask (a sparse matrix of bits to XOR into the canvas, deriving the offsets and the channels from the seed), or reuse an existing non-dirtied one, anytime the canvas's getImageData, toDataURL, toBlob or OffscreenCanvas.convertToBlob are called. This farbled mask is deterministic but unpredictable, given the contents of the canvas.
Use this mask to farble the output of the above APIs in a consistent manner.
Anytime a drawing operation is conducted on a canvas, dirty the mask.
All frames in the page share the top level frame's eTLD+1 seed
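The steps above can be sketched roughly as follows. This is a minimal illustration, not Brave's implementation: the `Canvas` class, the `site_seed` and `farble_mask` helpers, the choice of 8 flipped bits, flipping only the lowest bit, and leaving the alpha channel untouched are all assumptions made for the example. Only the overall shape (session seed, HMAC-SHA256 per eTLD+1, mask derived from seed plus canvas contents, mask dirtied on draw) comes from the description above.

```python
import hashlib
import hmac
import os


def site_seed(session_seed: bytes, top_level_etld1: str) -> bytes:
    """Mix the per-session seed with the top-level frame's eTLD+1 (HMAC-SHA256)."""
    return hmac.new(session_seed, top_level_etld1.encode("utf-8"), hashlib.sha256).digest()


def farble_mask(seed: bytes, pixels: bytes, flips: int = 8) -> dict:
    """Derive a sparse XOR mask {byte offset: bit} from the site seed and the canvas contents."""
    if len(pixels) < 4 or len(pixels) % 4:
        raise ValueError("expected non-empty RGBA data")
    # Keying on the contents makes the mask differ for different drawings.
    content_key = hmac.new(seed, pixels, hashlib.sha256).digest()
    mask = {}
    for i in range(flips):
        block = hmac.new(content_key, i.to_bytes(4, "big"), hashlib.sha256).digest()
        pixel = int.from_bytes(block[:8], "big") % (len(pixels) // 4)
        channel = block[8] % 3  # R, G or B; skipping alpha is an assumption
        mask[pixel * 4 + channel] = 1  # flip the lowest bit (assumption)
    return mask


class Canvas:
    """Toy RGBA canvas; only the parts needed to show mask caching and dirtying."""

    def __init__(self, width: int, height: int, seed: bytes):
        self.pixels = bytearray(width * height * 4)
        self._seed = seed   # eTLD+1 seed of the top-level frame
        self._mask = None   # None means the mask is dirty

    def fill(self, rgba) -> None:
        """Stand-in for any drawing operation: it dirties the cached mask."""
        self.pixels[:] = bytes(rgba) * (len(self.pixels) // 4)
        self._mask = None

    def read_back(self) -> bytes:
        """Stand-in for getImageData / toDataURL / toBlob / convertToBlob."""
        if self._mask is None:
            self._mask = farble_mask(self._seed, bytes(self.pixels))
        out = bytearray(self.pixels)
        for offset, bit in self._mask.items():
            out[offset] ^= bit
        return bytes(out)


session_seed = os.urandom(32)  # regenerated every browser session
canvas = Canvas(16, 16, site_seed(session_seed, "a.example"))
canvas.fill((10, 20, 30, 255))
first = canvas.read_back()
print(first == canvas.read_back())  # True: repeated reads of an unchanged canvas agree
```

Because every frame would use the top-level frame's eTLD+1 seed, a third-party frame embedded in `a.example` would be constructed with the same seed as the page itself in this sketch.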

The goals of the above approach are the following properties:

canvas reads will change across sessions, across sites, preventing canvas from being used as a session linking fingerprint
require cross eTLD+1 collaboration to detect that randomization is going on (e.g. if a site can see that randomization is going on, it already has to have tracking capability)
prevent any site from getting different values / fingerprints of the same canvas content by injecting different eTLD+1 frames in the page
target the APIs most useful to fingerprinters: the serialization APIs.

It's possible that at some margin, we'll need to extend the above approach to paint time, instead of serialization time, if we see fingerprinters move to other canvas-querying APIs, but:

this approach seems compatible to extend in that direction
requiring fingerprinters to know beforehand the parts of the canvas that will be different across users already makes the fingerprinter's job much more difficult, and so even just the above is a significant improvement.

More broadly, I don't mean this issue to be tied to, or specifically suggest, Brave's approach. We're confident in it, but it's not impossible there are flaws (and there are doubtless possible improvements); I just mean it to demonstrate the feasibility of the randomization approach, without requiring defining the rendering libraries, or permissions, or similar.

@jdashg

My recommendation for privacy-critical applications (Tor) is software backends for canvas, both 2d and WebGL.

I think this is fine for Tor and folks in that category, but I don't think it should dissuade us from looking for privacy improvements for more "common case" users.

I think we should be taking a serious look at a permission gate for things like "what GPU is this exactly"

Strong agreement there. If there is an issue to remove / restrict the ability of script to access the WebGL debugging extensions (or at least the most identifying ones) I think that would be terrific, and Brave is already moving in that direction (measuring now to see what the web compat cost might be). But I think that's a different issue than the fingerprinting-through-drawing-differences issue here.

1: I don't know where the term farble came from, but we've picked it up to mean "mildly, unpredictably perturb"

pes10k commented 4 years ago

(fwiw, the difference between the above and what's in Brave Nightly is that the current Nightly implementation doesn't mix the canvas contents with the seed when determining the mask, and so allows for recovering the mask by farbling two different identically sized canvases. This was a dumb mistake, pointed out by @othermaciej, and was the result of us not thinking of the farbling as a standard mask / one-time-pad operation.)
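To make the flaw concrete, here is a small sketch of how a content-independent mask could be recovered and removed. It assumes a hypothetical `content_independent_mask` that depends only on the seed and the canvas size (standing in for the flawed variant described above), and a page that can draw a probe canvas whose true pixels it already knows; none of these names come from Brave's code.

```python
import hashlib
import hmac


def content_independent_mask(seed: bytes, size: int, flips: int = 8) -> bytes:
    """Flawed variant: the mask depends only on the seed and the canvas size."""
    mask = bytearray(size)
    for i in range(flips):
        block = hmac.new(seed, i.to_bytes(4, "big"), hashlib.sha256).digest()
        mask[int.from_bytes(block[:8], "big") % size] = 1
    return bytes(mask)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def read_back(seed: bytes, pixels: bytes) -> bytes:
    """Farbled readback as the page would observe it."""
    return xor(pixels, content_independent_mask(seed, len(pixels)))


seed = b"per-site seed, unknown to the page"
probe = bytes(256)           # canvas the page drew itself, so its true pixels are known
target = bytes(range(256))   # same-sized canvas carrying the fingerprint

# Same size means same mask, so the probe reveals it (one-time pad reuse).
recovered_mask = xor(read_back(seed, probe), probe)
unfarbled = xor(read_back(seed, target), recovered_mask)
print(unfarbled == target)  # True: the farbling has been stripped
```

Mixing the canvas contents into the mask derivation means the probe and the target get different masks, so the mask recovered from the probe no longer applies to the target.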

whatwg / html

Mitigate canvas fingerprinting risk #5373