[CSS-COLOR-4] Security/Privacy: Incognito mode

jsalowey commented 4 years ago

I've been assigned to security review this document. There are some potential fingerprinting issues with respect to system-colors and color-profiles. Should there be specific recommendations for handling these in incognito mode?

svgeesus commented 4 years ago

We are aware of fingerprinting issues with system colors. Could you outline the fingerprinting issue with color profiles?

jsalowey commented 4 years ago

Correction, color-profiles is more of a tracking risk than fingerprinting.

I think the fingerprinting risk from color profiles is low, but it may depend upon the implementation behavior. For example, if some implementations build in some profiles, then perhaps there would be a way to determine what the built-in profiles are (perhaps through timing). I think this risk is low, but it would need more privacy review.

tabatkins commented 4 years ago

That seems to be a generic "what properties/values/etc are supported by this browser" info leak, right? That's completely unavoidable, and would be similarly exposed by any other newish feature with varying support, across the entire web platform.

Or is there something specific to color-profiles you're seeing here?

mallory commented 4 years ago

I also reviewed this from a privacy perspective, and I think a reliance on color-profiles reduces the issue with system colors. For example, if system colors are too uniquely specified, then the risk of fingerprinting and security attacks increases. But if users are encouraged to choose from a standard set of color-profiles to fulfill their needs (e.g. accessibility, dark mode), then they're less likely to come up with entirely unique settings.

tabatkins commented 4 years ago

Color profiles and system colors are completely unrelated, unfortunately. Color profiles are about specifying a colorspace, so you can say color(foo 1 .5 0) and the browser knows how to turn that into a color it can display. It's not a set of colors forming a "profile".

x-Jake-x commented 4 years ago

Here are my thoughts:

As far as the color-profiles mentioned here go, they allow for a src. In an "incognito mode", to avoid fingerprinting (I do not completely understand how this is different from tracking), a UA should be sure to request these resources anew with each session rather than rely on a cached resource that could potentially identify a user. As I assume that already happens in incognito mode, I wonder if it even needs to be said, but there it is.

i.e., the draft gives this example:

@color-profile --fogra55beta {
  src: url('https://example.org/2020_13.003_FOGRA55beta_CL_Profile.icc');
}
.dark_skin {
  background-color: color(--fogra55beta 0.183596 0.464444 0.461729 0.612490 0.156903 0.000000 0.000000);
}

It is entirely possible that example.org/whatever.icc is actually a resource on a remote server that is generated dynamically when this request is made. If a user is expecting privacy, then it seems to me that the correct action would be to request the resource once for each session, which would make the UA say "This user hasn't been here before."

Just going to assume webpages here to make the rest of these thoughts easier, but... If a dynamic page is loading and the remote server detects that no request was made for the resource, it would be easy enough for it to say "This resource wasn't requested. Perhaps this person has been here before. Let me run this detection algorithm..."
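
The server-side inference described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the path values, and the idea of grouping requests into one "visit" are all hypothetical, and a missing request could equally mean the client does not support @color-profile at all, so the result is a hint rather than proof.

```javascript
// Hypothetical log analysis; none of these names come from the draft.
// requestedPaths: URL paths one client requested during a single page load.
// profilePath: the path of the ICC profile that the page's CSS references.
function classifyVisit(requestedPaths, profilePath) {
  const sawPage = requestedPaths.includes('/');
  const sawProfile = requestedPaths.includes(profilePath);
  if (!sawPage) {
    return 'unknown';
  }
  // The page references the profile, so a missing request suggests the
  // client may already hold a cached copy from an earlier visit.
  return sawProfile ? 'first-visit-or-fresh-cache' : 'possibly-returning';
}
```

A UA that fetches the profile fresh in every incognito session always lands in the first category, which is the behaviour being asked for here.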

Now, how does this fit in specifically to color profiles rather than this resource being like any other in incognito mode?

color() = color( [ [<ident> | <dashed-ident>]? [ <number-percentage>+ | <string> ] [ / <alpha-value> ]? ]# , <color>? )

We know from the color-profile draft, color() also accepts a string as part of the arguments to reference a color. Specifically saying:

or a <string> giving the name of a color defined by the colorspace.

Assuming that the issue here is indeed privacy and/or security, --fogra55beta as defined by the dynamically generated remote resource ICC profile above could include a series of color keywords and one could reference those keywords on a page. Let's say that it defines the color dark-skin as a keyword in its colorspace definition and that it maps to the aforementioned example values. However, being a dynamically generated resource, one person may have dark-skin mapped to the equivalent of

color(--fogra55beta 0.183596 0.464444 0.461729 0.612490 0.156903 0.000000 0.000000);

in the colorspace, and another may have it mapped to the equivalent of

color(--fogra55beta 0.183597 0.464444 0.461729 0.612490 0.156903 0.000000 0.000000);

(without defining each color specifically in CSS). These differences may not necessarily be visible to the naked eye as an "easy alert".

Now, when the UA goes to display the webpage, if for any reason it uses a cached version of the ICC profile, the page being displayed could technically detect the actual displayed color for an element and map that color to a previous visitor.

For very large numbers of users, this isn't necessarily a huge issue (although it could be combined with a pooled IP to generate larger variations on the detected differences in a geographic location, at which point it becomes a bigger issue...), but for a smaller user base, this tagging could be easily abused.

I don't have a solution for this outside of the use of non-caching in incognito mode. ICC profiles can sometimes be large, from what I hear.

Outside of incognito mode, perhaps vendors could only allow profiles loaded from known good resources (color.org? printer vendors? monitor vendors? seems like a big list to maintain...) and warn a user that an unverified resource request is being made, offering to accept all future requests from that resource.

Or perhaps we could disallow the use of <string> outside of predefined colorspaces, as it appears to me that all other parts of color() are constant or calculated.

On a related note, rather than using a different <string> in a malicious cached ICC definition, the colors in that definition could contain differences subtle enough that, on a device where they are out-of-gamut, the device's gamut mapping could map them to subtly different colors in the UA. A page could then read those colors as an identifier, so I guess removing <string> wouldn't be a real solution in that case...

Since IP mapping isn't one-to-one, it does seem to me that this provides a security or privacy concern as it is possibly an additional way to track a user with a finer level of detail from a pool of users.

svgeesus commented 4 years ago

Tab is correct, system colors are likely to resolve to sRGB colors set by the browser or OS (or, rarely, by the user).

Now, when the UA goes to display the webpage, if for any reason it uses a cached version of the ICC profile, the page being displayed could technically detect the actual displayed color for an element and map that color to a previous visitor.

the gamut-mapping of the device could map the out-of-gamut colors to different subtle colors in the UA, which could then be read by a page as an identifier

How would you go about doing that, in script? The computed value, read back from the CSS OM, would be dark-skin in both cases. Could you give an example of cross-site scripting that can read a color from the screen?

x-Jake-x commented 4 years ago

The concern I described isn't from cross-site scripting, but the site itself loading a malicious color profile. In order to read the color from the screen after using a color keyword, something like the following could be employed: https://jsfiddle.net/fcn9jk3z/

However, the draft specifies using a string giving a color name defined by the color space. The only type of color profile loading that is described is loading an ICC profile, so I assume (possibly incorrectly) at the moment that is the only type of color profile that can be loaded. I do not know what other profiles exist besides this format or how those profiles would describe color names using strings.

I have tried to research, and I could not find any data relating to using color keywords from an ICC profile, or whether an ICC profile is even capable of supporting such keywords. I suppose that particular hypothetical vector must be moot in this case, but I'm glad it was at least explored.

The out-of-gamut mapping issue for a remote resource ICC profile is still a possibility, but I suspect that browsers as user agents will only end up mapping to rgb() (per the example in the jsfiddle), so that type of fingerprinting/profiling is also limited in how many people it could track. I won't go so far as to say that it is completely out of the question though.

tabatkins commented 4 years ago

The concern I described isn't from cross-site scripting, but the site itself loading a malicious color profile. In order to read the color from the screen after using a color keyword, something like the following could be employed:

I believe you're describing a persistent-identifier attack, smuggled via the browser's cache for the referenced color-profile file, right? Deliver a detectably-unique ICC file to each user, then later check the results to see if it's a previously-detected user.

Given that this depends on a malicious script and ICC file, tho, how is this different from just sending a unique script file with a user identifier in it? Cache-clearing should wipe out both of these anyway, right?
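
To make the comparison concrete, here is a toy model (an assumption-laden sketch, not a description of any real browser cache) in which a server hands out a different file body on every fetch and a browser keeps whatever it fetched until the cache is cleared. ToyServer, ToyBrowser and the two URLs are invented for illustration.

```javascript
// Toy model only: no real browser or HTTP API is used here.
class ToyServer {
  constructor() {
    this.nextId = 1;
  }
  // Returns a "file" whose contents differ for every request.
  fetchUnique(url) {
    return `${url}#id=${this.nextId++}`;
  }
}

class ToyBrowser {
  constructor(server) {
    this.server = server;
    this.cache = new Map();
  }
  // Serves from the cache when possible, otherwise fetches and stores.
  load(url) {
    if (!this.cache.has(url)) {
      this.cache.set(url, this.server.fetchUnique(url));
    }
    return this.cache.get(url);
  }
  clearCache() {
    this.cache.clear();
  }
}

const server = new ToyServer();
const browser = new ToyBrowser(server);

const firstScript = browser.load('/tracker.js');
const firstProfile = browser.load('/profile.icc');

// With a warm cache, both resources come back unchanged on a later visit.
const sameScript = browser.load('/tracker.js') === firstScript;
const sameProfile = browser.load('/profile.icc') === firstProfile;

browser.clearCache();

// After clearing, both resources are re-fetched and both identifiers change.
const scriptChanged = browser.load('/tracker.js') !== firstScript;
const profileChanged = browser.load('/profile.icc') !== firstProfile;
```

In this model the ICC file and the script behave identically, which is the point of the question: the profile adds nothing that a cached script does not already provide.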

svgeesus commented 4 years ago

The only type of color profile loading that is described is loading an ICC profile, so I assume (possibly incorrectly) at the moment that is the only type of color profile that can be loaded. I do not know what other profiles exist besides this format or how those profiles would describe color names using strings.

ICC profiles are the only ones in common and current use. The CSS Color 4 specification is purposefully slightly vague on that in case some other replacement color profile format becomes popular in the future. All tests will use ICC profiles though, and these are the only ones supported in current browsers or image editors.

svgeesus commented 4 years ago

In order to read the color from the screen after using a color keyword, something like the following could be employed: https://jsfiddle.net/fcn9jk3z/

Thanks for the example. That doesn't read the color from the screen. It reads the computed value from the DOM. CSS has the concepts of specified, computed and used values.

CSS Color 4 does not yet define the computed value for the color() function. However, the computed value for earlier syntactic forms is defined, and I expect the value for the newer forms to be similar. For example, the computed value for the color #17F (#1177FF) is the string rgb(17, 119, 255). So I would expect the computed value of color(prophoto-rgb 0.4835 0.9167 0.2188) to be the string color(prophoto-rgb 0.4835 0.9167 0.2188) and the computed value of color(--mynamed "Deep Pink") to be color(--mynamed "Deep Pink"), not the Lab values read from the Deep Pink entry in the color profile.
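
A small sketch of that expectation, written as plain functions rather than anything a browser exposes: hex colors are expanded and serialized as rgb(), while color() is passed through unchanged. The function names are hypothetical, and the pass-through branch encodes the expectation stated above, not behaviour the specification defines yet.

```javascript
// Expand a 3- or 6-digit hex color and serialize it as "rgb(r, g, b)".
function hexToComputedRgb(hex) {
  let digits = hex.replace(/^#/, '');
  if (digits.length === 3) {
    // "#17F" is shorthand for "#1177FF": each digit is doubled.
    digits = digits.split('').map((d) => d + d).join('');
  }
  if (!/^[0-9a-fA-F]{6}$/.test(digits)) {
    throw new Error(`unsupported hex color: ${hex}`);
  }
  const channels = [0, 2, 4].map((i) => parseInt(digits.slice(i, i + 2), 16));
  return `rgb(${channels.join(', ')})`;
}

// Assumption taken from the comment: a color() value computes to itself,
// so nothing from inside the referenced profile appears in the result.
function expectedComputedValue(specified) {
  return specified.startsWith('#') ? hexToComputedRgb(specified) : specified;
}
```

Under this model, expectedComputedValue('color(--mynamed "Deep Pink")') returns its input, so two users with differently generated profiles would read back the same string.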

x-Jake-x commented 4 years ago

how is this different from just sending a unique script file with a user identifier in it? Cache-clearing should wipe out both of these anyway, right?

I think a few things here:

1) Cache-clearing should wipe both of those out, but since this is specific to incognito mode, I wonder if it needs to be said explicitly that color profiles loaded in CSS need to be wiped (or loaded fresh) during the session. Just to say that we reviewed it and recognize it. Maybe not though if this is generalized behavior.

2) General OS/browser/anti-malware things ("protection services"?) look for malignant scripts these days if they are configured to do so, but I doubt that a single one of them would understand a malignant color profile. Like you said though, since it is as simple as an identifier, the protection service wouldn't be able to recognize such a thing as malignant anyway. That's why I mentioned using a trusted source for color profiles, but...it'd be terribly autocratic to mandate such a source, so...perhaps that's moot.

3) see below

It reads the computed value from the DOM.

Thank you for the clarification! As you can see from the (to rephrase) "earlier defined syntactic form", the browser provides a way to get the computed color specifically for the reserved color names. To make an assumption, one might want to read the computed value of a colorspace color in order to manipulate it in some mathematically relevant way. Thinking that the browser wouldn't provide a way to get the mathematical value of "Deep Pink" in the future seems a bit limiting to me, as it would not be easily manipulable.

After reading these, however:

https://blog.mozilla.org/security/2010/03/31/plugging-the-css-history-leak/ and https://github.com/w3c/csswg-drafts/issues/3847 (references fingerprinting)

It is obvious to me that this situation is not unique in its security concerns.

The draft says that:

color() values: The computed and used value is the color in the specified colorspace, paired with the specified alpha channel (defaulting to opaque if unspecified).

Considering that hsl resolves to rgb: https://jsfiddle.net/fbwoet3v/

And that the draft also says that:

system colors: Each keyword computes to itself. Its used value is the corresponding color in its color space.

In order to achieve what you are describing @svgeesus, I would think that to maintain consistency in its definitions, the draft would have to be changed to read something like:

color() values: Each colorspace color() value computes to itself. Its used value is the color in the specified colorspace, paired with the specified alpha channel (defaulting to opaque if unspecified). Its actual value is the gamut-mapped representation appropriate for the output medium upon which it is used.

But considering that the predefined colorspaces currently compute to rgb, this would necessarily result in treating custom colorspaces differently than predefined colorspaces as far as their computed values go, in order to prevent the type of fingerprinting and/or tracking mentioned previously.
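
For reference, the hsl-to-rgb resolution mentioned above is pure arithmetic on the specified numbers. The sketch below follows the conversion formula given in CSS Color 4 and then rounds to 8-bit channels; the function name and the rounding step are my own assumptions for illustration, and real serialization may differ in detail.

```javascript
// Convert hsl(hue, saturation%, lightness%) to an "rgb(r, g, b)" string.
// hue is in degrees; saturation and lightness are percentages (0 to 100).
function hslToRgbString(hue, saturation, lightness) {
  const h = ((hue % 360) + 360) % 360; // normalize negative or large hues
  const s = saturation / 100;
  const l = lightness / 100;
  const a = s * Math.min(l, 1 - l);
  const channel = (n) => {
    const k = (n + h / 30) % 12;
    return l - a * Math.max(-1, Math.min(k - 3, 9 - k, 1));
  };
  const [r, g, b] = [channel(0), channel(8), channel(4)].map((v) =>
    Math.round(v * 255)
  );
  return `rgb(${r}, ${g}, ${b})`;
}
```

Nothing in this conversion depends on the device or on any fetched file, which is why it carries no identifying information; a custom profile is different only in that its contents come from a remote resource.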

tabatkins commented 4 years ago

Note that reading out a computed color in some colorspace has nothing to do with CSS's :visited history leak. That's a completely unrelated (and still quite annoying, sigh) problem.

Also, we're certain to grow Houdini APIs to let you convert between colorspaces, and I don't see a reason a priori why we wouldn't want to let custom colorspaces work in that as well.

Again, this is nothing more than a persistent-identifier-via-caching attack, right? Is it anticipated that we will, in general, require substantial mitigations to ensure that these can't be observed (when possible), or are we just relying on the "cache gets cleared, we're cool" defense? So long as cache-clearing does wipe this out, this seems to offer zero new attack surface above a cached script or stylesheet.

If that's all we're worried about, then I don't think this needs further discussion; the referenced file is persisted in your browser cache per standard resource caching rules, and is cleared in the same way.

svgeesus commented 4 years ago

If that's all we're worried about, then I don't think this needs further discussion; the referenced file is persisted in your browser cache per standard resource caching rules, and is cleared in the same way.

I agree with Tab. If the caching behavior is mentioned at all, then it should be at the definition of the url() function, so it is clear that it applies to everything in CSS which uses that.

x-Jake-x commented 4 years ago

I agree -- it should be mentioned in URL specifically to address this.

I believe that the computed color fingerprinting/tracking is still an issue in that case, but not in incognito mode. I don't think we need further discussion as long as we don't forget to address url().

svgeesus commented 4 years ago

Its actual value is the gamut-mapped representation appropriate for the output medium upon which it is used.

I don't believe we do that for other colors. For example, if an sRGB color is displayed on a P3 monitor, the used value is still the sRGB values, not the ones in the monitor colorspace. To do otherwise would be a privacy risk, as it would reveal details of the precise monitor calibration which are not otherwise observable.

svgeesus commented 4 years ago

@x-Jake-x wrote:

I don't think we need further discussion as long as we don't forget to address url().

Issue raised

svgeesus commented 4 years ago

@tabatkins any further thoughts on the used value of color()? Used values also impact currentColor, right?

tabatkins commented 4 years ago

I don't believe we do that for other colors.

Note that they said "actual value", which is the final value stage and will forever be hidden from pages; "actual value" also deals with things like subpixel rounding, which would offer similar privacy issues if they were exposed.

@tabatkins any further thoughts on the used value of color()? Used values also impact currentColor, right?

I agree with you that both the computed and used values of a color() function should just be the input value; color(--foo "deeppink") should stay in that form in both computed and used values. We will be offering color conversion tools in Houdini, which I expect will let you get a color like that converted into whatever other space you want, so it's not like the information will be hidden from the page; again tho, that's not a privacy concern beyond the cached-identifier thing, as it's just reflecting the contents of a file the page already included. It will not contain any information about the actual output-device gamut.

x-Jake-x commented 4 years ago

Perhaps we could change the language used in the draft for color values to be more in-line with the system values (as mentioned in my example) to reflect the expected result?

jsalowey commented 4 years ago

Thanks for giving attention to this issue. This is a very interesting discussion. I'm a little new to W3C and have a bit to learn.

If I followed correctly the issue with caching color profiles in incognito mode has been up-leveled to the generic url() in CSS. It's good to address this problem generically.

From the thread it sounds like there should not be a problem with leaking the details of the hardware in use.

I don't think I fully understand the impact of the specifics of the color naming yet.

svgeesus commented 4 years ago

So, returning to

the page being displayed could technically detect the actual displayed color for an element and map that color to a previous visitor.

as @tab said, the actual value is never exposed, precisely for security and privacy reasons. So no, that can't happen.

svgeesus commented 4 years ago

It seems that the url() or successor issue is the only remaining one and has been opened separately, so this issue can now be closed with no change to the CSS Color 4 specification, yes?

jsalowey commented 3 years ago

Yes, this addressed my comment. Thanks.

w3c / csswg-drafts

[CSS-COLOR-4] Security/Privacy: Incognito mode #5553