w3c / fingerprinting-guidance

What is browser fingerprinting and how should specification authors address it?
https://w3c.github.io/fingerprinting-guidance/

Needs to recognise a broader set of stakeholders and concerns - focus on the harm not the method #45

Closed jwrosewell closed 1 month ago

jwrosewell commented 4 years ago

MBF (the Mitigating Browser Fingerprinting document) describes possible harms associated with browser fingerprinting, but does not mention that these same harms arise, to an even greater degree, from other existing web standards such as first-party cookies or user authentication mechanisms, much less from the fingerprints embedded within telecommunication technology and the user agent software that companies rely on to “improve the quality of their service.”

Moreover, the proposal does not refer to other important W3C principles that may be negatively impacted by a partial solution that does not directly address the harms specifically mentioned. This raises a larger point: without any guiding policy on the W3C’s role in mitigating possible harms, the premise of the document is that the W3C should be the sole identifier and mitigator of these harms. It neither acknowledges the wider group of stakeholders, nor recognises that if bad actors shift to other pseudonymous ID mechanisms (such as those stored in first-party cookies) or to directly identifiable IDs (e.g., authentication), the guidance does absolutely nothing to prevent them from harming people. Thus MBF is a guidance note that at best harms smaller organizations, who pose the least threat to people, while doing nothing to protect people from larger organizations.

MBF, and other similar documents, are also silent on how mitigations may pose even greater harm to people as organizations are forced to rely on alternative methods to continue operating their businesses. Consider that browser fingerprinting is widely used to identify fraudulent activity on the web. If fingerprinting were diminished, alternative methods would be needed to identify fraud. Such alternatives might include requiring people not only to identify themselves using directly identifiable personal information, which they do not need to provide today, but also to prove they are a real person. The friction this creates might be so great that people will only be prepared to do it a few times, with the most popular and therefore most dominant publishers on the web. They might only be persuaded to do it at the point of operating system activation, or when registering for an essential service such as mapping. The guidance therefore has a consequence: it will either favour dominant market players or result in an increase in fraud. An increase in fraud on the open web would likely drive marketers, who fund much of the open web, further towards the walled gardens of dominant market players. This is a scenario explored at length by the UK Competition and Markets Authority (CMA) in their July 2020 report into digital advertising, which contemplates the introduction of pseudonymous common user IDs across the web as a remedy. We are ready to help the W3C facilitate a meeting with the CMA to better understand this important body of work.

MBF does not discuss how fraud detection may help protect good actors from bad actors if “unsanctioned tracking” is eliminated. Is the author suggesting that we ask fraudsters for permission before attempting to detect their bad acts? This is, of course, why this purpose is called out in GDPR as one of the examples of “legitimate interest”, an equally valid legal basis for the collection and processing of personal data as “consent”.

“In contrast to other mechanisms defined by Web standards for maintaining state (e.g. cookies), browser fingerprinting allows for collection of data about user activity without clear indications that such collection is happening.” Given that Google continues to set cookies on gstatic.com, a domain widely used to provide fonts and other resources, as well as to set the x-client-data HTTP header value, without providing explicit notice and control to people via the IAB’s TCF (note there is no privacy policy link from gstatic.com), we wonder how Google could agree with this guidance note.

The harm thus is not the collection and processing of “otherwise pseudonymous” personal data, but instead “the correlation [of this directly] identifying information.” Why is the document focused on just one of the many pseudonymous ID mechanisms, rather than on the root cause of the harm: helping people keep their directly identifiable information separate from ANY pseudonymous ID?

Eliminating pseudonymous IDs generated by fingerprinting does not protect people from any of the documented harms from:

• Telcos providing “bad” state actors with the user’s identity
• Search engine companies providing “bad” state actors with the user’s identity
• Browser companies that use their own application fingerprint providing “bad” state actors with the user’s identity
• OS providers that use their own application fingerprint providing “bad” state actors with the user’s identity
• Web-enabled device manufacturers that use their own application fingerprint providing “bad” state actors with the user’s identity
• First-party publishers that condition access on a person supplying directly identifiable information providing “bad” state actors with the user’s identity

In short, there seems to be an underlying problem in focusing on just one of the many ways bad actors might harm people, without any acknowledgement of the incompleteness of the harm mitigation, the collateral damage to good uses of the same technology, or the impact on good actors in the ecosystem caused by such a myopic focus.

As a new and active member of the W3C, we are highly concerned that guidance for the creation or modification of technical standards, whether they are de facto standards resulting from premature implementations by browser vendor members or genuinely new and well thought through, is made considerably more complex by the adoption of specific and poorly aligned policy positions by document authors and Groups. The Advisory Board implied the same concern in their May 2020 meeting.

dgstpierre commented 4 years ago

I agree with James. I don't believe that all angles of the problem are being considered or addressed. At DeviceForensIQ we rely heavily on fingerprinting to PREVENT fraudulent behavior on the web; it is our core product. The flip side is that the techniques being proposed in an attempt to protect privacy are exactly the capabilities that assist fraudulent activity. Collecting browser characteristics to create a unique identifier is not "bad", nor does it necessarily allow for "privacy violations"; it is the correlation with "actual" personal data that does so, which most of us do not or cannot do.

At DeviceForensIQ, we passively fingerprint respondents in the market research industry to, among many other things, prevent them from taking surveys multiple times and detect fraudulent users who try to impersonate other users. The market research industry relies heavily on honest, anonymous responses, and on not having to put non-passive, intrusive identity verification techniques in place that would be burdensome and troublesome for users, as well as pose a bigger privacy threat. By treating fingerprinting itself as bad, as opposed to the association of personal data with it, you are not treating the root-cause problem and are enabling fraud to occur in other areas you are not considering.
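For readers unfamiliar with the technique under discussion, here is a minimal, illustrative TypeScript sketch of passive fingerprinting: combining a few observable browser characteristics into a hashed pseudonymous ID. The particular signals and the hashing step are assumptions chosen for illustration, not a description of DeviceForensIQ's actual product.

```typescript
// Minimal sketch of passive browser fingerprinting (illustrative only):
// collect a handful of observable characteristics and hash them into a
// pseudonymous identifier. Real products combine many more signals.
async function fingerprintId(): Promise<string> {
  const signals = [
    navigator.userAgent,
    navigator.language,
    String(screen.width),
    String(screen.height),
    String(screen.colorDepth),
    String(navigator.hardwareConcurrency),
    Intl.DateTimeFormat().resolvedOptions().timeZone,
  ].join("|");

  // Hash so the raw characteristics need not be stored or transmitted.
  const bytes = new TextEncoder().encode(signals);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// Nothing is stored client-side: the same ID reappears on a return visit
// from the same browser, which is what makes the technique "passive".
fingerprintId().then((id) => console.log("pseudonymous id:", id));
```

Because no state is written to the client, the same ID recurs across visits from the same browser, which is what makes the technique useful for duplicate detection and simultaneously invisible to the user.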

npdoty commented 4 years ago

There are many privacy threats on the Web, and many features with privacy impact, that are distinct from browser fingerprinting: this document's scope is the particular capability of recognizing browsers from observable characteristics, as that discussion comes up regularly in the design of features for the Web and benefits from coordinated mitigations. Ongoing work on a Target Privacy Threat Model has a broader scope that might include some of the areas of your interest. Other privacy concerns that you note (regarding lower-level network or device OS providers, or what applications or services might subsequently do with user data) may not be directly considered by that document either, but PING could help to coordinate with other standards bodies or envision new work (perhaps through Privacy CG, Web Advertising BG or some other group). In many cases, mitigations need to be made in parallel in order to be most effective: as the document currently notes, those who use onion routing to prevent pseudonymous identification through network traffic would also need mitigations at the application layer.

Concerns regarding Google's use of cookies could be directed to Google; alternatively, if browsers could help to limit privacy threats from the use of cookies, that might be relevant to some IETF and W3C discussions regarding cookies. That client-side controls are more feasible mitigations for cookies is one of the reasons those tracking mechanisms may be of less concern to the TAG or others.

This Note does not consider all details related to unsanctioned tracking, nor does it contemplate what would happen if browser fingerprinting or other tracking techniques were entirely eliminated. The TAG finding on unsanctioned tracking does document why the TAG identifies the capability, in widespread use, as harmful for the Web; it explicitly notes that unsanctioned tracking can't be eliminated through purely technical means, and it describes other mitigations, including policy.

Many of the mitigations currently described in the Note are not about eliminating all browser fingerprinting surface. Indeed, making fingerprinting detectable is a more feasible mitigation and, as noted, has the added advantage of enabling mitigations from outside the standard-setting process, including by researchers and policymakers, who have more discretion to consider the purpose and use of a technical means (as you note with GDPR). Text suggestions for other mitigations to privacy harms related to browser fingerprinting would be most welcome.
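To make the detectability point concrete, here is a hypothetical TypeScript sketch of the kind of instrumentation a researcher's crawler or a browser extension might inject to observe reads of fingerprinting surface. The logAccess helper and the property list are illustrative assumptions, not part of the Note.

```typescript
// Hypothetical sketch of making fingerprinting detectable: wrap a few
// getters on the fingerprinting surface so that any script reading them
// leaves an observable trace. Research crawlers use similar (far more
// extensive) hooks.
function logAccess(obj: object, prop: string, label: string): void {
  const desc = Object.getOwnPropertyDescriptor(
    Object.getPrototypeOf(obj),
    prop,
  );
  if (!desc || !desc.get) return; // only instrument accessor properties
  const originalGet = desc.get;
  Object.defineProperty(obj, prop, {
    get() {
      console.warn(`fingerprinting surface read: ${label}`);
      return originalGet.call(obj); // preserve normal behaviour
    },
  });
}

// Properties commonly read by fingerprinting scripts (illustrative list).
logAccess(navigator, "userAgent", "navigator.userAgent");
logAccess(navigator, "hardwareConcurrency", "navigator.hardwareConcurrency");
logAccess(screen, "colorDepth", "screen.colorDepth");
```

Designs that force fingerprinting through observable API reads like these are what allow researchers and policymakers, rather than only browser engineers, to act on misuse.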

npdoty commented 4 years ago

I wasn't able to attend the AB meeting, but in reading the slides from Tess and Wendy and the minutes, I see the call for privacy as a feature, the identification of privacy and security as a design principle, and the note that we should gradually improve privacy in the design of features rather than being fatalistic about it.

npdoty commented 1 month ago

I think this issue doesn't have suggestions for changes to the current document, but expresses interest in other privacy/security/anti-fraud work at W3C. Closing for now.