w3ctag / design-reviews

W3C specs and API reviews

Partial freezing of the User-Agent string #467

Closed yoavweiss closed 4 years ago

yoavweiss commented 4 years ago

Goedenavond TAG!

This is not your typical spec review, and is highly related to https://github.com/w3ctag/design-reviews/issues/320. But, because @torgo asked nicely, I'm opening up a review for a specific application of UA-CH as a replacement for the User-Agent string.

We've had a lot of feedback on the intent, which resulted in changes to the API we want to ship. It also resulted in many open issues, most of which either have pending PRs or will have them shortly.

The latest summary is:

Checkboxes:

Further details:

You should also know that...


We'd prefer the TAG provide feedback as (please delete all but the desired option):

🐛 open issues in our GitHub repo for each point of feedback

jwrosewell commented 4 years ago

Significant time is now being spent by web engineers around the world "second guessing" how Google will proceed. In the interests of good governance and engineering @yoavweiss should now close this issue and the associated Intent at Chromium.org stating it will not be pursued. As many have pointed out the following is needed before it is ready to be debated by the W3C and TAG.

  1. Define and agree the objectives.
  2. Gather evidence to support the objectives - i.e. what is the impact on privacy in practice? How does this compare to other privacy weaknesses?
  3. Produce a robust design to accepted standards compatible with dependencies and their timeframes i.e. privacy sandbox.
  4. Articulate alternative designs that were considered and why they were rejected.
  5. Understand the current use of the User-Agent and first request optimisations in practice.
  6. Determine the effort impact of changing. The OpenRTB example provided by @ronancremin will consume hundreds of person-years of engineering time across the AdTech industry alone. Any player that doesn't adopt the change will disadvantage the entire ecosystem. The vast majority of engineers are employed by Google's competitors, who could otherwise be performing more value-adding work. Google have no such constraints. The disruption benefits Google more than anyone else.

Related to this issue we have written to the Competition and Market Authority concerning Google's control over Chromium and web standards in general. A copy of that letter is available on our website.

yoavweiss commented 4 years ago

Lack of industry consultation

The HTTP protocol has become deeply embedded globally over its lifetime. As envisaged by the authors of the HTTP protocol, the User-Agent string has been used in the ensuing decades for “statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations”.

Indeed. Those are all use-cases that we intend to maintain.
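For illustration, the "tailoring responses" use case typically looks something like the server-side sketch below. This is a hypothetical, minimal example (real device-detection libraries use large, regularly updated rule sets, not a four-token regex):

```python
import re

def is_mobile(user_agent: str) -> bool:
    """Very rough mobile detection from a User-Agent header.

    Purely illustrative: production device detection relies on
    maintained rule databases, not a hand-written pattern.
    """
    return bool(re.search(r"Mobi|Android|iPhone|iPad", user_agent or ""))

def choose_template(user_agent: str) -> str:
    # Tailor the response to the client's limitations, as the HTTP
    # authors anticipated for the User-Agent header.
    return "mobile.html" if is_mobile(user_agent) else "desktop.html"

ua = ("Mozilla/5.0 (Linux; Android 10; Pixel 3) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/80.0.3987.87 Mobile Safari/537.36")
print(choose_template(ua))  # mobile.html
```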

The User-Agent header has been part of the web since its inception. It has been a stable element of the HTTP protocol through all its versions, from HTTP/1.0 in 1996 all the way to HTTP/2 in 2015, and thus has inevitably come to be relied upon, even if particular use cases are not apparent, have been forgotten about, or their practitioners are not participants in standards groups. The User-Agent string is also likely being used in new ways not contemplated by the original authors of the specification.

There was a salutary example of the longevity of standards in a recent Tweet from the author of Envoy, a web proxy server. He has been forced to add elements of HTTP 1.0 to ensure it works in the real world, despite Envoy’s development starting 23 years after HTTP/1.1 was ratified and deliberately opting not to support HTTP 1.0. This is the reality of the web—legacy is forever.

Despite this reality, there is no public evidence of any attempt to consult with industry groups to understand the breadth and severity of the impact of this proposed change to HTTP. It is a testament to its original design that the HTTP protocol has endured so well despite enormous changes in the internet landscape. Such designs should not be changed lightly.

The Client Hints infrastructure was thoroughly discussed at the IETF's HTTPWG, as well as its specific application as a replacement to the User Agent string.

Issues with the stated aim of the proposal

The problem with the User-Agent string, and the reason to propose Client Hints, per the explainer, is that “there's a lot of entropy wrapped up in the UA string” and that “this makes it an important part of fingerprinting schemes of all sorts.”

In subsequent discussions in the HTTP WG the privacy issues focused on passive fingerprinting, where the User-Agent string could potentially be used by entities for tracking users without their knowledge.

What is missing from the discussion is any concrete evidence of the extent or severity of this supposed tracking. Making changes to an open standard that has been in place for over 24 years should require a careful and transparent weighing of the benefits and costs of doing so, not the opinion of some individuals. In this case the benefits are unclear and the central argument is disputed by experts in the field. The costs on the other hand are significant. The burden of proof for making the case that this truly is a problem worth fixing clearly falls on the proposer of the change.

There's a lot of independent research on the subject. Panopticlick is one from the EFF.

If active tracking is the main issue that this proposal seeks to address there are far richer sources of entropy than the User-Agent string. Google themselves have published a paper on a canvas-based tracking technique that can uniquely identify 52M client types with 100% accuracy. Audio fingerprinting, time skew fingerprinting and font-list fingerprinting can be combined to give very high entropy tracking.

I'm afraid there's been some confusion. This proposal tries to address passive fingerprinting, by turning it into active fingerprinting that the browser can then keep track of.
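The mechanism can be sketched as an HTTP exchange (header names are from the UA-CH draft; the values are illustrative). Low-entropy hints are sent by default, while high-entropy hints must be explicitly requested by the server via `Accept-CH`, which is what makes the collection visible to the browser:

```http
GET / HTTP/1.1
Host: example.com
Sec-CH-UA: "Chromium";v="80"
Sec-CH-UA-Mobile: ?0

HTTP/1.1 200 OK
Accept-CH: Sec-CH-UA-Full-Version, Sec-CH-UA-Platform

GET /next HTTP/1.1
Host: example.com
Sec-CH-UA: "Chromium";v="80"
Sec-CH-UA-Mobile: ?0
Sec-CH-UA-Full-Version: "80.0.3987.87"
Sec-CH-UA-Platform: "Windows"
```

The browser can thus record which origins ask for which high-entropy hints, something a passively observed User-Agent header never allowed.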

Timeline of change

This proposed change is proceeding more quickly than the industry can keep up with. In January 2020 alone there were some important changes made to the proposal (e.g. sending the mobileness hint by default). It is difficult to fully consider the proposal and understand its impact until it is stable for a while. The community needs time to 1) notice the proposal and 2) consider its impact. There has not been enough time.

Move fast and break things is not the correct approach for making changes to an open standard.

Regarding timelines, I updated the intent thread.

Narrow review group

It's difficult to be objective about this, but the group discussing this proposal feels narrow and mostly comes from the web browser constituency, where the change would initially be enacted but the impact not necessarily felt. It would be good to see more people from the following constituencies in the discussion:

  • advertisers
  • web analytics
  • HTTP servers
  • load balancers
  • CDNs
  • web caches

The latter 4 are active at the IETF and at the HTTPWG. We've also received a lot of feedback from others on the UA-CH repo.

All of these constituencies make use of the User-Agent string and must be involved in the discussion for a meaningful consensus to be reached.

Obviously you can't force people to contribute, but my sense is that this proposal is not widely known about amongst these impacted parties.

Diversity of web monetisation

Ads are the micropayments system of the web. Nobody likes them, but they serve a crucial role in the web ecosystem.

The proposed change hurts web diversity by disproportionally harming smaller advertising networks that use the OpenRTB protocol. This essentially means most networks outside of Google and Facebook. Why? The User-Agent string is part of the OpenRTB BidRequest object where it is used to help inform bidding decisions, format ads and targeting.
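For context, the fragment of an OpenRTB 2.x BidRequest at issue looks roughly like this (values invented for illustration; `device.ua` is the field that carries the User-Agent string):

```json
{
  "id": "illustrative-bid-request",
  "device": {
    "ua": "Mozilla/5.0 (Linux; Android 10; Pixel 3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Mobile Safari/537.36",
    "os": "Android",
    "devicetype": 1
  }
}
```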

A few points:

Why does it hurt Google less? Because Google is able to maintain a richer set of user data across its dominant web properties (90% market share in search), Chrome browser (69% market share) and Android operating system (74% market share).

The web needs diversity of monetisation just as much as it needs diversity in browsers.

Dismissive tone in discussions

Some of the commentary from the proposers has been dismissive in nature, e.g. the following comments on the Intent to Deprecate and Freeze: The User-Agent string post, in response to a set of questions:

  • “I’d expect analytics tools to adapt to this change.”
  • “CDNs will have to adapt as well.“

Entire constituencies of the web should not be dismissed out of hand. This tone has no place in standards setting.

I apologize if this came across as dismissive. That wasn't my intention.

Entangling Chrome releases with an open standards process

In the review request, Chrome release dates are mentioned. It doesn't feel appropriate to link a commercial organisation's internal dates to a proposed standard. There are mentions of shipping code and the Chrome intent.

The TAG review process asks for relevant time constraints. I provided them.

Overstated support

This point has been made by others here but it is worth restating. It feels like there is an attempt to make this proposal sound as if it has broader support than it really does, in particular on the Chrome intent, linked explicitly by the requester.

Unresolved issues

The review states “Major unresolved issues with or opposition to this specification: “ i.e. no unresolved issues or opposition. This is true only if you consider unilaterally closed issues to be truly closed. Here are a couple of issues that were closed rather abruptly, and coinciding with a Chrome intent.

Some closed HTTPWG issues:

I'm not sure what your point is here. These issues were raised (one by me), discussed, resolved and then closed.

ronancremin commented 4 years ago

Thanks for the response.

Despite this reality, there is no public evidence of any attempt to consult with industry groups to understand the breadth and severity of the impact of this proposed change to HTTP. It is a testament to its original design that the HTTP protocol has endured so well despite enormous changes in the internet landscape. Such designs should not be changed lightly.

The Client Hints infrastructure was thoroughly discussed at the IETF's HTTPWG, as well as its specific application as a replacement to the User Agent string.

I'm saying that there is insufficient industry realisation that this is going on, despite discussions in the HTTPWG. However well-intentioned the discussions are it seems that some web constituents are only vaguely aware of what's being proposed. Obviously this isn't any particular person's fault but it feels like more time or outreach is required for the industry to become aware of the proposal and respond.

What is missing from the discussion is any concrete evidence of the extent or severity of this supposed tracking. Making changes to an open standard that has been in place for over 24 years should require a careful and transparent weighing of the benefits and costs of doing so, not the opinion of some individuals. In this case the benefits are unclear and the central argument is disputed by experts in the field. The costs on the other hand are significant. The burden of proof for making the case that this truly is a problem worth fixing clearly falls on the proposer of the change.

There's a lot of independent research on the subject. Panopticlick is one from the EFF.

With respect, I don't think this answers my concern at all, specifically the extent of this supposed passive tracking. Panopticlick and others like it say what's possible without saying anything about how widespread this tracking actually is, so I don't think that this counts as evidence of passive tracking. Furthermore, Panopticlick mixes both passive and active tracking. If there is independent research on the extent and severity of passive tracking maybe you could cite it here?

Existence of data in the OpenRTB BidRequest object doesn't mean that users and their user agents are obligated to provide it to advertisers. For example, I also see Geolocation in that same object as "recommended". I'm assuming you don't think that browsers should passively provide geolocation data on every request.

No, of course not. And user agents are not obligated to send User-Agent headers either, and can say whatever they want in them.

But the point is that most user agents have been sending useful User-Agent headers for the last 25 years or so and, for all its imperfection, the web ecosystem has grown up around this consensus, including the advertising industry that helps pay for so much of what we utilise on the web.

The user agent information would still be available to advertisers, they'd just have to actively ask for it (using UA-CH or the equivalent JS API) in ways that enable browsers to know which origins are gathering that data.

Yes, but they get it only on the second request—a significant drawback in an industry where time is everything, especially on mobile devices where connectivity issues are more likely.

The review states “Major unresolved issues with or opposition to this specification: “ i.e. no unresolved issues or opposition. This is true only if you consider unilaterally closed issues to be truly closed. Here are a couple of issues that were closed rather abruptly, and coinciding with a Chrome intent.

I'm not sure what your point is here. These issues were raised (one by me), discussed, resolved and then closed.

Perhaps this is the normal process but the closure felt forced/abrupt.

jwrosewell commented 4 years ago

There is still a general lack of awareness around this and related Google-driven changes to the web.

Note: I've edited this post to ensure compliance with the W3C code of conduct. I hope it has provoked - and will continue to provoke - thought.

The Economist last week touched on these subjects in a Special Report. In accessing it online one can experience the effects this change is already having on journalism and access to advertising-funded content. Basically you'll have to enter your email address every time you want to read something, or, if that's too much hassle, just use Google News, where Google will control the revenue for everyone. Small publishers will never get funding and will be commercially unviable. That's not a good thing and takes us a long way from Sir Tim's vision for the web.

Here's a link to The Economist article.

Who will benefit most from the data economy?

And my wider thoughts on the impact.

My wider thoughts on the changes and the impact to AdTech

jwrosewell commented 4 years ago

Last week the European Parliament and Council met to debate repealing legislation concerning the Privacy and Electronic Communications regulation. The proposals recognise legitimate interest and the providers of electronic services. They call out the end-user confusion associated with gaining and controlling consent. These are subjects that I and others have articulated previously in comments on this proposal.

The debate explicitly recognises the use of “metadata can be useful for businesses, consumers and society as a whole”. Legitimate interest includes:

  • identification of security threats;
  • meeting quality of service requirements;
  • aggregated analysis;
  • providing services;
  • websites without direct monetary payment;
  • websites wholly or mainly financed by advertising;
  • audience measuring;
  • management or optimisation of the network;
  • detecting technical faults;
  • preventing phishing attacks; and
  • anti-spam.

The debate recognises “providers should be permitted to process an end-user’s electronic communications metadata where it is necessary for the provision of an electronic communications service”.

Implementations should be performed in the “least intrusive manner”. The User-Agent meets this criterion.

There is an explicit list of the information contained within the end users’ terminal equipment that requires explicit consent. The list does not include metadata such as that contained in the User-Agent.

The legitimate interests of businesses are explicitly recognised “taking into consideration the reasonable expectations of the end-user based on her or his relationship with the provider”.

The debate advocates placing controls over consent and control within the terminal equipment (or user’s agent) not the removal of such data.

The outcome of the debate should inform the W3C, Chromium and other stakeholders. The UK (now no longer part of the EU) is also considering these matters via the Competition and Markets Authority (CMA) investigation and the Information Commissioner's Office (ICO). At least two of these three regulatory bodies are publicly progressing in a direction that is not aligned with this proposal.

It is not the business of the W3C to help pick winners and losers on the web. This proposal in practice will favour larger businesses. Technically, and now from a regulatory perspective, it looks like a solution looking for a problem. It should be rejected by the W3C at this time.

The full text of the EU document is available here.

https://data.consilium.europa.eu/doc/document/ST-5979-2020-INIT/en/pdf

jwrosewell commented 4 years ago

Google yesterday recognised the stresses many businesses are already under and are doing their bit to reduce that burden by delaying enhancements to Chromium and Chrome. Here's the short update.

"Due to adjusted work schedules at this time, we are pausing upcoming Chrome and Chrome OS releases. Our primary objectives are to ensure Chrome continues to be stable, secure, and work reliably for anyone who depends on them. We’ll continue to prioritize any updates related to security, which will be included in Chrome 80."

https://blog.chromium.org/2020/03

jwrosewell commented 4 years ago

@torgo for those that are following this issue please could you add a comment / link on the output from TAG review?

There were many related comments concerning the issues associated with removing a standard interface, discussed under the Scheme-bound Cookies proposal.

Thank you.

torgo commented 4 years ago

We see from this post from Yoav that the proposal has been put off until 2021 at the earliest. At the same time, Client Hints is progressing. We think this state of affairs could allow client hints to mature, both in terms of the spec, and in terms of the implementations and industry adoption. Right now we're going to close this issue to make way for other issues in our workload but we'll be happy to re-open when appropriate.

jwrosewell commented 4 years ago

FYI A pull request has been made on the WICG draft specification to add feedback from those who have used the experiments and considered the specification.

ronancremin commented 3 years ago

We see from this post from Yoav that the proposal has been put off until 2021 at the earliest. At the same time, Client Hints is progressing. We think this state of affairs could allow client hints to mature, both in terms of the spec, and in terms of the implementations and industry adoption. Right now we're going to close this issue to make way for other issues in our workload but we'll be happy to re-open when appropriate.

Should this now be reopened in light of the updated roadmap from the Chromium team?

torgo commented 3 years ago

@ronancremin looks like we will be reviewing in #640 rather than re-opening this issue.

ronancremin commented 3 years ago

Ah, I had missed #640. That works.