w3ctag / design-reviews

W3C specs and API reviews

Scheme-bound Cookies #483

Closed. mikewest closed this issue 4 years ago.

mikewest commented 4 years ago

Guten TAG!

I'm requesting a TAG review of Scheming Cookies.

Data sent over plaintext HTTP is visible to and can be manipulated by anyone on the network. The exposure of identifiers like cookies is a particular risk. Cookies' Secure attribute and the more recent __Secure- prefix mitigate this problem, but are not used nearly widely enough to be a robust defense for users. We should shift the defaults by locking cookies to the scheme of the origin from which they're set (like every other kind of web-facing storage), and by dramatically shortening the lifetime of non-secure cookies: defining a set of heuristics around a user's "session" on a site, and expiring cookies along with that session.
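A minimal sketch of those opt-in defenses (cookie names are invented, and Node.js is used purely for illustration):

```ts
// Illustrative only: today's opt-in cookie defenses on the server side.
import { createServer } from "node:http";

createServer((_req, res) => {
  res.setHeader("Set-Cookie", [
    // Default: delivered over both http:// and https://, so anyone on the
    // network path can read or overwrite it via plaintext HTTP.
    "sid_default=abc123",
    // `Secure`: delivered only over secure transport, but sites must opt in.
    "sid_secure=abc123; Secure",
    // `__Secure-` prefix: browsers reject it unless it is set from a secure
    // origin with the `Secure` attribute; again, purely opt-in today.
    "__Secure-sid=abc123; Secure",
  ]);
  res.end("ok");
}).listen(8080);
```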

Further details:

You should also know that this proposal dovetails with others that aim to reduce the power of non-secure sites, notably schemeful SameSite on the one hand, and requiring Secure for SameSite=None on the other.
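For illustration, the latter rule boils down to the following (cookie names invented):

```ts
// Cross-site (`SameSite=None`) cookies are only accepted when also `Secure`.
const accepted = "widget_sid=abc; SameSite=None; Secure";
// Browsers enforcing the rule drop this one outright.
const rejected = "widget_sid=abc; SameSite=None";
```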

❤️

mikewest commented 4 years ago

Note that I took some time to sketch a spec for this in sections 3.4 through 3.6 of https://mikewest.github.io/cookie-incrementalism/draft-west-cookie-incrementalism.html (also available at https://tools.ietf.org/html/draft-west-cookie-incrementalism-01 if you like pretending that HTML documents have pages).

kenchris commented 4 years ago

Can we call this something like "Scheme-restricted Cookies"? I find the current name confusing.

torgo commented 4 years ago

Hi @mikewest – @kenchris and I are just chatting about this on our TAG breakout and trying to pick through the explainer … we think it might be better to lead with the problem – something like: "cookies set over https might be readable over http and therefore exposed to anyone on the network" … just to make it absolutely clear what problem you are trying to solve here. Also – the name might be confusing to some (even though I see what you're doing) - see Ken's comment above.

torgo commented 4 years ago

Has there been feedback from other browsers since you filed the issue?

kenchris commented 4 years ago

How does this relate to schemes other than http and https? I'm especially thinking about custom schemes like those mentioned in the explainer for https://github.com/w3ctag/design-reviews/issues/482.

annevk commented 4 years ago

Cookie headers are only defined for HTTP.

(I think I can say Mozilla is interested, but not sure about Sec-Nonsecure-Cookie.)

mikewest commented 4 years ago

> [title] & [problem statement]

Addressed in https://github.com/mikewest/scheming-cookies/commit/bb525531232d2f4b27d624a1058922ec4f37b6ac.

> How does this relate to schemes other than http and https?

This is explicitly addressed (as weird-scheme:) in https://github.com/mikewest/scheming-cookies#cookies-scope. Is there text I could add there that would help clarify?

> Cookie headers are only defined for HTTP.

Cookie headers are one part of the problem, document.cookie is another. Chromium, at least, allows document.cookie to read and write to a cookie jar from non-HTTP(S) schemes, which could in theory collide with a web-facing hostname. Though we're fairly confident that there's little risk of collision today (and we create separate cookie jars under the hood for things like extensions), we'd like to harden that boundary by simply taking the scheme into account.
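To make the intended hardening concrete, here is a hypothetical sketch of the keying change (an illustration, not Chromium's actual data structures):

```ts
// Status quo: the jar is keyed by host alone, so http://example.com,
// https://example.com, and in principle weird-scheme://example.com
// documents all share one set of cookies.
const legacyJarKey = (url: URL): string => url.hostname;

// Proposal: fold the scheme into the key, giving each scheme its own jar
// so a non-HTTP(S) scheme can no longer collide with a web-facing host.
const schemeBoundJarKey = (url: URL): string =>
  `${url.protocol}//${url.hostname}`;

// legacyJarKey(new URL("http://example.com/"))       -> "example.com"
// legacyJarKey(new URL("https://example.com/"))      -> "example.com" (same jar)
// schemeBoundJarKey(new URL("http://example.com/"))  -> "http://example.com"
// schemeBoundJarKey(new URL("https://example.com/")) -> "https://example.com"
```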

> Has there been feedback from other browsers since you filed the issue?

> (I think I can say Mozilla is interested, but not sure about Sec-Nonsecure-Cookie.)

There's also been mild engagement in https://lists.w3.org/Archives/Public/ietf-http-wg/2020JanMar/0188.html, with a little more discussion of the Sec-Nonsecure-Cookie carveout, and alternative proposals (in https://lists.w3.org/Archives/Public/ietf-http-wg/2020JanMar/0195.html, for example).
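For readers following along, here is a hedged sketch of the request-side split as I read the draft: cookies set over secure transport keep arriving in Cookie, while non-securely-set cookies would move to the new header, which servers must opt in to reading. The parsing below is illustrative, not settled spec:

```ts
import type { IncomingMessage } from "node:http";

// Naive cookie-pair parsing, for illustration only.
function parseCookiePairs(header: string | undefined): Record<string, string> {
  const jar: Record<string, string> = {};
  for (const pair of (header ?? "").split(";")) {
    const i = pair.indexOf("=");
    if (i > 0) jar[pair.slice(0, i).trim()] = pair.slice(i + 1).trim();
  }
  return jar;
}

function readAllCookies(req: IncomingMessage): Record<string, string> {
  return {
    // Securely-set cookies keep flowing through the classic header.
    ...parseCookiePairs(req.headers["cookie"]),
    // Non-securely-set cookies would arrive here instead; a server that
    // never reads this header simply stops seeing them.
    ...parseCookiePairs(
      req.headers["sec-nonsecure-cookie"] as string | undefined
    ),
  };
}
```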

annevk commented 4 years ago

Per https://html.spec.whatwg.org/#dom-document-cookie it only ought to work for http/https/ftp and ftp is (almost) gone. Sounds like a Chrome bug, but maybe a spec bug?

mikewest commented 4 years ago

> Sounds like a Chrome bug, but maybe a spec bug?

Two places I know of that rely on it in Chrome are:

I won't be surprised to find more as we try to ratchet down on document.cookie in the future.

ylafon commented 4 years ago

Regarding the deployment phase, one of the issues is sites that share cookies between HTTP and HTTPS to direct a client to the same backend behind a load balancer. I wonder if the recent changes deprecating TLS 1.0 and 1.1 could give us data on what would break if left unattended. Such sites, if indeed not looked after, are likely not to work at all with the new minimum TLS settings; otherwise, the site can indeed be updated to accommodate this change.

Also, introducing a new Sec-Nonsecure-Cookie header would require updating old sites to accommodate it; redirecting everything to HTTPS would be a better move for them than handling a new cookie header.
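For concreteness, the sticky-session pattern at issue looks roughly like this (a sketch with invented backend and cookie names, not any particular vendor's implementation):

```ts
// Cookie-based session affinity, as a load balancer might implement it.
const backends = ["app-1", "app-2", "app-3"];

function pickBackend(cookieHeader: string | undefined): {
  backend: string;
  setCookie?: string;
} {
  // Reuse the affinity the client already has, if any.
  const match = cookieHeader?.match(/(?:^|;\s*)SERVERID=([^;]+)/);
  if (match && backends.includes(match[1])) {
    return { backend: match[1] };
  }
  // First contact: pick a backend and pin the client to it. The cookie is
  // set without `Secure`, so it is shared across http:// and https://,
  // which is precisely the sharing that scheme-binding would end.
  const backend = backends[Math.floor(Math.random() * backends.length)];
  return { backend, setCookie: `SERVERID=${backend}; Path=/` };
}
```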

ylafon commented 4 years ago

The TAG expresses its positive support for this (I have reservations about Sec-Nonsecure-Cookie, see above), but we are not closing this yet as we await implementer interest and feedback, hence the "pending external feedback" label.

jwrosewell commented 4 years ago

In summary, further work and consultation are needed to understand the impact of, and justification for, the proposal. At a macro level, the W3C needs a more effective method of engaging with impacted stakeholders. There is also the complex consideration of the public trust placed in the providers of different digital services, and the impact such discrepancies have on seemingly simple changes. The W3C should pause assessment of the proposal until these matters are understood and addressed.

Justification

The proposal should have a clear justification. The explainer and IETF documents cite EFF and Washington Post articles related to Google's PREF cookie. The articles make interesting reading, but are biased and sensationalise the Snowden documents.

The stakeholder review consists of a limited series of tweets between two Google employees and another person.

This is one of a series of proposals presented to the TAG in which privacy is used as a justification that trumps all other considerations. Given that it is not the role of the W3C to pick winners and losers on the web, a balanced justification is needed, with at least comparable rigour and consultation to the expected engineering specification and debate.

Impact

The explainer contains a table showing the age of non-securely delivered cookies to support the limitation of lifetime to a session. The underlying data could be used to assess the impact of the proposal.

What is the sample size and are there any biases in the sample? How representative is the sample of the web in general? How was the sample collected?

This information should be easy for the proposer to obtain and will help the TAG understand how many websites or web sessions might break as a result of the proposal being implemented. Roughly 19,000 person-years would be wasted if one billion people needed to reauthenticate with their favourite websites and spent an average of 10 minutes doing so (10⁹ users × 10 minutes ≈ 19,000 years). This seems like a useful piece of information to inform the TAG's assessment. No one would knowingly wish to degrade or break even 1% of the web.

Trust

This proposal, and others seeking to remove long-established features from the web, suggest Google are in some way more trustworthy than others. However, consider the following scenarios.

Google have successfully entangled themselves in society and come to dominate the web via their control of Chromium and other essential services. The Google brand is exceptionally well known, and it is easy for consumers to consent to Google's privacy terms in only a small number of places (i.e. installing Android, creating a Google account, using search, or accepting privacy policies). Smaller players and new entrants lack the brand presence or convenience of consent that Google almost uniquely benefit from. When a feature is removed it will hurt these smaller players far more than Google. In many cases Google will be forewarned and will already have taken steps to mitigate the impact within their own services.

When assessing this, and other proposals to remove features from the web, the W3C must balance the level of consumer trust enjoyed by Google with that which could be reasonably expected for all stakeholders including new entrants.

Consultation

The following paragraph from the explainer deserves highlighting.

> The assumption I'm making here is that developers will not react to public announcements, mailing list threads, blog posts, Lighthouse scores, devtool warnings, or etc. That's been our experience with many deprecations over the years; it is simply difficult to get a message out to a zillion developers.

This proposal has been publicly visible on the W3C TAG design review thread for over a month. I appear to be the first person unconnected to Google or the W3C to comment. Very sensibly, @ylafon has requested external feedback. What is being done to solicit it?

Representatives from software vendors like F5, NGINX, HAProxy, Varnish and others would be well placed to provide a view on the impact on load balancing and web performance. Similarly, Content Delivery Networks (CDNs) such as Cloudflare or Akamai could be asked for comment.

The accepted norms of public debate and consultation must be followed for this, and for any other change which removes something from the web. The EU's introduction of GDPR provides an effective consultation model to learn from. The web browser impacts more lives than GDPR; robust governance is needed.

Given the acknowledged difficulties in consultation, and the fact that proper consultation will take time, the W3C should pause the proposal until these matters are addressed, irrespective of the technical merits or otherwise of the proposal.

jwrosewell commented 4 years ago

@mikewest thank you for adding more information about the data to the explainer. Could you provide more information concerning a "broad set of users" and the sample sizes ideally as a percentage of population?

Technology savvy users in North America will have a completely different set of tools and expectations to the general population in India, Thailand, China, Myanmar, Indonesia or Zimbabwe among others. The essential services they rely on will also be operated by people and organisations with vastly different resources and skills.

annevk commented 4 years ago

@jwrosewell there's pretty broad support in the web/internet community to move to secure transports. Prior art here are TLS, HTTPS, HTTP/2 onward being restricted to TLS, secure contexts, mixed content blocking, Referer header degradation, etc. What are the concrete reasons to not align cookies with that security boundary?

mikewest commented 4 years ago

Hi, @jwrosewell! Thanks for the feedback. I'll add some comments regarding the portions of your response that seem most relevant to this proposal, but will defer to the TAG on the other questions you raise.

First, you suggested that the proposal doesn't have a clear justification. I'll try again:

This proposal suggests that something like HTTPS is an architectural prerequisite for both security and privacy, and that web security is predicated upon the origin being a defensible boundary.

These suggestions seem foundational and widely-supported in the W3C, the IETF, the IAB, and so on. Accepting them leads fairly directly to the proposal's claims that cookies should be origin-bound (just as every other kind of storage is origin-bound), and that state should be limited when traversing non-secure channels.

Second, you asked about the provenance of data regarding the age of non-securely delivered cookies. This data comes from Chrome's telemetry, covering stable-channel users who opted into sharing usage statistics for the 28-day period ending December 31st, 2019. It's a fairly broad set of users, and we find it to be a relevant and representative measure. We've since removed the histogram from Chromium, but you can see how it was previously collected by examining the historical implementation of LogCookieUMA(). I've added this context to the explainer.

I don't have a breakdown of that data by physical location, nor can I provide a concrete percentage of the population that decided to share metrics with Google. The data represents a global statistic for Chrome's users, and should be interpreted as such. IMO, it provides a solid-enough basis to answer the question "Do non-secure cookies have a long or a short lifetime in the status quo?" for the set of users that would be affected by changes to Chrome's behavior.

Related to this data, you suggest a risk of time being wasted if folks need to sign into their favorite sites again. I'd first note that this seems to apply only to the portion of the proposal that would limit the lifetime of non-secure state, and doesn't appear to be an objection to binding state to an origin by bifurcating HTTP and HTTPS cookies to begin with. That said, I agree that we should evaluate this risk, and carefully weigh it against the risks that the status quo imposes on those same folks. My hypothesis is that the substantial shift towards encryption in users' traffic patterns will make it possible to enact this change safely.

Third, you asked about external feedback and raised concerns about this being a proposal with insufficient input from the ecosystem. Given this proposal's posture as an early-stage review intended to gather directional feedback, I don't find it surprising that the community hasn't loudly weighed in. The focus at this stage is on browser vendors, and folks who are deeply engaged in the web's architecture.

That said, the TAG is not the only forum involved. I sent this proposal to the IETF's HTTP WG on the same day I raised it with the TAG, where many of the entities you mention are quite engaged. HAProxy's lead in particular gave several helpful pieces of feedback on the thread, as did folks from Mozilla. Mozilla has also given quite positive signals via their standards positions repository.

As we build an implementation developers can play with, and get closer to having something we think might be shippable, browsers generally are likely to use the channels I mentioned in the explainer (public announcements, mailing list threads, blog posts, Lighthouse scores, devtool warnings, etc) to broaden the set of the developers we can learn from.

Thanks again for your comments. Feedback is important to us, and I appreciate yours!

jwrosewell commented 4 years ago

@annevk

The purpose of the W3C as defined in the Member Agreement is to “to support the advancement of information technology in the field of networking, graphics and user interfaces by evolving the World Wide Web toward a true information infrastructure, and to encourage cooperation in the industry through the promotion and development of standard interfaces in the information environment known as the World Wide Web”.

With this in mind, at least two questions must be answered before any proposal that seeks to remove “standard interfaces” and break interoperability is considered, irrespective of the technical implementation, justification or precedent.

  1. What is the minimum required standard of evidence?
  2. Does the justification and stakeholder review comply with the accepted norms?

Google also have a dominant market position, which prompts a third question.

  3. Does Google’s dominant market position have a bearing on the review process and the W3C’s purpose?

Minimum standard of evidence

Google have requested that the W3C TAG review a proposal that will remove a long-established “standard interface” and create a new “standard interface”. The proposal threatens the long-established interoperability of the “information environment” the W3C exists to protect.

The proposal contains supporting data to indicate the “standard interface” is widely used but lacks the detail to enable the TAG to impartially assess how the proposal will positively or negatively impact the “information environment” for all. The TAG is currently unable to determine if the proposal will support the “advancement of”, or in practice degrade, the web.

What is the TAG’s minimum standard of evidence required before a review is undertaken? This standard should be applied equally to all reviews seeking to change an established and widely used “standard interface”, and should enable an objective balancing test between benefits and impact.

Norms of governance

The proposal fails to meet the accepted norms associated with the governance of global standards.

The introduction of this proposal offers the following Twitter thread as its publicly available justification under the heading “Existing major pieces of multi-stakeholder review or discussion of this design:”.

(embedded Twitter thread)

The additional information concerning stakeholder review with the IETF appears to go further but is limited to highly skilled engineers engaged in the field of technology standards and is not representative of the people who use the existing “standard interface”.

What is the minimum breadth of stakeholder review needed before TAG can themselves review a proposal that will impact 4,000,000,000 users of the web? Other governance models including telecommunications, radio standards or government regulation normally involve extensive proactive stakeholder consultation with the users and operators of the services in question over a meaningful period.

Dominant market position

According to Brendan Eich (co-founder of Mozilla and Firefox, and creator of JavaScript), Google's Chrome is the de facto web browser. (https://twitter.com/BrendanEich/status/950209816902774785)

Google are free to unilaterally implement this proposal, or any change, within their browsers, and do not require W3C approval. If they were to do so, Google’s dominant market position would result in the change becoming a de facto “standard interface”.

How does the TAG ensure it is not influenced by Google’s dominant market position? Are the indirect consequences for browser diversity and competition considered by the W3C? If these factors are not openly recognised, the W3C becomes involved in picking winners and losers. That is not the W3C’s purpose, and it is important that the W3C remain impartial.

At this early stage the proposal falls short in many areas of governance, consultation, justification and impact assessment. The W3C TAG should provide clear guidance to the proposer and all W3C members concerning the questions highlighted and pause the review process. The same broad approach should be applied to any proposal seeking to remove or alter “standard interfaces”.

jwrosewell commented 4 years ago

@mikewest thank you for providing additional information, which prompts three follow-on questions.

  1. HTTPS as an architectural prerequisite – has the impact of the widespread use of HTTPS been assessed from the perspectives of carbon footprint, diversity, competition, trust, performance and the open web, among others? If not, it should be before this and other proposals which rely on the acceptance of HTTPS are reviewed.

  2. Supporting data – what evidence does the TAG require to assess the impact? The data provided in the explainer was presented for the purpose of establishing lifetime; further and different data will be needed to assess impact.

  3. Stakeholder review – as the proposal progresses, consultation will be required with those who use the technology, in addition to browser and technology vendors. What activity is planned in this regard?

torgo commented 4 years ago

One quick comment: the TAG is on record supporting the move to HTTPS, so that is not up for debate here. For the rest of the topics, we have slated this issue for a special TAG breakout to discuss further.

jwrosewell commented 4 years ago

@torgo How often do TAG positions need to be reviewed? The TAG decision to support the move to HTTPS was made five years ago. When was it last reviewed?

In the interest of transparent governance and robust decision-making, challenging assumptions and groupthink is healthy.

As an example, the EU, a leading proponent of privacy, are doing just that following the introduction of GDPR. Their 12th March 2020 meeting agenda concerning the regulation on privacy and electronic communications recognises the problem of consent fatigue in practice, and is open to shifting the consent framework to the user's agent to address it. They are open to evaluating the practical reality of past decisions and legislation; the W3C should do the same.

Before progressing this, or any other removal of functionality reliant on the HTTPS security boundary or the privacy argument, a balancing test involving environmental impact, web performance, the open web, competition, trust and the risk of a breaking change needs to be undertaken, explicitly considering the purpose of the W3C as accepted by all members.

Environmental

Data centre energy consumption contributes to the climate crisis, and the trend towards ever more security has increased carbon output. As an example, washingtonpost.com contains 160 KB of custom fonts and CSS. These are fetched securely over HTTPS, and over HTTP/2 when available; HTTPS consumes more computing resources than HTTP. The CSS and font assets are identical for every user, yet the effectively mandatory use of HTTPS has prevented ISPs from caching those assets and serving them in a more carbon-efficient manner. These assets have no bearing on the article being read.

More research is needed into the carbon footprint associated with security. Recent research into the carbon impact of email gives some indication of the size of the prize: if everyone in Britain sent one less email per day, 16,433 tonnes of carbon would be saved, according to research conducted by OVO Energy in 2019 and commented on by Mike Berners-Lee. More robust research of this kind is needed.

Performance

If all the public content of washingtonpost.com were publicly cacheable, not only would an even greater carbon reduction result, but there would also be a vast improvement in user performance. The technology to achieve this is comparatively simple, works well, and is well understood.

Trust

People trust brands. Brands spend vast sums of money establishing and maintaining trust.

Unlike other daily activities, use of the web involves trusting three primary brands simultaneously: the web browser, the ISP and the publisher. It is hard for people to understand which of these brands is responsible for their security and the role each plays. They have become confused and scared by privacy-statement overload and the media's reporting of security breaches, and find it hard to decide whom to trust. Naturally they gravitate towards better-known brands and, when asked, state that they "want security".

Open Web

Google introduced Accelerated Mobile Pages (AMP) in 2015 to address the performance issues impacting publishing, some of which were introduced by the mass adoption of HTTPS. As the primary host of AMP, Google now control the articles users get to see and the revenues publishers receive. Google are one of the biggest financial beneficiaries of the security boundary referenced.

Unless something drastic changes for Firefox, Chromium-based web browsers and Safari will be the only actively maintained web browsers in a few years. Google, by virtue of their size and scale, have absolute control over Chromium and de facto web standards. It would be interesting to learn what plans the W3C has in place to handle this likely near-future scenario.

Google will control the web browser and the publisher. Google’s walled garden and the web will in practice become one and the same. Google will enjoy unrivalled trust with web users.

Advertising-funded journalism is at the heart of democracy. It is under significant threat from many quarters. Anything which threatens its rejuvenation should be questioned rigorously.

Suggestions

The W3C hosts (MIT, ERCIM, KEIO, BEIHANG) have the resources to commission the research needed into the environmental impact of ubiquitous security. Such research will enable the TAG to revalidate its position on HTTPS with regard to environmental impact, or to alter it. The global position on climate change has shifted significantly over the past five years.

The UK's Competition and Markets Authority are investigating Google. Prior to the coronavirus pandemic they were due to publish their final report and recommendations in July 2020. This report will inform the W3C.

torgo commented 4 years ago

Hi James - see https://istlsfastyet.com/ (which is referenced from our finding) for more info on TLS performance characteristics. We're not going to engage on this topic further on this thread.

jwrosewell commented 4 years ago

@torgo - thank you. What is the appropriate forum for W3C members to raise the topic?

Unrelated to the topic of performance: the 76 observations submitted to the Competition and Markets Authority (CMA) regarding their investigation into antitrust and competition issues are now available.

They make interesting reading and will inform the TAG on the indirect consequences of technical decisions.

torgo commented 4 years ago

The TAG is happy with the shape of this proposal. We're still concerned about the proposed introduction of the Sec-Nonsecure-Cookie header, as it's not obvious that it's the right way to solve the problem of web sites that can't be upgraded (see @ylafon's previous comment). However, we are happy to close this for now.

jwrosewell commented 4 years ago

I found the IETF's "The Internet is for End Users" document informative. It discusses the full range of issues associated with technology standards, including those raised in the commentary here.

jwrosewell commented 3 years ago

Following inspiration from this review and its comments, I commissioned some research. It seems clear that the TAG is not considering issues of importance equally. Change is needed.